Mr. Tapella’s response contains information that should be very encouraging and heartening to the depository library community. It also leaves some issues troublingly unaddressed.
Bulk Data Access to Legislative Information
First, it is wonderful to know that GPO is working with the Library of Congress, Congressional Research Service, the Law Library of Congress, and the Senate and House on the issue of access to bulk legislative data!
That news is important and significant. It is also very encouraging because it marks a new direction for dissemination of government information. Taken to its logical conclusion, this would mean that we will have a new route to obtaining government information. No longer will we be limited to information presented as web pages through government-built interfaces. No longer will we have to hope that web scraping will find all the information we want to gather or preserve. Raw information — once locked in the dark web of government databases — will be, potentially, available for libraries and others to download and repurpose.
Unfortunately, we can’t look for this right away. Congress has only asked for a report, not action. The report itself is due “within 120 days of the release of Legislative Information System 2.0.” Presumably that is a reference to a new version of the LIS that is currently only available within the legislative branch. I have not seen an announcement of a date for the release of a new version of the LIS, so it is not clear even when we can expect the report.
Nevertheless, it is certainly good to hear directly from Mr. Tapella that the task force working on this report will develop “a position on access to bulk data” and even intends to “work on making bulk data accessible.”
It is somewhat ironic that this long, drawn-out process itself demonstrates the need for bulk data access. Although there have been calls for bulk data access for years, it literally took a legislative directive to get GPO and LOC and CRS to take the tentative steps they are taking now: to “develop a position” and “work on” the problem. Such passivity and long delays are, perhaps, inherent in a large, bureaucratic system, but they are crippling when it comes to keeping up with technological changes. This demonstrates why it is essential for the government to provide easy, free, reliable access to the raw information of government: doing so will enable others — who can more quickly adopt new technologies — to provide better access to that information faster than the government can.
What about Non-Legislative Data?
It is also unfortunate that the task force is only looking at bulk delivery of legislative information. Will it take another legislative directive to get GPO to “develop a position” on bulk access to other data? See Bulk Data Downloads: A Breakthrough in Government Transparency (by Tim O’Reilly, O’Reilly Radar, Mar 4, 2009) for a short list of other data for which we need bulk access.
Will GPO Support Collections in FDLP Libraries or Just Backups?
Mr. Tapella’s statement does not indicate that GPO has yet grasped the difference between ‘backups’ and digital deposit. GPO’s focus is apparently still on making sure that its own collection is functional rather than facilitating digital collections in FDLP libraries. The “geographically dispersed content repository” described by Mr. Tapella is only “our backup” designed to ensure GPO’s “continuity of operations” if GPO’s own data repository becomes inoperable. This is a good and necessary feature but it is only a backup for GPO and has nothing to do with digital deposit.
Although Mr. Tapella points out that FDsys supports “repositories that can accept data much like libraries today accept tangible publications distributed from GPO,” it seems clear that this generic design is intended to provide “backups” and would require “enhancements” to include bulk data access. This is a GPO-centric way of thinking. It is still a long way from GPO having a “position” on digital deposit and even further from “working on” making it possible.
Until GPO understands that it needs to support digital deposit so that FDLP libraries can build their own digital collections with their own functionality, FDLP libraries will not be partners in preservation and access; they will be, at best, little more than a backup for GPO.
APIs are not Digital Deposit
Mr. Tapella repeats the advantages of APIs, but fails to address the need for digital deposit. Providing APIs is not the same thing as providing digital deposit. As we said in our original comment, APIs are not magic. Each is a design for access and the product of choices made by the designer. Each has its own constraints built in. But don’t take our word for it; read what developers say about the constraints of using existing government APIs:
- Extracting Government Spending Data via Talend and Ruby into CouchDB, by Rohit Amarnath, Full360 (04/11/2009).
- Improve databases, By Joshua Tauberer, The Hill (06/12/07).
We love APIs! We think they are great! We want more! We are so very glad that GPO will support them at last! But, please, Mr. Tapella, understand that APIs and a web site are only two of the three parts of a complete access system. Bulk data access is essential and we’d like to hear that GPO is planning for it now.
OAIS is not Digital Deposit
We are so very happy that FDsys is based on OAIS. It is something we have long advocated. But, again, Mr. Tapella, please understand that telling us about your preservation system and your intentions to preserve information does not reassure us that everything will be preserved and freely available to everyone forever. As we pointed out in our original comments, regardless of your intentions and the quality of your system, GPO may not always have the funding, resources, or mandate to provide free, permanent, public access to all government information and we therefore cannot rely on it alone to do so. And no single digital archive or repository can ever be as secure and safe as multiple archives. We need digital deposit to guarantee preservation and free access.
The GPO-centric approach to preservation and access is like a medieval town that stores all of its grain in one barn. When lightning strikes, the whole town goes hungry. In this day and age of $200 terabyte hard drives, peer-to-peer networks, and successful preservation systems like LOCKSS, it concerns us greatly that you still don’t understand the need to have many collaborators working together to ensure long-term, free, public access.
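To make the one-barn point concrete, here is a minimal sketch of why several independent copies are safer than one. It is only an illustration loosely inspired by the general idea behind systems like LOCKSS, not LOCKSS's actual protocol and not anything GPO or FDsys implements. The function names, library names, and sample document are all hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Compare copies held by different libraries and repair the odd one out.

    The copy whose fingerprint is shared by a strict majority of holders is
    treated as intact; every other copy is replaced with it.
    """
    digests = {name: digest(content) for name, content in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        raise ValueError("no majority; cannot tell which copy is intact")
    good = next(c for name, c in replicas.items() if digests[name] == winner)
    return {name: good for name in replicas}


# Hypothetical example: three libraries hold the same document,
# and one copy has silently been damaged.
document = b"H.R. 1: sample bill text"
replicas = {
    "library_a": document,
    "library_b": document,
    "library_c": b"H.R. 1: sample bi\x00\x00 text",
}
repaired = audit_and_repair(replicas)
```

With three holders, the damaged copy is outvoted and restored. With a single holder there is nothing to compare against, so the same damage would pass unnoticed, which is the barn problem in miniature.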
There are a couple of sentences in Mr. Tapella’s reply that make me optimistic that GPO is on a path to change and does understand this need for collaborators. He says:
“We need help from you and others in the community to help define future enhancements to access and data distribution. We see APIs as a one of the methods to provide advanced access tools, and realize that this is just one part of the ultimate solution.”
To me, this says two important things: First, “data distribution” is on the GPO agenda, at least nominally; second, APIs are just one part of a bigger, ultimate, solution. This gives me hope for more. I hope I’m not reading too much into this.
- Bulk data and Legislative Information 2.0.
- Congress’ legislative information systems: THOMAS and the LIS by Jeffrey C. Griffith, Government Information Quarterly 18.1 (2001): 43-60. Apr 16, 2009
- Congressional Research Service Products: Taxpayers Should Have Easy Access, Project on Government Oversight, February 10, 2003.
- Comparison of Legislative Resources on GPO Access and Selected Government and Non-Government Web Sites
- Remixes: Creative uses of free government information
- OpenHouse Project Op-Ed on Databases