The Public Printer recently released GPO’s letter to the President regarding open government (PDF) (Robert C. Tapella, Public Printer, March 9, 2009). Since it specifically mentions FreeGovInfo, we feel the need to comment and contextualize a bit.
On the one hand, it’s great that GPO is reaching out publicly to offer infrastructural help with the government transparency initiative. We’re happy to assist in any way we can. We hope FDLP libraries will join GPO in such efforts.
On the other hand, FGI has always argued for a geographically dispersed system of local, official digital repositories, so we cannot support GPO’s first goal, to make FDsys the official repository for Federal Government publications, unless that goal includes a network of distributed repositories modeled after the Federal Depository Library Program (FDLP). What we can support is FDsys as the official distribution channel for Federal Government publications.
It’s not a trivial distinction. “Repository” means that GPO assumes sole responsibility for preservation, a role not specified in legislation. “Distribution channel” means that GPO continues its solid century-and-a-half record of distributing information to other institutions, which in turn continue their own century-and-a-half record of preserving government information for future use while making sure it remains freely available over the internet. Digital deposit is currently #2 on The Sunlight Foundation’s Our Open Government List (OOGL) of top ideas for the President’s open government initiative, so we take it that the public, or at least the part of it most interested in government transparency, agrees that a geographically dispersed system is a key ingredient in government transparency.
We also believe it is important in discussions of transparency to plan for preservation of and long-term access to information. If, in concentrating on short-term access and on information-as-service, we fail to consider long-term access and instantiation of information for long-term preservation, we will inevitably lose information — and that would be bad for transparency.
Incomplete Access
We commend and support GPO for building APIs into FDsys. It is encouraging to see GPO publicly and officially proclaiming that “access” means more than providing a web site. But APIs and a web site are only two of the three parts of a complete access system. GPO has yet to acknowledge, or even mention, the third part of access: unfiltered bulk data access to government information.
A GPO web site can provide a human-friendly interface for the public and APIs can provide a computer-program-friendly way of querying, fetching, and using information. But, even taken together, these two access points provide only the government-approved, government-designed, government-hosted view of government information.
The problem with these government-only views of government information is that they are limited. No single provider (government or non-government) can provide unlimited access points or views or interfaces.
APIs are not magic. Each is a design for access and the product of choices made by the designer. Each has its own constraints built in. For example, an API might be tied to a particular agency or department, which would limit cross-agency utility. Or an API might be generalized to work across agencies or departments and thus lose rich access to agency-specific information content or structure.
One way to overcome these limitations is for the government to provide bulk data access. This means allowing the public to download raw content in bulk. Where web sites provide one “page” at a time and APIs can provide one or many “facts” at a time, bulk data access provides the raw information so that users can build their own collections, interfaces, and APIs.
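To make the difference concrete, here is a minimal Python sketch of what a user could do once raw content is available in bulk: merge records from several agencies into one local collection and put a small search “API” of their own on top. The record format (JSON lines with `id`, `agency`, `title`, and `text` fields) and the sample records are hypothetical; no actual government bulk-data format is assumed.

```python
import json
import re
from collections import defaultdict

# Hypothetical bulk records. In practice these would be read from files
# downloaded in bulk from several agencies; the field names are assumptions.
BULK_JSONL = """\
{"id": "usda-001", "agency": "USDA", "title": "Corn yield report", "text": "Corn yields rose in Iowa"}
{"id": "noaa-017", "agency": "NOAA", "title": "Drought outlook", "text": "Drought expected to reduce corn yields"}
{"id": "epa-042", "agency": "EPA", "title": "Runoff study", "text": "Nitrate runoff from Iowa farms"}
"""


def load_records(jsonl_text):
    """Parse one JSON record per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def build_index(records):
    """Build a cross-agency inverted index: lowercase word -> set of record ids."""
    index = defaultdict(set)
    for rec in records:
        words = re.findall(r"[a-z]+", (rec["title"] + " " + rec["text"]).lower())
        for word in words:
            index[word].add(rec["id"])
    return index


def search(index, *words):
    """Return sorted ids of records containing every query word (a tiny local 'API')."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


records = load_records(BULK_JSONL)
index = build_index(records)
print(search(index, "corn", "yields"))  # ['noaa-017', 'usda-001']
print(search(index, "iowa"))            # ['epa-042', 'usda-001']
```

The point of the sketch is that the cross-agency query (a USDA report and a NOAA outlook answering the same question) was designed by the user, not by any single agency’s web site or API; that is only possible because the raw records were available to download.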
This could improve access in ways that GPO could never achieve by itself. Imagine, for example, an agricultural library building a digital collection that combines agricultural reports, data, and audiovisual content from the Department of Agriculture, the EPA, the SBA, and NOAA with reports, maps, and GIS data from state and local government agencies, plus content from its own institutional repository or university press. Then imagine that specialized collection having its own state-specific, agriculture-specific API, web site, and bulk data access. Then imagine these repositories as part of the rapidly expanding cloud, and you get a sense of a rich government information ecology.
Such scenarios are possible, but only if GPO and other government agencies make raw content easily and freely available in bulk for use, re-use, and re-purposing. Providing only government web sites and government APIs, without bulk data downloads and the ability for others to build collections for specific or general purposes, will deliver only a tiny fraction of the usability and transparency that we could have. There is nothing standing in the way of this happening today except the will of government agencies to make it happen.
Incomplete Preservation
The Public Printer’s letter glosses over the problems of long-term access and preservation.
Let’s be as clear as we can: we cannot and should not rely solely on GPO for long-term preservation and free access. The shift to digital does not change the methodology for long-term preservation and access. On the contrary, the tenuousness of digital information means that a distributed methodology is even more vital.
We cannot rely solely on GPO because the GPO Electronic Information Access Enhancement Act of 1993 does not even mention permanent access, nor does it guarantee that access will always be free. Indeed, the law specifically allows GPO to charge for access and even for use of its “directory” of information. The law also covers only “appropriate publications distributed by the Superintendent of Documents,” effectively excluding huge bodies of born-digital information from the scope of what GPO is allowed to handle. Regardless of GPO’s intentions, there is no existing legislative mandate for GPO to provide free, permanent, public access to government information, and we therefore cannot rely on it alone to do so.
We should not rely solely on GPO because no single digital archive or repository can ever be as secure and safe as multiple archives, libraries, and repositories. Even if GPO had a legislative mandate to provide permanent preservation and access (which it does not), and even if anyone could guarantee that GPO would always get adequate funding so that it never had to withdraw anything or charge for access for anything (which no one can), it would still be impossible to guarantee that GPO would never lose any information. The nature of digital information is that it can easily be corrupted, altered, lost, or destroyed. It can become unreadable or unusable without constant attention. Relying on any single entity is simply not as safe as relying on multiple organizations. It is more than a truism that Lots of Copies Keep Stuff Safe — safer than backups and “mirror sites.” But this is about more than redundant copies. It is also about relying on different organizations because they have different funding sources, different constituencies, different technologies, and different collections. No single digital collection can ever be as safe as multiple, reliable digital collections.
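A minimal Python sketch of why several independent copies are safer than one: each repository holds its own copy, the copies’ checksums are compared, and any copy that disagrees with the majority is flagged and replaced from one that agrees. This is only loosely in the spirit of the polling that LOCKSS uses, not its actual protocol; the repository names, the sample text, and the simple-majority rule are assumptions for illustration.

```python
import hashlib
from collections import Counter


def sha256(data: bytes) -> str:
    """Fixity value for one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies):
    """Compare every repository's copy against the majority checksum.

    copies: dict mapping a (hypothetical) repository name -> bytes of its copy.
    Returns (majority_digest, sorted list of repositories whose copy disagrees).
    """
    digests = {repo: sha256(data) for repo, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority; cannot decide which copy is authentic")
    damaged = sorted(repo for repo, d in digests.items() if d != majority)
    return majority, damaged


def repair(copies):
    """Overwrite disagreeing copies with one that matches the majority."""
    majority, damaged = audit_copies(copies)
    good = next(data for data in copies.values() if sha256(data) == majority)
    for repo in damaged:
        copies[repo] = good
    return damaged


copies = {
    "library-a": b"Report text, final version",
    "library-b": b"Report text, final version",
    "library-c": b"Report text, final versoin",  # silently corrupted copy
}
print(repair(copies))  # ['library-c']
```

Note what happens with a single copy: it is trivially its own majority, so `audit_copies` can never report it as damaged. Corruption in a lone archive goes undetected, which is the sketch’s version of the argument above.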
The good news
The good news is that existing organizations can start working on this right away. Nothing prevents GPO and the existing FDLP libraries from implementing a digital depository system in which GPO enables FDLP libraries to download bulk data and build local digital collections.
There are existing technologies to facilitate this. The U.S. Government Documents Private LOCKSS Network is preserving “harvested” government information. Peer-to-peer (P2P) networks (like Napster and BitTorrent) have become increasingly popular as more people, and some businesses, realize that distributed files mean faster access and better preservation. (A geographically dispersed system of local, official digital repositories would be, for all intents and purposes, a P2P network.) Open source software for building digital repositories is widely available and increasingly easy to use.
Summary
APIs are good. They are a necessary part of adequate government information access. But digital distribution is also essential because only digital distribution will enable FDLP libraries and others to build new APIs, to de-ghettoize government information by better integrating it with non-government information, and to ensure long-term, free, public access and usability of government information.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.