John Wonderlich, a Program Director of the Sunlight Foundation and a great friend of libraries, has posted some useful suggestions over at The Sunlight Foundation Blog:
I really like John’s concept of “affirmative disclosure.” I think we could go even further by explicitly addressing the problems of long-term preservation caused by the shift to e-government.
I am starting from the assumption that society needs a reliable way to preserve an accurate, complete historical record. Unfortunately, the systems we have in place today make it difficult, and in some cases impossible, to guarantee that we will preserve a record that is either complete or accurate.
Consider, for example, the recent case where researchers at the University of Illinois discovered that the White House removed original documents from its web site, altered them, and replaced them with backdated modifications that appear to be originals but are not.
Also consider the project of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office to try to capture web pages of the current administration by performing a “comprehensive crawl of the .gov domain.”
These examples illustrate the problem of preserving the historical record.
The first shows how easily the historical record can be lost or altered (intentionally or unintentionally; it doesn’t matter which) through a lack of accurate metadata (dates, versioning). The second shows the sad state of current preservation: the best record we will have of the government web will be a single, incomplete snapshot taken at the end of an eight-year administration. (Harvesting is imperfect and incomplete: links can break, embedded content can be lost, databases can prohibit or inhibit crawls of their content, and crawls can only save a snapshot of dynamic sites.)
In essence, the government has made a major change in information policy by changing the technology of information dissemination and has done so without really examining the implications of the change or even acknowledging that a policy has changed.
What was the policy change? In the old policy, the role of government was to collect, assemble, edit, and create information, then instantiate it in publications and distribute those instantiations to the public. At that point the role of preservation was in the hands of libraries (mostly FDLP libraries) and archives. But in the new policy, the government does not actively distribute information; it “posts” information on web sites, where it is subject to alteration and removal without ever being instantiated anywhere. It is up to the public, consumer groups, individuals, libraries, and special projects to notice when information is posted or changed and then attempt to preserve it. While that may sometimes succeed, the approach has two fatal flaws. First, it is ad hoc and therefore will almost certainly be incomplete at best. Second, it puts the responsibility for instantiation in the wrong hands: not those who create the information (the government) but those who “discover” it. The government is essentially renouncing its responsibility to actively, affirmatively create a preservable instance of the information it creates.
While some agencies (e.g., GPO, EIA) are saying that it is now their role to preserve information, other agencies (e.g., NARA) are actually narrowing their role in long-term preservation (notice that NARA is not participating in the “.gov crawl” and says explicitly that “most web records do not warrant permanent retention”).
So, let’s explicitly expand the idea of “affirmative disclosure” to include “active deposit.” By that I mean that the government should be required to actively inform and distribute to the public notifications (metadata) and documents (data) every time a “document” is created or modified or superseded. “Deposit” could be accomplished with technology (e.g., RSS, APIs, OAI and OAI-PMH, etc.) and should be required to include dates and version information.
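To make “active deposit” a little more concrete, here is a minimal sketch, assuming an agency publishes each change as an RSS 2.0 item. The `DepositNotice` record, its field names, the identifier scheme (`doc_id#vN`), and the example URL are all hypothetical illustrations, not any agency’s actual format or a published standard. The point is only to show the kind of metadata a deposit notice would need to carry: a stable identifier, a version number, the original and modification dates, what it supersedes, and a content checksum that would let a preserving library detect later silent alterations.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import Optional


@dataclass
class DepositNotice:
    """Hypothetical metadata an agency might push each time a document changes."""
    doc_id: str                       # stable identifier shared by all versions
    version: int                      # increments on every modification
    title: str
    url: str                          # where this exact version can be retrieved
    created: datetime                 # date the document was first issued (UTC)
    modified: datetime                # date this version was issued (UTC)
    content: bytes                    # the document itself (the "data")
    supersedes: Optional[int] = None  # prior version number, if any

    def checksum(self) -> str:
        # A content hash lets a library verify that a copy has not been altered.
        return hashlib.sha256(self.content).hexdigest()


def to_rss_item(notice: DepositNotice) -> ET.Element:
    """Render a notice as an RSS 2.0 <item>, carrying version info in guid/description."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = f"{notice.title} (version {notice.version})"
    ET.SubElement(item, "link").text = notice.url
    # The guid names this specific version, not just the document.
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = f"{notice.doc_id}#v{notice.version}"
    # RFC 822 date as RSS expects; usegmt=True requires a UTC-aware datetime.
    ET.SubElement(item, "pubDate").text = format_datetime(notice.modified, usegmt=True)
    desc = f"created {notice.created.date().isoformat()}; sha256 {notice.checksum()}"
    if notice.supersedes is not None:
        desc += f"; supersedes version {notice.supersedes}"
    ET.SubElement(item, "description").text = desc
    return item
```

A library harvesting such a feed could store each version under its guid and re-check the checksum later; a document silently replaced on the agency’s site would then show up as a mismatch rather than passing for the original. The same fields could just as well be exposed through an API or an OAI-PMH record; RSS is used here only because it is the simplest of the options named above.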
This is the right approach because it recognizes the appropriate roles of the different participants in the life cycle of information: government agencies create information products that are preservable, and libraries and others preserve those products outside the .gov domain.