Affirmative Disclosure of Government Information

John Wonderlich, a Program Director of the Sunlight Foundation and a great friend of libraries, has posted some useful suggestions over at The Sunlight Foundation Blog:

Obama and Affirmative Disclosure.

I really like John's concept of "affirmative disclosure." I think we could go even further by explicitly addressing the problems of long-term preservation caused by the shift to e-government.

I am starting from the assumption that society needs a reliable way to preserve an accurate, complete historical record. Unfortunately, the systems we have in place today make it difficult, and in some cases impossible, to guarantee that we will preserve a record that is either complete or accurate.

Consider, for example, the recent case where researchers at the University of Illinois discovered that the White House removed original documents from its web site, altered them, and replaced them with backdated modifications that appear to be originals but are not.

Also consider the project of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office to try to capture web pages of the current administration by performing a "comprehensive crawl of the .gov domain."

These examples illustrate the problem of preserving the historical record.

The first shows how the historical record can easily be lost or altered (intentionally or unintentionally -- it doesn't matter which) for lack of accurate metadata (dates, versioning). The second shows the sad state of current preservation: the best record we will have of the government web will be a single, incomplete snapshot of the end of an eight-year administration. (Harvesting is imperfect and incomplete: links can break, embedded content can be lost, databases can prohibit or inhibit crawls of their content, and crawls can only save a snapshot of dynamic sites.)

In essence, the government has made a major change in information policy by changing the technology of information dissemination and has done so without really examining the implications of the change or even acknowledging that a policy has changed.

What was the policy change? In the old policy, the role of government was to collect and assemble and edit and create information and then instantiate it in publications and distribute those instantiations to the public. At that point the role of preservation was in the hands of libraries (mostly FDLP libraries) and archives. But, in the new policy, the government does not actively distribute, but "posts" information on web sites where it is subject to alteration and removal without ever being instantiated anywhere. It is up to the public, consumer groups, individuals, libraries, and special projects to identify when information is posted or changed and then attempt to preserve that information. While that may succeed sometimes, the approach has two fatal flaws. First, it is ad hoc and therefore will almost certainly be incomplete at best. Second, it puts the responsibility of instantiation in the wrong hands: not those who create the information (the government) but those who "discover" the information. The government essentially is renouncing its responsibility to actively, affirmatively create a preservable instance of the information it creates.

While some agencies (e.g., GPO, EIA) are saying that it is now their role to preserve information, other agencies (e.g., NARA) are actually narrowing their role in long-term preservation (notice that NARA is not participating in the ".gov crawl" and says explicitly that "most web records do not warrant permanent retention").

So, let's explicitly expand the idea of "affirmative disclosure" to include "active deposit." By that I mean that the government should be required to actively distribute to the public both notifications (metadata) and documents (data) every time a "document" is created, modified, or superseded. "Deposit" could be accomplished with existing technology (e.g., RSS, APIs, OAI and OAI-PMH) and should be required to include dates and version information.
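
As a rough illustration of what a deposit notice might carry, here is a minimal Python sketch. The record layout, the field names, the identifier, and the dates are all hypothetical; nothing here reflects an existing government system or any particular feed standard. It only shows that a date, a version number, a pointer to the superseded version, and a checksum are enough for a library to tell which instance it holds.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class DepositNotice:
    """Hypothetical metadata record announcing one version of a document."""
    doc_id: str                       # stable identifier chosen by the agency (assumed)
    version: int                      # increases with every modification
    issued: str                       # ISO 8601 date this version was released
    sha256: str                       # checksum of the exact bytes being deposited
    supersedes: Optional[int] = None  # version this one replaces, if any


def make_notice(doc_id: str, content: bytes, version: int, issued: str,
                supersedes: Optional[int] = None) -> DepositNotice:
    """Build the notice an agency would send out alongside the document."""
    digest = hashlib.sha256(content).hexdigest()
    return DepositNotice(doc_id, version, issued, digest, supersedes)


def matches(notice: DepositNotice, content: bytes) -> bool:
    """Check that a held copy is the version the notice describes."""
    return hashlib.sha256(content).hexdigest() == notice.sha256


# Made-up example data: one document, published and later revised.
original = b"Press release text as first published."
revised = b"Press release text, later edited."

v1 = make_notice("example-agency/press/001", original, 1, "2008-11-01")
v2 = make_notice("example-agency/press/001", revised, 2, "2008-12-15",
                 supersedes=1)

print(json.dumps(asdict(v2), indent=2))  # the notice as it might be distributed
print(matches(v1, original))  # the deposited copy agrees with its notice
print(matches(v1, revised))   # an edited copy no longer agrees with version 1
```

A library holding the deposited bytes can re-run the checksum at any time. A copy that was quietly edited will not match the notice for version 1, and an honest revision arrives as version 2 with its own date instead of replacing the original.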

This is the right way to do this because it recognizes the appropriate roles for the different participants in the life cycle of information: government agencies create information products that are preservable and libraries and others preserve those products outside the .gov domain.

From Archival Standpoint, Most Records Not Valuable

"NARA is not participating in the '.gov crawl' and says explicitly that 'most web records do not warrant permanent retention.'"

In general, archival practice is that most records of any format do not warrant permanent retention. Here's an explanation from the Records Management Section of the Texas State Library and Archives (bolding mine):

Archival Assistance

No one can make a truly informed decision regarding the archival value of a single record series without taking into consideration the entirety of records produced by the concerned agency and others. There are ways to pare down the massiveness of the job, however.

A quick review of the Texas State Records Retention Schedule issued by the Texas State Library, State and Local Records Management Division, demonstrates that only a very small percentage of the general records series maintained by state agencies has been identified as appropriate for archival review and/or preservation. Although the volume of archival records varies with each agency, the most frequently cited estimate of government records having permanent value is between three and five percent of the total output.

It is the Archives and Information Services Division's hope that the creators of government records and the staff charged with records administration gain great satisfaction in knowing that once "their" records are accessioned into the archival holdings of the Texas State Library, they are available to all, in a reference room staffed specifically to provide assistance to researchers and security for the records.

This isn't necessarily a sinister point of view, given how much ephemera there is, both online and off. But that's for records and not publications.

------------------------------------

"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.

Harder to Identify what is valuable and what is not

Daniel,

Thanks for this qualification to my brief comment on archival value of the web. You are exactly right, and NARA is trying hard to cope with a rapidly changing landscape. I think everyone would agree that most records of any type do not warrant retention. The link above to NARA's explanation of why they are not participating in the crawl of .gov is well reasoned.

There were two problems I was thinking about: 1) it is hard to identify what is and what is not valuable in a world where there is no instantiation to evaluate; 2) we do not have a good sense today of who is responsible for archiving which materials.
