A comment on government contracts and harvesting

Over the past week, there have been some good conversations about government contracts to digitize government information and the National Archives’ decision not to conduct a web harvest or snapshot at the end of the current Administration. There is good news and bad news.

The good news
The good news is that NARA’s decision was not nearly as bad as it appeared to be when it was first announced in a memo on March 27, 2008, which was circulated only to Federal records officers (see: The National Archives Is Quietly Destroying Millions of Documents). In a thoughtful post on its web site (National Archives and Records Administration Web Harvest Background Information, April 15, 2008, NARA; pdf version available), NARA outlines in detail the reasons why it would not conduct an end of administration web snapshot or harvest of Executive Branch websites nor require agencies to do so. The reasons, I think, are sound and in keeping with NARA’s commitment to preserving information of historical value.

In addition, the NARA memo of April 15 makes explicit the fact that its decision and memo of March 27 do not apply to Presidential records or to records of the Congress. It says that “NARA will continue to conduct a web harvest of Congressional web sites” and that NARA “will also receive a snapshot of the White House website” noting that “Unlike Federal agencies governed by the Federal Records Act, the White House is governed by the Presidential Records Act, under which all Presidential records are treated as permanent and transferred to NARA for preservation at a Presidential Library.”

The NARA “Background Information” document is also, I think, worth reading for its clear description of the shortcomings of web harvests in general. I think it is very useful for us to be reminded of these shortcomings to the extent that we believe we can rely on web harvests as an adequate form of preservation.

In more good news, the NARA/TGN contract is not as bad as it could have been. I mentioned this in my earlier post here (The NARA/TGN contract as a bad precedent) and similar comments have been made in the useful and interesting thread over at ArchivesNext (NARA latest digitization agreement: One archivist’s perspective). Merrilee Proffitt, of RLG, says in a comment there that the NARA model for contracts with third parties “actually comes out looking pretty good” when compared to the criteria described in the RLG paper Good Terms – Improving Commercial-Noncommercial Partnerships for Mass Digitization (by Peter B. Kaufman and Jeff Ubois, D-Lib Magazine, November/December 2007, Volume 13 Number 11/12).

The bad news
The bad news, as James pointed out this morning, is that the GAO contract for digitizing is very bad indeed (GAO *did* sell exclusive access to legislative history to Thomson West). Quoting Carl Malamud, James notes that GAO gets access to the digitized data but does not get a copy of its own; the rest of the government doesn’t even get access to the data. The public is left with the option of going to GAO headquarters and paying 20 cents per page to copy paper! As Carl says, “This is one of those deals where the public domain got sold off.”

This morning there was more bad news. Kate at ArchivesNext reports that the Citizens for Responsibility and Ethics in Washington (CREW) has a new report Record Chaos: The Deplorable State of Electronic Record Keeping in the Federal Government, that concludes “that the federal government is severely mismanaging its electronic records.” CREW also says that a House Committee proposal to amend federal record keeping laws “is anemic and fails to make the substantial changes necessary to bring the federal government into the 21st century.”

And even the good news is tempered by the fact that we have less than we could and are a long way from even an adequate system of permanent preservation of digital information or a long-term solution to digitizing non-digital information. We will have to hope that the White House will deliver a snapshot of the White House web site and that the snapshot will be accurate and complete. The behavior of the White House with regard to electronic records and email does not make us optimistic. The NARA/TGN deal is better than the GAO/Thomson deal, but still leaves much to be desired and, as pointed out even by defenders of the deal, it is unlikely that we will ever have free, open, networked access to the digital information that TGN digitizes. That means the real effect of the deal is to privatize the information.

For me, the biggest disappointment in these latest developments is that librarians and archivists seem to be too willing to accept “good enough” and not willing enough to argue harder for “better.” There are lots of people who have good reason to argue for less access, more fees, less privacy, and more control of information, but librarians and archivists should not be among them. I believe that we should not spend time making the case for the private sector; it is fully capable of making its own case. We should spend our time fighting for free, full, open, public access, usability of information, and long term preservation.

The primary mission of private sector companies is to make money, not to serve the public. They may serve the public as a by-product of making money, but no for-profit company will go to its owners and say “we are going to do the best thing for public access” without the qualification “that will make us money.” Unfortunately “making money” often conflicts with public access. Politicians (and some bureaucrats) will argue for greater control of government information; some will argue for secrecy of government information on the one hand and privacy-invading policies on the other. Most government agencies do not have information access or long term preservation of their information as a primary mission and the exceptions are notable (e.g. LOC, NARA).

In contrast, the primary mission of many libraries and archives is to provide free public open access with long term preservation and usability. While others may have some of those pieces as secondary goals, few if any have them all. For many libraries and archives these goals are not just their primary mission but their defining characteristic.

While digitization and digital preservation are neither easy nor inexpensive, that doesn’t mean that we have to pay any and all costs for them. The digital era should be making it possible to provide better access without giving up free use and reuse, without giving up open access, without turning over control to those whose primary mission is something other than free, open, public access and long term preservation. But increasingly we see a combination of politics and economics leaving us with contracts that trump copyright and fair use, with “access” being negotiated at almost any cost (including loss of control), with DRM technologies that prohibit easy (or any) reuse, and with privacy protections being deprecated or even ignored. Even in the case of the NARA/TGN contract that is legally “better” than the GAO/Thomson contract, we are left with the effect of two tiers of access and network access being essentially privatized and fee-based.

I believe that librarians and archivists should be pushing the boundaries and insisting on more and better, not accepting some benefits by negotiating away the big benefits we could be getting in the digital age. This is particularly important for government information that is in the public domain. If we can’t make this work for public information that is not copyrighted, how will we be able to do so for information that is?

I’m not arguing for a perfect, ideal world that is impractical to achieve. I am suggesting that we should fight for everything we can get. We should celebrate when we make inroads with a contract (like NARA/TGN) that is better than the others (like GAO/Thomson) but we should do so by committing to doing better next time. We should not accept this as “good enough,” because it is not and we can do better next time. In fact, every time we accept a less-than-perfect deal as “good enough,” we make it a little harder to make a better deal next time. We lower the bar if we accept “good enough” and stop trying to achieve better. We should not take the time to convince ourselves or the public that this is as good as we can get; we should take that time to admit to the limitations and trade-offs and to commit to doing better next time.

Much is written these days about “the future of libraries” and “the role of libraries in the digital age” and many people openly wonder if there is a place for libraries at all. I think there are several places where libraries have a unique role to play in society and the areas of digitization and digital access and preservation are important ones.

We need to make the case for the public; for free, open, public access; for long-term preservation and usability; for public accountability in the control of information; for reader privacy. Librarians and archivists have a unique role in doing that. In doing so, we will face an uphill battle and trade-offs, but we should never lose sight of our unique role in society. We should never cheapen our professions by making the case for less (there are plenty of people to do that). We should always make the case for more. We will not always succeed and we will have to make trade-offs. But we should always do so in the context of staking out a territory that is different from the private sector and those who are willing to get less. We should stand up for rights that others are not willing to fight for. We must fight for them when there are so many forces aligned against free, open access.

I’d like to see us emulate Carl Malamud and CREW and Brewster Kahle more and do less of making excuses for TGN and Thomson.

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


  1. I’d like to nominate the Energy Information Administration (http://www.eia.doe.gov/) as another exception to your statement, “Most government agencies do not have information access or long term preservation of their information as a primary mission and the exceptions are notable (e.g. LOC, NARA).”
