DttP letter to the editor re digital preservation of government information

Jim and I recently wrote a letter to the editor to the GODORT journal Documents to the People (DttP) (published in the Winter 2014 issue) entitled “Digital preservation deserves better coverage.” We post it here to FGI in the hopes that it will “clarify some of the issues and provide a more accurate and more understandable context for action by the GODORT community.” It’s not yet online at the DttP site, but will eventually be posted there. We’ll post a link to the DttP site when it’s online. I’ll be at ALA Midwinter conference next week in Chicago, so please track me down if you’d like to discuss. That is all.


In the Summer 2014 issue of Documents to the People (DttP), an article by Scott Casper, which was highlighted as a “feature,” offered a badly misleading, confoundingly misinformed, and confusingly written account of digital preservation. Digital preservation is an incredibly important topic for government information professionals and it deserves better treatment in DttP.

We think Casper must have had good intentions in writing his article, “Promoting Electronic Government Documents: Part Four: Preservation.” Perhaps his intention was simply to promote the importance of digital government information, which is the theme of his series of articles, and the necessity of maintaining access to government information of all types. But whatever his intention was, he does a disservice by conflating important issues, confusing technical terms, and mostly ignoring the very important issue of digital preservation, which is his ostensible topic.

It would not be useful to point out every error and misstatement in Casper’s article. There are so many, though, that we would guess that anyone who read his article would be left either confused or badly misinformed. So, instead of trying to correct every error or trying to figure out what he may have meant by every confusing statement, we think it would be more useful to define and describe and give some context to a few of the key concepts that Casper mentions. Our hope is that this will clarify some of the issues and provide a more accurate and more understandable context for action by the GODORT community.

Preservation of born-digital information is a very real and important topic that the government documents community needs to understand and address. DttP readers should be aware, for instance, that more government information is born-digital in a single year than all the printed government information that all FDLP libraries have accumulated in over 200 years. (See Born-Digital U.S. Federal Government Information: Preservation and Access prepared by James A. Jacobs for the Center for Research Libraries.)

Digitization of print information is not a preservation solution; rather, it creates new digital preservation challenges that have not yet been adequately addressed. While digitization offers many promises of better access such as better discoverability, easy accessibility, and enhanced usability, and even a potential form of “preservation” (by protecting fragile paper documents from damage through use), the simple act of converting a paper document into a digital object does not automatically deliver any of those promises. In fact, digitization is only the first of many costly and technically challenging steps needed to ensure long-term access to content. (See Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a. and Digitization does not magically preserve paper.)

Access is not preservation. The word “access” is too often used as a buzzword that hides and obscures a number of underlying issues. It is often conflated with preservation as if the two were the same. In fact, they are two very different things that require very different actions. Like two spouses, they are very different but intimately related. So, when we hear the word “access” used, we should always remember two things: First, access without preservation is temporary, at best. Providing access does not guarantee preservation or long-term access — much less free access. Too often libraries are willing to replace public domain collections with “just in time” fee-based access that is encumbered by licensing and DRM restrictions. In our digital age we often see access promoted as a desirable goal in itself, only to see once “accessible” documents suddenly disappear from the web. “Access” without trusted, long-term, reliable preservation is more like a Kmart blue-light special (“Get it while you can! It won’t be here long!”), than a long-term library service. Second, preservation without access is an illusion. As Paul Conway said, “In the digital world, the concept of access is transformed from a convenient byproduct of the preservation process to its central motif.” See Preservation in the Digital World by Paul Conway and The value in being a depository library.

Digital preservation is an essential activity of libraries. Casper fails to recognize this fact when he describes the good work of the EDI (Electronic Documents of Illinois) project without mentioning that it is a service of the Illinois State Library (http://iledi.org/). Digital preservation takes resources and a long-term commitment, but it also takes a very specific understanding of the long-term value of information (even information that is not popular or used by many people), and a commitment to the users of information. These are the strengths of libraries. Digital preservation is not something that can be cavalierly dismissed as the responsibility of others. (See: Preservation for all: LOCKSS-USDOCS and our digital future by James R. Jacobs and Victoria Reich in Documents to the People, Volume 38:3, Fall 2010).

Relying solely on the government to preserve its information is risky. Casper almost recognizes this when he cites the defunding of the Census Bureau’s Statistical Compendia unit and the cessation of the publication of the Statistical Abstract. But this is an example of an agency ceasing to create new information, not an example of an agency failing to preserve already created information. (So far, the Bureau has preserved old digital editions of Statistical Abstract and maintained online access to them.) Worse, Casper calls the privatization of the Statistical Abstract a “happy postscript.” Privatization of public information is hardly something that government documents librarians should be happy about. And it is hard to understand how relying on for-profit companies can be considered a good way to guarantee the preservation of the information or free access to it. Casper misses the opportunity to show that, when we rely only on government to preserve the digital information it creates, it becomes very easy for economics or politics or technology or bureaucracy to result in the loss of information. (See: When we depend on pointing instead of collecting and Government Link Rot and Information is not a Service, Service is not Information and Less Access to Less Information By and About the U.S. Government and Government Documents at the Crossroads.)

Casper does ask the right question early on in his article: “Who is responsible for this preservation?” But the only answer he seems to give is that “there are no answers.” But Casper is wrong. There is an answer and it is right in front of our eyes: libraries should take this responsibility. There are many actions that libraries can take now to promote digital preservation of government information at all levels of government (this is not just a federal issue!).

Preserve Paper copies. The FDLP is successfully preserving documents that were released in paper (and microfiche) quite nicely. We often hear that “digitizing” paper documents will “preserve” them, but we do not need to convert these documents to digital in order to preserve them. Digitization can provide better access and (if proper care and resources are invested in the digitization) increase the flexibility, usability, and re-usability of many documents. But digitization alone does not guarantee the preservation of the content. Worse, there are repeated calls for digitizing paper collections so that the paper collections can be discarded and destroyed. Such actions will endanger preservation of the content if they do not include adequate steps to ensure digital preservation of those newly created digital objects. Given that paper documents do not present a current preservation problem, and given that there is an enormous body of born-digital documents being created that do present a current preservation problem, one thing we can do is avoid creating new problems with proposals to destroy and discard paper collections before we have solved the problems of preservation of born digital documents. (We can still digitize paper documents in order to enhance access, but we should not use digitization as an excuse to discard or destroy the paper originals.) (See Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a.)

Move FDsys forward. GPO is doing a good job of capturing born-digital Congressional information (not digitized material, as Casper mistakenly states) and is doing an increasingly good job of capturing Judicial Branch documents. The FDsys system is apparently well designed for long-term preservation, too. There are, however, two things that FDLP librarians can do now: First, we can encourage GPO to get FDsys certified as a Trusted Digital Repository. This has been on GPO’s agenda for a few years, but budget uncertainties have delayed it. It would help if GPO heard from the FDLP community that this should be a high priority. Second, even if FDsys gets certified, we need more than the one copy of FDsys that is in the hands of a single government agency in order to reduce the risk of loss of that content. There are several ways the FDLP community can further this goal: encourage more libraries to become LOCKSS-USDOCS partners; suggest to GPO that it allow the Internet Archive to crawl FDsys systematically; investigate partnerships with other government agencies such as NARA (could NARA become a LOCKSS-USDOCS partner?); explore partnerships with the Digital Preservation Network; create records for the Digital Public Library of America that point to LOCKSS-USDOCS copies when they are made publicly accessible; follow up on the digital preservation recommendations in the NAPA report, Rebooting The Government Printing Office: Keeping America Informed in the Digital Age. (Full disclosure: James A. Jacobs has done technical consulting work for the Center for Research Libraries in its certification of digital repositories.)

Preserve More Documents of Executive Agencies. So much that is born-digital is produced by executive department agencies and is not captured by GPO. These are the new fugitive documents (those that are in scope of the FDLP but fall through the cracks; GPO PURLs are not fugitives). To be sure, this needs much more attention by GPO and depository libraries. FDLP libraries should concentrate on collecting born-digital fugitive documents and should work with GPO to develop a plan that focuses on developing programs that are attractive to agencies and that benefit agencies. This needs to be a higher priority for GPO with an increased focus and increased resources. GPO has the infrastructure in place (FDsys) to offer great benefits to agencies and this would help reduce agency fugitives.

Get Digital Deposit. FDLP libraries need to insist that GPO modify its long-outdated and counter-productive Superintendent of Documents Policy Statement 301 (SOD 301) that limits deposit of digital information to so-called “tangible” products. This policy never made sense: it was nominally supposed to be a response to born-digital information, but instead of acknowledging that GPO could deposit born-digital information with libraries, it created a two-tier structure that authorized it to deposit some and prohibited it from depositing other digital information. SOD 301 says that it is ok for GPO to deposit digital information on “tangible” media, but not ok to deposit “online” digital information. But, worse than not making sense, this policy is actually harmful to digital preservation in two ways. First, it only allows deposit of those digital items that are least preservable and most prone to physical deterioration and file format obsolescence (floppies, CD-ROMs, DVDs, etc.). This burdens depository libraries with an almost impossible task of preservation and access. Second, it prohibits deposit of raw digital information in formats that are more easily preserved and less likely to become obsolete (digital object files in PDF, text, HTML, XML, etc.). These are the digital objects that could have been easily distributed more cheaply and more reliably than “tangible” media. These are the digital objects that FDLP libraries could have been preserving and making accessible (during government shutdowns, for example), the very kind of digital objects that GPO now enthusiastically distributes to the LOCKSS-USDOCS private network. The effect of this policy has been to delay the active participation of FDLP libraries in digital preservation. There was never a good justification for this policy, but now it is so obviously out-of-date and has failed so demonstrably that keeping it in place should be considered an act of negligence. (See From Production to Preservation to Access to Use: OAIS, TDR, and the FDLP.)

Smart-Archive the Web. Although capturing web pages and preserving them is far from an adequate (or even accurate) form of digital preservation, it is a useful stop-gap until producers understand that depositing preservable digital objects with trusted repositories is the only way to guarantee preservation of their information. Therefore, FDLP libraries should use web archiving tools, including services such as Archive-It (as Casper points out, if in a confusing way). Every FDLP library should at least consider “smart-archiving” of web-based information. Web-archiving should not be seen as everything-or-nothing: libraries can do focused selection to build collections useful to their own users. This is smart-archiving. Selections can be large (an agency or a domain) or small (crawl a few seeds) or even one-document-at-a-time. Examples of these models exist. See, for example: the Chesapeake project, the work of The Columbia Libraries (The Integrity of Research Is at Risk: Capturing and Preserving Web Sites and Web Documents and the Implications for Resource Sharing), the California Digital Library Web Archiving Services(**), and the Stanford Libraries EEMs project (Everyday Electronic Materials in Policy and Practice).

Promote Digital Preservation. Casper’s series of articles is about “promotion” of government information and his recommendation in this article about preservation is that we should “keep promoting these online sources.” He should have stressed the most important promotion that is needed today: the promotion of the role of FDLP libraries in actively preserving digital government information. The time when FDLP libraries could be passive in digital preservation is long past. The time when FDLP libraries could look to others to take care of digital preservation of government information is long past. FDLP libraries can work with others, but we must actually work with them, not leave the work to them.

**Editor’s note: CDL recently announced that the WAS service and collections were being transitioned over to the Internet Archive’s Archive-It service.

Warmest Year Ever

Document of the Day. According to NOAA (the National Oceanic and Atmospheric Administration), “The year 2014 was the warmest year across global land and ocean surfaces since records began in 1880.”

This website has Maps and Time Series, tables and graphs and lots of references.

“What the Web said yesterday” and IRC chat on digital collection development

The New Yorker has an interesting piece by Jill Lepore, “What the Web Said Yesterday,” which explores the issue of internet preservation and highlights the important work being done by the Internet Archive. Vint Cerf, Chief Internet Evangelist at Google, says we need “digital vellum” or the “twenty-first century will become an informational black hole.”

The International Internet Preservation Consortium is working in this space — and in fact the 2015 IIPC general assembly will be held at Stanford University! — but I believe there’s a need for more libraries and especially more subject librarians to be working in this space. That’s why I announced a virtual discussion session on digital collection development for govt information librarians on Wednesday January 21st at 9am PST / 12 noon EST. I’ll be on IRC (irc.freenode.net) #FDLP channel. I hope you’ll pop in to discuss how to do digital collection development, fugitive hunting, web harvesting etc.

The average life of a Web page is about a hundred days. Strelkov’s “We just downed a plane” post lasted barely two hours. It might seem, and it often feels, as though stuff on the Web lasts forever, for better and frequently for worse: the embarrassing photograph, the regretted blog (more usually regrettable not in the way the slaughter of civilians is regrettable but in the way that bad hair is regrettable). No one believes any longer, if anyone ever did, that “if it’s on the Web it must be true,” but a lot of people do believe that if it’s on the Web it will stay on the Web. Chances are, though, that it actually won’t. In 2006, David Cameron gave a speech in which he said that Google was democratizing the world, because “making more information available to more people” was providing “the power for anyone to hold to account those who in the past might have had a monopoly of power.” Seven years later, Britain’s Conservative Party scrubbed from its Web site ten years’ worth of Tory speeches, including that one. Last year, BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.

Web pages don’t have to be deliberately deleted to disappear. Sites hosted by corporations tend to die with their hosts. When MySpace, GeoCities, and Friendster were reconfigured or sold, millions of accounts vanished. (Some of those companies may have notified users, but Jason Scott, who started an outfit called Archive Team—its motto is “We are going to rescue your shit”—says that such notification is usually purely notional: “They were sending e-mail to dead e-mail addresses, saying, ‘Hello, Arthur Dent, your house is going to be crushed.’ ”) Facebook has been around for only a decade; it won’t be around forever. Twitter is a rare case: it has arranged to archive all of its tweets at the Library of Congress. In 2010, after the announcement, Andy Borowitz tweeted, “Library of Congress to acquire entire Twitter archive—will rename itself Museum of Crap.” Not long after that, Borowitz abandoned that Twitter account. You might, one day, be able to find his old tweets at the Library of Congress, but not anytime soon: the Twitter Archive is not yet open for research. Meanwhile, on the Web, if you click on a link to Borowitz’s tweet about the Museum of Crap, you get this message: “Sorry, that page doesn’t exist!”

State Agency Databases Project Activity Report 1/15/2015

Welcome to the first State Agency Databases Project report of 2015!

ORPHAN – Nebraska

At the start of each new year we click on the “history” tab of each state page and check when it was last updated. If it hasn’t been updated during the previous calendar year, we let that volunteer go and put out the page for adoption.

Historically, I’ve had to fill six or seven pages each year. This year was different — we only had two “orphan pages” and one got adopted before I could get this report out. So I am happy to report that the sole orphan page of 2015 is Nebraska.

If you are interested in being the documents specialist for the Nebraska page, please read through our Volunteer Guide. If you feel you can carry out the listed duties please contact me at danielcornwall AT gmail DOT com.



The first few weeks of 2015 saw significant activity at the State Agency Databases Project at http://wikis.ala.org/godort/index.php/State_Agency_Databases . The following pages saw a significant number of changes. (links are to revisions page, click on “page” tab to see regular page):

  • District of Columbia – Susan Paterson
  • Montana – Susanne Caro
  • New Mexico – Susanne Caro
  • Ohio – Kirstin Krumsee
  • “Not Databases” – Resources that are either databases of state information NOT produced by state agencies, or resources from state agencies that are not databases.

One other change we made was to our Prisoner Locater page, our most popular subject collection for several years running. Although it was technically outside the scope of our project, we added the Federal Inmate Locator so that regular people using the page to track down a friend, loved one or other person of interest had easy access to both state and federal locator services.

You can always view ALL changes made in the past 14 days by visiting http://tinyurl.com/statedbs14d.

As a reminder, all of the links and text in the State Agency Databases Project are available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. We strongly encourage the use of our links and annotations in projects of your own.

Webinar on fugitive documents: notes and links

These are notes and links and resources mentioned in our webinar on fugitive government documents that Jim and I presented for the “Help! I’m an Accidental Government Information Librarian” webinar series:


Process for “fugitive hunting:”

In the paper era, FDLP librarians would subscribe to mailing lists and make personal contacts with local/regional offices of Federal agencies (EPA, Forest Service and the like) in order to make sure their libraries were collecting all documents in scope of the FDLP. Fugitives in the paper era numbered in the tens per year. As James A. Jacobs noted in his presentation, the scope of born-digital documents from Federal agencies demands a collaborative, FDLP community-wide, large-scale fugitives project:

  1. Keep track of agencies.
  2. Use tools like the Update Scanner Firefox plugin to keep track of when a federal agency’s site is changed and when individual documents are published.
  3. Delve into the “dark web:”
    1. Create a list of known federal databases.
    2. Analyze the databases to find static URL structures.
    3. Report fugitive documents (see #5).
  4. Check GPO’s Catalog of Government Publications (CGP) to see if the new publications have been cataloged.
  5. Report fugitive documents to GPO and to the LostDocs blog.
  6. Join the “Everyday Electronic Materials” Zotero group and help us test out a newer, faster, more automatic fugitive document workflow!!
  7. Lather, rinse, repeat!
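
Step 2 in the list above depends on noticing when an agency page has changed. Here is a minimal sketch of that idea in Python, assuming the pages have already been fetched by some other tool. It is not how Update Scanner itself works; it simply compares a content hash against the last one seen. The function names and the example.gov URL are hypothetical and used only for illustration.

```python
import hashlib
from typing import Dict, List


def fingerprint(page_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def detect_changes(snapshots: Dict[str, bytes], state: Dict[str, str]) -> List[str]:
    """Return the URLs whose content is new or differs from the last run.

    snapshots maps URL -> freshly fetched page bytes.
    state maps URL -> digest seen on the previous run; it is updated in place.
    """
    changed = []
    for url, body in snapshots.items():
        digest = fingerprint(body)
        if state.get(url) != digest:
            changed.append(url)
            state[url] = digest
    return changed


if __name__ == "__main__":
    # Hypothetical URL and page bodies, for illustration only.
    seen: Dict[str, str] = {}
    url = "https://www.example.gov/publications/"
    print(detect_changes({url: b"<html>report list v1</html>"}, seen))  # first sighting
    print(detect_changes({url: b"<html>report list v1</html>"}, seen))  # unchanged
    print(detect_changes({url: b"<html>report list v2</html>"}, seen))  # changed
```

A raw hash flags any change at all, including a timestamp in the page footer, so a real workflow would probably need to strip that kind of noise before hashing. Pages flagged as changed are the ones to check against the CGP in step 4.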

Code Snippets

<form action="http://www.archive-it.org/public/search">
<input type="hidden" name="collection" value="***COLLECTIONID***" />
<input type="text" name="query" />
<input type="submit" name="go" value="Go" />
</form>

Note: You can search across the Stanford Archive-It collections via https://archive-it.org/organizations/159. For the search form to work, you’ll need to replace ***COLLECTIONID*** with the proper ID wherever it appears:

—Bay Area governments = 903
—Climate Change = 1064
—CRS reports = 1078
—FRUS = 1515
—FOIA = 924
—Fugitives = 2361
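
For anyone who would rather link straight to a search than embed the form, here is a rough Python sketch of the URL the form above would submit. It assumes the form uses the browser default GET method (the snippet sets no method attribute). The collection IDs are the ones listed above; the function name `search_url` is our own, and we have not verified that the endpoint still responds at this address.

```python
from urllib.parse import urlencode

# Collection IDs as listed in the notes above.
COLLECTION_IDS = {
    "Bay Area governments": 903,
    "Climate Change": 1064,
    "CRS reports": 1078,
    "FRUS": 1515,
    "FOIA": 924,
    "Fugitives": 2361,
}

# The action URL from the form snippet.
SEARCH_ENDPOINT = "http://www.archive-it.org/public/search"


def search_url(collection_name: str, query: str) -> str:
    """Build the query URL the search form would submit for one collection."""
    params = {
        "collection": COLLECTION_IDS[collection_name],  # hidden input
        "query": query,                                 # text input
        "go": "Go",                                     # submit button
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("Fugitives", "keystone pipeline"))
```

Looking a collection up by name raises a KeyError for anything not in the list, which is probably what you want while the list is this short.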


Baldwin, Gil. 2003. Fugitive Documents – On the Loose or On the Run. Presentation by Director, Library Programs Service, GPO American Association of Law Libraries Conference Seattle, WA, July 15, 2003. Administrative Notes Vol. 24, no. 10 (August 15, 2003).

Bower, Cynthia. Federal Fugitives, DND, and other Aberrants: a Cosmology. Documents to the People v17 n3 (Sep 1989) p.120–126.

Chesapeake Digital Preservation Group. 2014. “Link Rot” and Legal Resources on the Web: A 2014 Analysis by the Chesapeake Digital Preservation Group

DiMario, Michael F. 1997. PUBLIC PRINTER. Prepared Statement Before The Subcommittee On Legislative Branch Appropriations Committee On Appropriations U.S. Senate On Appropriations Estimates For Fiscal Year 1998. (JUNE 5, 1997)

FDsys Collections

Jacobs, James A. 2014. Born-Digital U.S. Federal Government Information: Preservation and Access. Report prepared for Leviathan: Libraries and Government Information in the Era of Big Data, CRL (April 25, 2014). Also see: Government Records and Information: Real Risks and Potential Losses. [Presentation slides and audio recording] and Speaker notes, additional links, examples, and accompanying material.

Kott, Katherine B. 2010. Everyday Electronic Materials in Policy and Practice. CNI Fall 2010 Project Briefings.


Shaw, Thomas Shuler. 1966. Library Associations and Public Documents. Library Trends (July 1966) p.167–177.

Stanford University, Social Sciences Resource Group. Archive-It collections.

U.S. Bureau of Justice Assistance. Increasing analytic capacity of state and local law enforcement agencies…

U.S. Code. Title 44

U.S. Department of State. Keystone XL Pipeline Project Final Supplemental Environmental Impact Statement (SEIS).

U.S. President. Executive Order 13662.
Other copies: White House, Federal Register, Federal Register printer-friendly, GPO Federal Register PDF, GPO Federal Register html, GPO html, GPO mods, GPO Premis, GPO zip

U.S. White House. The White House current third party (social media) pages / accounts

Zotero Group: Everyday Electronic Materials

Selected Technologies and Infrastructures