Abbey Potter’s comments about preserving digital news are also very relevant to the preservation of government information.
Potter is the Program Officer with the the National Digital Information Infrastructure and Preservation Program (NDIIPP). In her post on The Signal blog, she elaborates on her closing keynote address at the Dodging the Memory Hole II: An Action Assembly meeting in Charlotte NC last month.
Dodge that Memory Hole: Saving Digital News by Abbey Potter, The Signal Library of Congress Digital Preservation Blog (June 2, 2015).
Take Action: What comes next? Closing Keynote, Dodging the Memory Hole II: An Action Assembly Charlotte, NC, (May 12, 2015).
She quotes a presentation by Andy Jackson of the UK Web Archive in which he addresses the questions: “How much of the content of the UK Web Archive collection is still on the live web?” and “How bad is reference rot in the UK domain?”
- Ten years of the UK Web Archive: What have we saved? [ppt] Andy Jackson International Internet Preservation Consortium, General Assembly 2015 (April 27, 2015).
By sampling URLs collected in the UK Web Archive, Jackson examined URLs that have moved, changed, or gone missing. He analyzed both link rot (a file gone missing) and content drift (a file that has changed since being archived). He shows that 50 percent of content had gone, moved, or changed so as to be unrecognizable in only one year. After three years the figure rose to 65 percent.
Potter says that it is safe to assume that the results would be similar for newspaper content on the web. It would probably also be similar for U.S. government web sites.
What can we learn from this and what can we do? For newspapers, Potter says, libraries have acquisition and preservation methods that are too closely linked to physical objects and that too often exclude digital objects. This results in libraries having gaps in their collections – “especially the born-digital content.” She summarizes the problem:
Libraries haven’t broadly adopted collecting practices so that they are relevant to the current publishing environment which today is dominated by the web.
This sounds exactly like what is happening with government information.
First, because GPO has explicitly limited actual deposit of government information to so-called “tangible” products (Superintendent Of Documents Policy Statement 301 [SOD 301]). This policy does exactly what Potter says is wrong: it establishes collecting practices that are not relevant to the current publishing environment. (See more on the effects of SOD 301 here.)
Second, because most of the conversation within the FDLP in the last few years has been about our historic paper collections rather than about the real digital preservation issue we should be facing: born-digital government information. (See Born-Digital U.S. Federal Government Information: Preservation and Access.)
As Potter says, “We have clear data that if content is not captured from the web soon after its creation, it is at risk.” And, “The absence of an acquisition stream for this [born-digital] content puts it at risk of being lost to future library and archives users.”
Potter outlines a plan of action for digital newspaper information that is surprisingly relevant for government information. She suggests that libraries should establish relationships (and eventually agreements) with the organizations that create, distribute, and own news content. That sounds like exactly what FDLP libraries have always done for 200+ years with paper and should be doing, could be doing, with digital government information today. There is no legal or regulatory barrier to GPO depositing FDLP digital files with FDLP libraries; indeed, GPO is already doing this de facto with its explicit actions that allow “USDocs” private LOCKSS network partners to download FDsys content.
Potter also recommends web archiving as another promising strategy. Since many agencies are reluctant to deposit digital content with FDsys, and because they are allowed by law to refrain from doing so, web archiving is a practical alternative, even if it is imperfect. Indeed, GPO does its own web harvesting program. Although some libraries also do web harvesting that includes U.S. Federal government web sites, more needs to be done in this area. (See: Webinar on fugitive documents: notes and links.)
I find it ironic that libraries are not at least experimenting with preserving born-digital government information. It is difficult to find an article about library projects that does not assert scarcity of funds or high barriers of copyright to overcome in digital library projects. So, why not use born-digital government information as a test bed for preserving digital content? The FDLP agreements and commitments are already in place, most of the content is public domain, and communities of interest for the content already exist. FDLP libraries could start today by building digital library collections and test-bed technology for government information and later expand to other more difficult collections and build on a base of experience and success. The fact that this would help our designated communities, preserve essential information, and further the goals of the FDLP would be welcome side-effects.