Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

LOCKSS and CLOCKSS: Interview

Here’s a short, informative interview with Vicky Reich, director of the LOCKSS programme at Stanford University Libraries, and Randy Kiefer, executive director of the CLOCKSS archive:


VR: If you don’t preserve digital content then it won’t exist. Most of society’s culture and commercial assets are now digital but, generally, the move from print to electronic is about access rather than preservation….

VR: The web as a publishing platform enables many things never envisaged in the print world. The web started with a document model, then evolved to include dynamic elements, such as advertisements and embedded videos. But first with AJAX and now with HTML5, the web is becoming a networked operating system inside the browser. It is no longer enough to parse content collected from the web to find the links and follow them; the content must be executed to discover the web resources from which it is composed. Some of these resources are web services, such as Google Maps. Preserving executable content and the services on which it depends is a major challenge that the LOCKSS programme is working to address.
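Reich's point about parsing versus executing can be illustrated with a small sketch. It uses only Python's standard-library `html.parser` to collect links that appear literally in a page's markup. The page is a made-up example, and `LinkExtractor` and `static_links` are hypothetical names, not part of any LOCKSS software. A parser like this finds the ordinary link. It never sees the image URL that the page's script assembles at run time, and a crawler would have to execute the page to discover that resource.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href/src values that appear literally in the HTML markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def static_links(html):
    """Return the links a parse-only crawler would find in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one ordinary link, plus an image whose URL exists
# only after the browser runs the script. Script bodies are treated as
# raw text by the parser, so the JavaScript is never interpreted.
PAGE = """
<html><body>
  <a href="/report.pdf">Annual report</a>
  <script>
    var img = document.createElement("img");
    img.src = "/tiles/" + "map-" + 42 + ".png";
    document.body.appendChild(img);
  </script>
</body></html>
"""

print(static_links(PAGE))  # ['/report.pdf']
```

To discover `/tiles/map-42.png`, a crawler would have to run the JavaScript, for example in a headless browser. That is a much heavier operation, and it is the kind of executable content Reich describes as a major preservation challenge.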

The best way to preserve digital content…

Here is an interesting article that examines the use of criminal digital forensic tools to discover and repair corrupted information in digital archives, but there is another story here as well. Although the title doesn't say so, Fox actually looks at two alternative approaches to digital preservation: digital forensics and what he calls "the buddy system."

Fox describes the buddy system as what happens "[w]hen more than one system is responsible for maintaining the integrity of any given digital object. If each system in question has a copy of the object, and they are verifying the integrity of that object against the objects that their 'peers' possess, there is a much higher probability when they agree that the integrity of the object is intact. This is a 'digital buddy system' of sorts, because each peer helps the other peers in its network maintain the integrity of commonly held digital objects. This is the principle behind the LOCKSS electronic resource preservation system (LOCKSS, n.d.; Rosenthal and Reich, 2000), which is a peer-to-peer preservation system now in wide use, and developed and maintained by Stanford University."
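The buddy system can be sketched as a majority vote over content hashes. This is a deliberately simplified toy. It assumes the auditing peer can see every peer's copy, or at least its SHA-256 digest, and the function names are hypothetical. Real LOCKSS polls are considerably more elaborate: peers hash with nonces, sample which peers to poll, and defend against malicious participants.

```python
import hashlib
from collections import Counter
from typing import List, Optional, Tuple


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one peer's copy of an object."""
    return hashlib.sha256(data).hexdigest()


def audit(my_copy: bytes, peer_copies: List[bytes]) -> Tuple[bool, Optional[str]]:
    """Vote on an object's integrity by comparing digests of all copies.

    Returns (my_copy_agrees_with_majority, majority_digest). If no digest
    wins a strict majority of all copies, agreement cannot be confirmed
    and the result is (False, None).
    """
    votes = Counter(digest(c) for c in [my_copy, *peer_copies])
    winner, count = votes.most_common(1)[0]
    total = 1 + len(peer_copies)
    if count * 2 <= total:  # no strict majority
        return False, None
    return digest(my_copy) == winner, winner


# Hypothetical copies: one peer's copy has suffered bit rot.
good = b"Federal Register, 2010-06-01"
rotted = b"Federal Register, 2010-06-0\x00"
print(audit(rotted, [good, good, good])[0])  # False: this copy needs repair
print(audit(good, [good, rotted, good])[0])  # True: the majority agrees
```

A peer that finds itself in the minority knows its copy is damaged and can repair it from a peer whose copy matches the majority digest.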

He notes further:

Studies over the last decade have indicated that digital preservation is most successful when the information “is best preserved by replicating it at multiple archives run by autonomous organizations”…. These concepts have been in place for almost ten years, but it has only been in the last four-to-five years that libraries have attempted to preserve anything beyond e-journal content using P2P network systems. [emphasis added]
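The reasoning behind replicating at autonomous archives can be made concrete with a toy model. It assumes, hypothetically, that each of n archives independently loses a given object during some period with probability p. The object is gone only if every copy is lost:

```latex
P(\text{object lost}) = \prod_{i=1}^{n} p_i = p^{n} \quad \text{when } p_i = p \text{ for all } i,
\qquad \text{e.g. } p = 0.01,\ n = 3 \;\Rightarrow\; P = 10^{-6}.
```

The independence assumption is where "autonomous organizations" matters. Archives that share software, administration, or funding tend to fail together, and in that case the simple product understates the real risk.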


LOCKSS-USDOCS at Best Practices Exchange

I just got back from Best Practices Exchange 2010 (check out the growing list of available presentations and the Twitter back channel!). It was a really solid conference: a healthy mix of archivists, documents librarians, other librarians, and technologists giving project-oriented presentations with plenty of discussion. The cherry on top was the engaging keynote by David Ferriero, the Archivist of the United States (AOTUS) (here's a good summary of AOTUS' talk).

I was on a panel with Arlene Weible from Oregon State Library (Arlene gave a great talk on RAT, OSL's tool for collecting state documents; I hope she posts her slides soon!) and presented on LOCKSS-USDOCS, the distributed documents preservation project. Take a look at the slides. We're looking for more participating libraries, so email me if your library is interested (jrjacobs AT stanford DOT edu).

David Rosenthal: Stepping Twice Into The Same River

Last month, David Rosenthal, chief scientist of the LOCKSS Project, gave the keynote address, entitled Stepping Twice Into The Same River, to the ACM/IEEE Joint Conference on Digital Libraries (JCDL) and the annual International Conference on Asia-Pacific Digital Libraries (ICADL) (or just JCDL/ICADL!) in Queensland, Australia. It was wide-ranging, thoughtful and provocative; in short, it was everything you'd want in a keynote to a major international digital library conference.

David touched on publishers and publishing practices, scholarly communication, digital preservation, the intersection of technology and economics, and the current state and future of libraries. He makes a strong argument that the upheaval now affecting the three parallel fields of publishing, libraries, and archives (what he terms "technological and economic discontinuity") creates the perfect opportunity for radical technological change. Specifically, he argues for a collaborative archival academic cloud that could define the future of information access and preservation, at least for universities and scholarly communication, in ways that are beneficial and sustainable over the long term.

Here are some main points that I gleaned from David’s presentation:

  1. publishers are in a similar boat to news organizations: they have sacrificed long-term viability for short-term economic gain, and that is ultimately going to destroy them;
  2. libraries and archives need to focus their preservation goals on dynamic services rather than static content:

    “…it’s less about what we are preserving and more about how preserved information is accessed. Less about HTML and other formats, and more about HTTP and other protocols. The reason is that static information is a degenerate case of dynamic information; a system designed for dynamic information can easily handle static information. The converse isn’t true.”

  3. distributed digital preservation and archives offer more economically and technologically sound opportunities in the long run;
  4. data preservation will take steady long-term funding;
  5. since ingest is a major cost for any digital preservation system, universities need to start thinking of their Web space and infrastructure as academic clouds rather than leasing space from commercial cloud companies like Amazon's Elastic Compute Cloud (Amazon EC2):

    “Unless something dramatic happens, scholars who want to publish services wrapped around their, or other people’s, data will take the path of least resistance and use Amazon’s services. Miss a credit card payment, your data and service are history. Worse, do we really want to end up with Amazon owning the world’s science and culture?”…

    …What Universities get for the extra cost is the permanence they need. The permanence comes from the fact that the University already has its hands on the data and the services in which it is wrapped, instantiated in highly robust and preservable hardware. Thus, no ingest costs and very low preservation costs. With the model of Amazon and a separate archiving service, as well as paying Amazon, Universities have to pay the archiving service, and pay the ingest costs. When these extra costs are taken into account, because the ingest costs dominate, it is likely that Amazon would be more expensive.
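Rosenthal's cost comparison can be restated as a simple inequality. The symbols are ours, not his: $H_U$ is the university's own (higher) hosting cost, $P_U$ its preservation cost, $H_A$ the Amazon fee, $F$ the separate archiving service's fee, and $I$ the ingest cost.

```latex
C_U = H_U + P_U, \qquad C_A = H_A + F + I,
\qquad C_A > C_U \iff I + F > (H_U - H_A) + P_U .
```

The right-hand side is the "extra cost" the University pays for permanence plus its very low preservation cost. If ingest costs dominate, as Rosenthal argues, the left-hand side is larger and the Amazon route is the more expensive one.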

I highly recommend that folks read David's keynote at least twice; there are a lot of pearls of wisdom in there. I think he makes a compelling case for a viable digital future for scholarly communication, one in which libraries and archives can play a vital role.

This is BIG: GPO + LOCKSS (update)

Last week, James made a modest announcement of the biggest development in digital deposit in decades.

In short, GPO is assisting the LOCKSS-USDOCS project in preserving content harvested from fdsys.gov, which means we are building a geographically distributed network of digital archives. Eighteen libraries are already participating, including four regional depository libraries. As James pointed out, this "replicates key aspects of the FDLP in the digital environment and furthers the concept of 'digital deposit,' an essential component of the digital FDLP."

One indicator of the project's importance in the world of digital preservation is that the Association for Computing Machinery's technology newsletter, ACM TechNews, featured it today.

Although LOCKSS-USDOCS is still essentially a backup of FDsys (the archived content is made accessible only if the live content goes away), this is an enormous step in the right direction for digital preservation, both technically and politically. Not long ago, GPO seemed to want nothing to do with LOCKSS (see: GPO LOCKSS report: Why LOCKSS vs. FDsys? and GPO, LOCKSS, IP Authentication, and the future of FDLP — more clarification needed.). Now GPO is actively collaborating with depository libraries by placing LOCKSS permission statements throughout the fdsys.gov site so that LOCKSS-USDOCS can harvest GPO content. This is a huge change from GPO's attitude of three years ago!
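Roughly speaking, a LOCKSS crawler checks for a permission statement on a publisher's site before it collects content. The following minimal sketch shows the idea using Python's standard library. The statement wording is our assumption of the standard LOCKSS text and should be checked against LOCKSS documentation. `has_lockss_permission` and `may_harvest` are hypothetical helpers, not the LOCKSS daemon's actual code.

```python
import urllib.request

# Assumed wording of the standard LOCKSS permission statement; confirm
# against current LOCKSS documentation before relying on it.
LOCKSS_PERMISSION = (
    "LOCKSS system has permission to collect, preserve, and serve this Archival Unit"
)


def has_lockss_permission(html: str) -> bool:
    """True if the page contains the permission statement.

    Whitespace is collapsed and case is ignored, so line breaks inside the
    statement don't matter; markup splitting the statement would defeat it.
    """
    normalized = " ".join(html.split())
    return LOCKSS_PERMISSION.lower() in normalized.lower()


def may_harvest(url: str) -> bool:
    """Hypothetical pre-crawl check: fetch a permission page and look for the statement."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return has_lockss_permission(resp.read().decode(charset, errors="replace"))
```

Putting the statement on the site is what turns a one-sided crawl into the kind of cooperation with GPO described above: the crawler only proceeds where the publisher has said yes.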

Now that we are beginning to have a distributed digital backup of FDsys, we can begin to look forward to the next steps of digital deposit in which documents and data will be deposited into live digital library collections for active retrieval and use.

Congratulations go to James, Stanford, LOCKSS, and GPO!!