Digital deposit

Note to FullTextReports followers — Grab It When You See It!

Our friends Gary Price and Shirl Kennedy over at Full Text Reports have a handy reminder today:

...some of the papers and reports posted on FullTextReports.com are freely available online for just a limited time before they disappear behind a paywall (or go away entirely). If you see something you suspect might be useful to you (or a colleague) in the future, download it the day you see it because it may not be accessible later without a subscription (or it may have been moved or taken offline).

-- Note to FullTextReports followers — Grab It When You See It!, Full Text Reports (April 17, 2013).

Just another reason to remember that libraries should be collecting, not pointing. (See: When we depend on pointing instead of collecting.)

(By the way, in case you hadn't noticed: the left-hand navigation pane here at FGI has a feed of the latest reports listed at Full Text Reports!)

When we depend on pointing instead of collecting

NASA took its Technical Report Server (http://ntrs.nasa.gov/) offline this week, saying:

The NASA technical reports server will be unavailable for public access while the agency conducts a review of the site's content to ensure that it does not contain technical information that is subject to U.S. export control laws and regulations and that the appropriate reviews were performed. The site will return to service when the review is complete. We apologize for any inconvenience this may cause.

As Steven Aftergood reported at Secrecy News [emphasis added]:

In other words, all NASA technical documents, no matter how voluminous and valuable they are, should cease to be publicly available in order to prevent the continued disclosure of any restricted documents, no matter how limited or insignificant they may be.

"There is a HUGE amount of material on NTRS," said space policy analyst Dwayne Day. "If NASA is forced to review it all, it will never go back online."

      -- "NASA Technical Reports Database Goes Dark" by Steven Aftergood (March 21st, 2013).

Michael L. Nelson of the Department of Computer Science at Old Dominion University investigated the availability of some of the NASA reports at other archives and reports his findings on his blog:

Nelson found that some reports are available at http://naca.central.cranfield.ac.uk/ which is an archive of some NASA information that Nelson helped establish when NASA websites were taken down after September 11, 2001. He notes that the removal of information from NASA servers at that time "made it clear to me that NASA information was too important to be left on *.nasa.gov computers." He found more data at the Internet Archive's "NASA Technical Documents" collection: http://archive.org/details/nasa_techdocs and in Mark Phillips's NACA collection at http://digital.library.unt.edu/explore/collections/NACA/ .

Nelson draws some conclusions from all this [emphasis added]:

...it is events like this that demonstrate the value of copying by-value and not just by-reference.

In other words, pointing to web sites is much less valuable and much more fragile than acquiring copies of digital information and building digital collections that you control. The OAIS reference model for long-term preservation makes this a requirement, saying that an organization that intends to provide information to its user community for the long term must "Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation." Pointing to a web page or PDF at nasa.gov is not obtaining any control.
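To make the by-value versus by-reference distinction concrete, here is a minimal Python sketch of "collecting": instead of recording only a URL, it keeps the bytes themselves, names the file by its SHA-256 checksum, and logs where and when the copy was obtained. The function names (store_copy, collect), the manifest.jsonl file, and the directory layout are all hypothetical choices made for illustration. This is nowhere near a full preservation system of the kind OAIS describes; it only shows the difference between holding a pointer and holding a copy.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen


def store_copy(content: bytes, source_url: str, archive_dir: Path) -> Path:
    """Save the bytes locally (by value) and record where they came from."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    # Name the file by its checksum: identical content is stored once,
    # and later damage can be detected by re-hashing the file.
    target = archive_dir / digest
    target.write_bytes(content)
    # Append provenance to a simple manifest (hypothetical layout).
    record = {
        "source_url": source_url,
        "sha256": digest,
        "retrieved": datetime.now(timezone.utc).isoformat(),
    }
    with (archive_dir / "manifest.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return target


def collect(source_url: str, archive_dir: Path) -> Path:
    """Fetch a document over the network and keep a local copy of it."""
    with urlopen(source_url) as response:
        return store_copy(response.read(), source_url, archive_dir)
```

A library that had run something like collect() against a report while it was still online would keep the file, its checksum, and a record of its origin even after the original URL stopped responding. A bookmark alone would leave it with nothing.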

He also makes a distinction between those things that are saved because of their popularity and things that will not be saved unless special care is taken to preserve them:

I'm not concerned about popular culture artifacts disappearing (e.g., see our TPDL 2011 paper about music redundancy in YouTube), but it is not clear that long tail content like NASA reports will enjoy that same level of uncoordinated refreshing and migration. The moral of the story: make copies of the content...

And he notes the importance of multiple copies:

...a 1994 NASA TM of mine is on at least six different hosts, none of which are *.nasa.gov.

...If NTRS was a LOCKSS participant then access would be uninterrupted...

And Aftergood concludes [emphasis added]:

The upshot is that the government is not an altogether reliable repository of official records. Members of the public who depend on access to such records should endeavor to make and preserve their own copies whenever possible.

Here at FGI, we have repeatedly argued that identifying important information that warrants explicit preservation is the age-old role of libraries in society and that it still is (or should be) the key value of libraries in the digital age. Many government agencies, including NASA and the Government Printing Office, have good intentions and good programs for preservation and access, but those agencies cannot guarantee that they will always provide preservation and access. In the case of the NTRS web site, Aftergood and others speculate that the takedown was a response to a demand by a single Congressman, who said in a press conference on March 18 [emphasis added]:

NASA should immediately take down all publicly available technical data sources until all documents that have not been subjected to export control review have received such a review and all controlled documents are removed from the system.

-- http://spaceref.com/news/viewpr.html?pid=40365

The NTRS web site was taken offline on March 19.

Government agencies are subject to political pressures like this and to budgetary limitations. Very bad things can happen, and in cases like this one they can remove from access "all NASA technical documents, no matter how voluminous and valuable they are" in a single moment.

Libraries should still be selecting, acquiring, organizing, and preserving information for their user-communities, and providing access to and services for those collections. Libraries do no one a long-term service by simply pointing to resources over which they have no control and which someone else can simply make unavailable literally at the flick of a switch.

FDLP libraries should demand digital deposit from GPO and should actively select and acquire the digital public government information that is of value to their user communities but that GPO cannot deposit because it is outside the scope of Title 44.

GPO joins LOCKSS: digital deposit a reality

According to yesterday's press release, GPO has joined the LOCKSS alliance! The Stanford News Service also wrote a story about this historic event, complete with a goofy picture of yours truly :-)

But what the GPO press release didn't explain is that, as part of GPO's participation in the LOCKSS Alliance, GPO will assist the LOCKSS-USDOCS project (which I'm organizing) in preserving content harvested from fdsys.gov in a geographically distributed network of digital archives. GPO has put LOCKSS permission statements (for example here, and here and here) throughout the FDsys.gov site in order for LOCKSS-USDOCS to harvest GPO content. LOCKSS-USDOCS -- which is 18 libraries strong (including 4 regionals!) and growing -- replicates key aspects of the FDLP in the digital environment and furthers the concept of "digital deposit," an essential component of the digital FDLP.

We're actively looking for other libraries to participate in the project, especially regionals. Together we can provide an essential digital preservation piece to the FDLP. Please contact me (jrjacobs AT stanford DOT edu) with questions or interest.

--That is all.

FDsys Program Review

In addition to the recent GPO Inspector General's report on FDsys (see The State of FDsys and the Future of the FDLP), there is another new report on FDsys.

  • FDsys Program Review. Bob Tapella, Ric Davis, Mike Wash, Scott Stovall, Selene Dalecky, John Shuler, Suzanne Sears, Mike White. Government Printing Office (April 7, 2010)

    Summary: On Wednesday, April 7, 2010, Bob Tapella, Public Printer, United States Government Printing Office (GPO), convened a public meeting to review the status of GPO’s Federal Digital System (FDsys) program. The objective of the meeting was to receive a program status update and to discuss program successes, issues, and opportunities with key stakeholders including GPO’s Library Services and Content Management (LSCM) business unit, the Office of the Federal Register (OFR), and representatives from the Federal Depository Library Council. The meeting was also attended by observers from GPO, the House Administration Committee, and the House Committee on Appropriations.

This report gives a much more sanguine view of the state of FDsys than the Inspector General report gives. It does, indeed, step through "program successes, issues, and opportunities." As I noted in my coverage of the IG report, there are successes and there is lots to hope for when all the system requirements are met. This report notes that "The estimated cost to complete Release 1 was reduced from $62 million to $42 million, saving $20 million" while the IG report focuses on the fact that the original cost estimate for the first phase of FDsys implementation was $16 million and the fact that GPO has redefined "Release 1" (which originally was slated to include "basic, additional, and final features") to include only "basic" features and now calls "additional and final features" "Release 2."

Nevertheless, it does a good job of pointing out what GPO has accomplished, which is significant.

The new report also identifies one critical risk to FDsys:

[T]here is risk associated with a delayed completion of the core system. Mitigation steps include maintaining sufficient investment to complete the core system and preventing loss of key resources resulting in more cost and time.

It also includes this statement of purpose:

The purpose of FDsys is not to serve as a portal, but instead to provide access to official and authentic content from all three branches of the U.S. government on our site, and through links to official agency and partnering web sites. Our main system functions encompass publishing information, enabling searching for information, preserving the information, and providing version control.

This is a sound, and probably sustainable, purpose. The report notes with satisfaction that the provision of XML-formatted information has powered other, more user-friendly websites such as FedThread.org, GovPulse.us, and Regulations.gov. This vision of FDsys is, perhaps, close to the view of those who say that the government should reimagine its role as an information provider, supplying raw data and leaving the fancy websites to others. (See The Federal Government Must Reimagine Its Role As An Information Provider.)

It is, however, probably not as close to the view that FDLP librarians have of easy access to government information. In light of the problems described in the IG report, it makes me wonder if there is a slight "re-imagining" of FDsys going on to make its vision fit closer with what GPO can do rather than what FDLP would like it to do. Time will tell.

Update. When asked about this issue at the DLC meeting yesterday (Monday, April 26, 2010), the Supt. of Docs. responded (as reported by Shari Laster): "It's an advanced search system, a content management system, and a digital repository. Is GPO Access/etc. a portal? No. This is an official content repository."

The report also intriguingly notes that "FDsys content is available in all major search engines." I did a couple of quick Google searches of full text hearings that are in FDsys and got no hits. I would be interested to hear if GPO has more details about what is "available" in all major search engines and what is not. (If you have different results, please share them!)

Oh, yes. One other little thing. Ric Davis, Director of Library Services and Content Management and Acting Superintendent of Documents lists several "opportunities" afforded by FDsys. One is "Digital Dissemination"!

While having a repository of content available at GPO is critical, there are opportunities to facilitate the availability of digital collections in libraries. Some in the FDLP community have expressed strong interest in having Access and/or Preservation level files digitally deposited in FDLP libraries. This will further the model established for tangible collections of content by having dispersed collections of electronic content, and through partnerships better ensure access and preservation of content.
(FDsys Program Review, page 7)

Thanks Ric!!

Creating an Institutional Repository for State Government Digital Publications

An interesting case study:

  • Meikiu Lo and Leah M. Thomas. Creating an Institutional Repository for State Government Digital Publications. The Code4Lib Journal (22 Mar. 2010).

    In 2008, the Library of Virginia (LVA) selected the digital asset management system DigiTool to host a centralized collection of digital state government publications. The Virginia state digital repository targets three primary user groups: state agencies, depository libraries and the general public. DigiTool’s ability to create depositor profiles for individual agencies to submit their publications, its integration with the Aleph ILS, and product support by ExLibris were primary factors in its selection. As a smaller institution, however, LVA lacked the internal resources to take full advantage of DigiTool’s full set of features. The process of cataloging a heterogenous collection of state documents also proved to be a challenge within DigiTool. This article takes a retrospective look at what worked, what did not, and what could have been done to improve the experience.

2009 Fall DLC Meeting: "Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP"

At the Fall 2009 Depository Library Council (DLC) meeting in Arlington, VA, James A. Jacobs and I (Rebecca Blakeley) introduced attendees to the concept of "digital deposit," mapping out the pieces of the FDLP cloud and what digital deposit could do for the future of the FDLP. Our slides and notes are available for you to view and download online.

MetaArchive publishes guide to distributed digital preservation

Please check out the new book published by the MetaArchive Cooperative called A Guide to Distributed Digital Preservation. It's both timely and handy.

[Full disclosure: the book is primarily about LOCKSS and specifically mentions the project that I'm working on, LOCKSS-USDOCS. FGI and I receive no compensation from the sales of the book.]

Announcement: publication of A Guide to Distributed Digital Preservation

Authored by members of the MetaArchive Cooperative, A Guide to Distributed Digital Preservation is the first of a series of volumes from the Educopia Institute describing successful collaborative strategies and articulating specific new models that may help cultural memory organizations work together for their mutual benefit.

This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways.

This guide is written with a broad audience in mind that includes librarians, archivists, scholars, curators, technologists, lawyers, and administrators. Readers may use this guide to gain both a philosophical and practical understanding of the emerging field of distributed digital preservation, including how to establish or join a network.

Readers may access A Guide to Distributed Digital Preservation as a freely downloadable pdf and/or as a print publication for purchase. Please visit http://www.metaarchive.org/GDDP to download or order the book.

******

The MetaArchive Cooperative provides low-cost, high-impact preservation services to help ensure the long-term accessibility of the digital assets of universities, libraries, museums, and other cultural memory organizations. In addition to preserving members' digital content in a distributed digital preservation network, the Cooperative also offers consulting and education services to institutions that seek training in digital preservation planning, policy creation, and implementation, including setting up and running Private LOCKSS Networks (http://www.lockss.org).

For more information, please contact Program Manager Katherine Skinner (katherine.skinner@metaarchive.org).

Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP

At the Fall Depository Library Council Meeting in Arlington, VA, Rebecca Blakeley gave a presentation that she and I wrote on "Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP." Although a PDF version of the presentation is available on the FDLP web site, it only has the slides, not the text of the presentation.

The complete, original PowerPoint file, including the "speaker notes" with the complete text of the presentation, is available on SlideShare.

Digital Deposit

This page is to collect information on digital deposit.

Kentucky Shows How to Publish and Deposit Government Documents

The State of Kentucky has developed a best-practices manual for publishing -- and depositing -- government documents digitally.

The Kentucky Department for Libraries and Archives (KDLA) has been the official repository for Kentucky state agency publications since 1958

...Kentucky state agencies are required to send their publications to KDLA. The Public Records Division (PRD) and State Library Services (SLS) at KDLA work together not only to provide access to the valuable information contained in state agency publications, but also to preserve the publications for future generations.

The handbook says that "Electronic publications should be forwarded in Adobe Portable Document Format (PDF)." We would love to see all U.S. government publications in PDF format deposited in FDLP libraries. It would be a great first step toward digital deposit of all government information.
