
Tag Archives: future of government information in libraries

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

This is not a drill. The future of Title 44 and the depository library program hangs in the balance.

As we wrote last week, the Government Publishing Office (GPO) has asked the Depository Library Council (DLC) to recommend changes to Title 44 of the US Code. We believe this is a bad idea at this moment in history and worry that the unspecified changes GPO wants will result in damage to the FDLP and long-term free public access to government information. We recognize that GPO may pursue this avenue anyway. We have now heard that there is a draft bill being worked on as we speak, so even the extremely short time-frame for public input suggested by GPO (AUGUST 31!!) may be too late for libraries to have *any* say in a law which deeply affects how libraries and the depository program work. We therefore present our suggestions to DLC. We invite you to submit your own recommendations and use any of our suggestions that you like.

I. Do not open Title 44 to changes at this time

We believe that the wisest course of action at this time is to refrain from suggesting that Title 44 be amended at all. We have seen no evidence of any “champions in Congress” for GPO, FDLP, or long-term free public access to government information. In addition, the current political divisiveness in Congress and the current lack of support for government services in general make it unlikely that we could get positive changes to Title 44. In fact, it seems to us more likely that we would get changes that converted free access to fee-based cost recovery or privatization or both. In short, we suggest you send your comments to DLC before the Aug 31 deadline and:

Recommend to DLC that GPO refrain from asking for changes to Title 44 at this time.

II. Principles of Free Public Access in the Digital Age

If DLC, or GPO, or the Committee on House Administration (CHA) — GPO’s oversight committee — insist on trying to modify Title 44, we believe the government information community should insist that some principles be preserved and even strengthened. The last time GPO voiced principles was 1996 (it repeated those principles in 2016). Although those principles are good as far as they go, they are grossly outdated in the digital age and do not adequately address either the nature of digital information or the needs of users in a digital age. We suggest you write to DLC and:

Recommend to DLC that any changes to Title 44 reflect these four principles:

  1. Privacy
  2. Preservation
  3. Free Access and Free Use
  4. Modernized scope of information covered by Title 44


What Are We To Keep? (FAQ)

This document is meant to accompany the article, “What are we to Keep?” by James R. Jacobs, Documents to the People (Spring 2015), pp. 13-19.

FAQ

  • What is a Preservation Copy?

    Research that was prompted by JSTOR’s desire to determine how to guarantee that all of the printed material within its journals would remain available defined preservation copies as “clean copies that retain full information accuracy from the vantage point of the researcher” (Yano). Thus when we think about “preservation copies” we are looking to ensure that copies are available for the long term and that those copies are complete and accurate. “Informational accuracy” implies a “perfect copy”: a copy that is as good as new. A preservation copy is, therefore, a “clean” copy that is quality-checked and repaired, if necessary, on a page-by-page basis.

  • Why do we need Preservation Copies?

    Even if we had perfect digital copies of paper documents, we would still need preservation paper copies for two reasons. First, there is evidence that digital documents degrade more rapidly than print material (Rosenthal), so it is necessary to have a paper copy that could be used to re-digitize. Second, digitization does not magically preserve paper; or, to put it another way, digital copies are not the same as print copies and may inherently lose information by dint of reformatting to a new presentation.

  • Why do we need Access-Copies?

    Unless we have perfect, page-verified digitizations that are as complete, as accurate, and as easily usable as the original paper copies (Jacobs and Jacobs), users will inevitably need to go back to the original paper copy in order to get either the complete and accurate content or the functional usability of the original paper medium. Some libraries have already reported that digitization of paper copies has increased the demand for access to the paper copies. Additionally, some users and uses will require access to physical copies via Interlibrary Loan (ILL). ILL can only happen if there is a surplus of copies. As the number of copies approaches zero (scarcity), libraries will no longer be willing to lend through ILL. Therefore, it is imperative that there not be a dearth of geographically distributed copies.

  • Why do we need re-digitization copies?

    Unless we create perfect copies that adequately anticipate the future needs of users, we will need to create new digitizations in order to meet those future needs. (See “An alarmingly casual indifference to accuracy and authenticity” What we know about digital surrogates.)

Checklist:
What should I think about before discarding government documents?

What are we to keep? thoughts on the National Collection (DttP Spring 2015 feature article)

The Spring 2015 issue of Documents to the People (DttP) just arrived at my door. The feature article in this issue is titled “Thoughts on the National Collection” and was collaboratively written by myself, James R. Jacobs, along with Shari Laster, Aimee C. Quinn, and Barbie Selby. I’m posting my segment titled “What Are We to Keep?” as it was written under a Creative Commons Attribution-NonCommercial-Share-Alike CC BY-NC-SA license. The other pieces include: “Segmenting the Government Information Corpus” by Shari Laster; “Who is Responsible for Permanent Public Access?” by Aimee C. Quinn; and “Where Do We Go From Here?: Some Thoughts” by Barbie Selby. I’ll post the other segments if I get permission from my collaborators.

The question of “how many copies” of print documents the FDLP should collectively keep is the wrong question asked for the wrong reasons and trying to answer it will only lead to the wrong answers and irreparable loss of information. For me, even thinking about answering it raises more questions. How can we know how many copies to keep unless we specify the purposes for which we wish to keep them? What are those purposes? How will we know if we are meeting our goals? How will discarding paper benefit users? How can we be sure that we are not losing information when we discard paper copies if we do not have an inventory of the paper copies that exist? How can we implement a policy that is so vague that it doesn’t define things like “a requisite number of copies,” and how decisions will be made, and which apparently treats a born-digital XML document created by GPO and an indifferent digitization without OCR text and missing its maps and foldouts as of equal value?

Let’s be clear. We are talking about the records of our democracy. Loss of even a single page could damage the ability of historians, journalists, economists, and citizens to understand our history and hold our government accountable for its successes and its failures. We have those documents now in our libraries; there are not hundreds or even dozens of copies of these documents floating around in used bookstores or elsewhere. They are in our charge.

Keep reading “What are we to keep?”…

Also see the What Are We To Keep FAQ for further context and bibliography.

Focusing on the essentials at DLC

In anticipation of this week’s Depository Library Council meeting, FGI suggests a focus on the biggest challenges facing long-term preservation and access.

The scope of the challenges we face is large and clear. The quantity of government information that is “born-digital” each year is literally orders of magnitude greater than the quantity of government publications accumulated over the entire 200+ year history of the FDLP. (Born-Digital U.S. Federal Government Information: Preservation and Access). Although the redundancy of copies of the historical FDLP paper collections ensures at least their passive preservation, repeated calls for discarding those collections endanger both the preservation of that content and access to it. Consequently, the inadequacy of bibliographic records for those collections now poses a significant threat to their long-term preservation and access. In this time of proliferation of government information, it is essential for the FDLP to have a clear understanding of exactly what information exists, what is being preserved, and who is accepting responsibility for long-term preservation of and free access to government information.

Congressional support of government information programs is at an all-time low. Over the last decade, appropriations bills have steadily decreased budgetary support for the Government Publishing Office (GPO). This year, a House bill proposes a 9% cut to GPO’s budget, a cut that would negatively affect the maintenance and development of FDsys. Additionally, Congress has proposed shuttering the National Technical Information Service (NTIS), and Congressional pressure resulted in taking NASA technical reports offline. Anti-government sentiment is so strong that it is difficult for agencies to reliably maintain even essential basic services (including GPO’s own PURL server). Congress has even shut down the entire government more than once, and some in Congress continue to threaten future shutdowns. While GPO is doing a good job of preserving in FDsys most born-digital official Congressional information, preservation of the digital information of the Judicial and Executive branches is haphazard, uncoordinated and fragile. GPO has (with the acquiescence of FDLP libraries) arrogated to itself the job of being solely responsible for preservation of born-digital government information. This has actually weakened the infrastructure of preservation by changing a system that relied on 1200 partners to a system that depends on a single government agency. In this context, a single budget cut would mean a loss of a huge quantity of digital government information, if not for the innovative, active cooperation of a hardy band of FDLP libraries that participate in the LOCKSS U.S. Documents project. (This project only serves as backup of the information in FDsys and does not, currently, provide any means of making that information accessible.)
In this time of fragile support for government action, it is more essential than ever to reverse the twenty-year-old model of centralization and return to a model of shared responsibility with the participation of as many non-profit, service-oriented libraries as possible.

It is time for specifics and time for libraries that claim to value permanent free access to government information to step up and take new digital-library responsibilities. GPO’s proposed “National plan for access to US government information” and “Federal Information Preservation Network (FIPNet)” have, so far, been vague outlines with few specifics. We at FGI propose that FDLP librarians and GPO use this week’s virtual Depository Library Council (DLC) meeting to 1) clarify the existing state of government information; and 2) specify an agenda for what is needed in order to have a successful national library plan for a sustainable system of government information collection, preservation and provision. We propose that DLC use this meeting to flesh out and expand the parameters of the discussion and more fully describe what needs to be done by the FDLP community. The following is our own take on these two ideas.

Clarifying the state of things

The current state of identification of government information is fragmented and incomplete. GPO uses the Catalog of Government Publications (CGP) to meet its legal requirement to maintain an “electronic directory of Federal electronic information” (44 U.S.C. 4101). But the CGP is incomplete. It is complemented by the historic Monthly Catalog, the 1909 Checklist, and GPO’s “shelflist” project. GPO’s online digital collections (which include FDsys, the Federal Depository Library Program Web Archive, and the Federal Depository Library Program Electronic Collection (FDLP/EC) Archive) provide additional, but still incomplete, pieces of the bibliographic puzzle. HathiTrust’s government documents registry project promises to better define the breadth and depth of the historic national bibliography, but it has a limited scope. These separate projects provide an incomplete and confusing picture and they fail to provide any unified tool for managing long-term preservation and access. There are at least two areas in particular that require clarification:

  1. GPO’s PURL policies and actions. GPO uses PURLs to provide permanent URLs to digital resources, but it is not clear how GPO policies ensure accurate linkage of metadata to digital objects. For example, we understand that some PURLs point to agency web sites and some point to digital objects in permanent.access.gpo.gov. GPO should provide answers to the following questions:
    • How does GPO deal with the metadata for information that changes (not just moves) on agency web sites?

    • Are there clear policies that govern the creation of PURLs and how they are checked for accuracy over time?

    • Is there metadata that clarifies the relationship between agency copies and GPO copies?

    • Is there a reason that GPO forbids the Internet Archive to harvest documents using PURLs?

    • Are there policies that deal with versioning of digital documents?

    • Has GPO compared the functionality of PURLs and the functionality of DOIs and the possibility of pointing to multiple copies of the same item?

    (For reference, here are examples of existing PURLs):

    Note: All 3 of the above PURLs had the same error message in the Internet Archive’s Wayback Machine: “Page cannot be crawled or displayed due to robots.txt.”

    We also checked the final landing pages for the above GPO PURLs in the Wayback Machine and got mixed results:

     

  2. Questions about GPO Web Harvesting.
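
The robots.txt message reported for the PURLs is something any library can test for itself. What follows is only a minimal sketch, not a description of how GPO or the Internet Archive actually operate: it uses Python's standard `urllib.robotparser` module to ask whether a named crawler would be permitted to fetch a URL under a given robots.txt file. The robots.txt text, the `ia_archiver` user-agent name, and the example URL are all assumptions invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It blocks one named crawler and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical PURL-style address (example.gov is a placeholder host).
EXAMPLE_URL = "https://purl.example.gov/GPO/gpo00000"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "some-library-crawler"):
        allowed = can_crawl(SAMPLE_ROBOTS_TXT, agent, EXAMPLE_URL)
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site, one would fetch that site's own robots.txt and pass its text to `can_crawl` in place of the sample.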
     

What we need

In order to define the national bibliography, bring it under the control of GPO and FDLP libraries and accurately and successfully manage FDLP collections (paper, born-digital, and digitized) for the long-term, the FDLP (and the public!) need accurate, complete, up-to-date, unified metadata for all FDLP ‘publications.’ This includes:

Content:

  • everything in FDsys
  • everything with a PURL
  • everything in permanent.access.gpo.gov
  • everything in GPO’s Archive-It collection
  • Every Executive agency’s Website (ideally including the proposed development of ../publications and ../data directories on every agency site)
  • Every digital surrogate qualifying for the Digital Surrogate Seal of Approval to assure quality and completeness of digitizations.

Data:

  • Metadata should accurately link to specific digital objects
  • Metadata should have specific information about versions and editions and the relationship of GPO copies to agency copies of web-harvested information.
  • Metadata should include an indication about who is providing preservation services.
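
To make the three data requirements above concrete, here is one possible way to model them. This is a sketch under our own assumptions, not an existing GPO or FDLP schema: every class and field name is hypothetical. A checksum stands in for "accurately link to specific digital objects," a version label and a source field cover versions and the GPO/agency relationship, and a list of names records who is providing preservation services.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """One stored copy of a publication (all field names are hypothetical)."""
    url: str      # where this specific copy lives
    sha256: str   # checksum tying the metadata to exact bytes
    version: str  # version or edition label
    source: str   # "gpo" or "agency": whose copy this is
    preserved_by: List[str] = field(default_factory=list)  # preservation providers


@dataclass
class PublicationRecord:
    """Metadata for one publication and every known copy of it."""
    title: str
    objects: List[DigitalObject] = field(default_factory=list)


def find_gaps(record: PublicationRecord) -> List[str]:
    """List the ways a record falls short of the three requirements."""
    gaps = []
    if not record.objects:
        gaps.append("no linked digital object")
    for obj in record.objects:
        if not obj.sha256:
            gaps.append(f"{obj.url}: no checksum linking to a specific object")
        if not obj.version:
            gaps.append(f"{obj.url}: no version or edition information")
        if obj.source not in ("gpo", "agency"):
            gaps.append(f"{obj.url}: GPO/agency relationship not stated")
        if not obj.preserved_by:
            gaps.append(f"{obj.url}: no preservation provider named")
    return gaps


if __name__ == "__main__":
    # Placeholder values throughout; none of these refer to real documents.
    record = PublicationRecord(
        title="Example Annual Report",
        objects=[
            DigitalObject(
                url="https://agency.example.gov/publications/report.pdf",
                sha256="",
                version="2014 edition",
                source="agency",
            )
        ],
    )
    for gap in find_gaps(record):
        print(gap)
```

A record that returns an empty list from `find_gaps` meets all three requirements as modeled here; anything else names what is missing.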

What libraries can do to help

  • Expand the HT registry through a process of collaborative cataloging and metadata creation. Even cataloging one document per month aids in the ongoing effort to thoroughly describe the national bibliography;
  • Develop and participate in a new, digital Farmington Plan in which libraries divvy up, adopt, and track digital documents from executive agencies;
  • Participate in fugitive hunting (see “Want to be a fugitive hunter?”);
  • Develop tools for collaborative and targeted Web harvesting and community crowd-sourcing of Web crawl Quality Assurance (QA) (i.e., tuning Web harvests and checking to make sure they collect the targeted material);
  • Manage historic collections with a more geographically holistic view toward collection access and preservation;
  • Develop and participate in community-based projects for contacting executive agency CIOs/CTOs to advocate for ingest of agency publications and data into FDsys.
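
As one small illustration of the crawl Quality Assurance task mentioned in the list above, the following sketch compares the URLs a harvest was meant to collect against the URLs it actually captured and reports what is missing. It assumes a simple comparison of plain URL lists and is not tied to any particular harvesting tool; the function names and example addresses are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key that ignores scheme, host case, and a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = f"{parts.netloc.lower()}{path}"
    if parts.query:
        # Keep the query string, since it can identify a distinct document.
        key += f"?{parts.query}"
    return key


def missing_targets(targets: Iterable[str], captured: Iterable[str]) -> List[str]:
    """Return the target URLs that have no matching captured URL."""
    captured_keys = {normalize(u) for u in captured}
    return [t for t in targets if normalize(t) not in captured_keys]


if __name__ == "__main__":
    # Placeholder seed list and capture list for a hypothetical agency site.
    targets = [
        "https://agency.example.gov/publications/report1.pdf",
        "https://agency.example.gov/data/",
    ]
    captured = ["http://AGENCY.example.gov/publications/report1.pdf"]
    for url in missing_targets(targets, captured):
        print("not captured:", url)
```

A check like this only confirms that a target address was visited. Verifying that the captured content is complete and usable still takes human review.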

And the list could go on. Some of these tasks are large and expensive, but some of them can be done on a regular basis in as little as 1 hour/month. The FDLP needs all hands on deck. One thing is for sure: if FDLP libraries and librarians do not step up to the admittedly large task of continuing to build digital FDLP collections, we could potentially see the end of the historic record.

We look forward to the coming conversations.

Jim Jacobs and James R. Jacobs



DttP letter to the editor re digital preservation of government information

Jim and I recently wrote a letter to the editor to the GODORT journal Documents to the People (DttP) (published in the Winter 2015 issue) entitled “Digital preservation deserves better coverage.” We post it here to FGI in the hopes that it will “clarify some of the issues and provide a more accurate and more understandable context for action by the GODORT community.” It’s not yet online at the DttP site, but will eventually be posted there. We’ll post a link to the DttP site when it’s online. I’ll be at ALA Midwinter conference next week in Chicago, so please track me down if you’d like to discuss. That is all.

>>>>>>>>>>>>>>>

In the Summer 2014 issue of Documents to the People (DttP), an article by Scott Casper, which was highlighted as a “feature,” offered a badly misleading, confoundingly misinformed, and confusingly written account of digital preservation. Digital preservation is an incredibly important topic for government information professionals and it deserves better treatment in DttP.

I think Casper must have had good intentions in writing his article, “Promoting Electronic Government Documents: Part Four: Preservation.” Perhaps his intention was simply to promote the importance of digital government information, which is the theme of his series of articles, and the necessity of maintaining access to government information of all types. But whatever his intention was, he does a disservice by conflating important issues, confusing technical terms, and mostly ignoring the very important issue of digital preservation which is his ostensible topic.

It would not be useful to point out every error and misstatement in Casper’s article. There are so many, though, that we would guess that anyone who read his article would be left either confused or badly misinformed. So, instead of trying to correct every error or trying to figure out what he may have meant by every confusing statement, we think it would be more useful to define and describe and give some context to a few of the key concepts that Casper mentions. Our hope is that this will clarify some of the issues and provide a more accurate and more understandable context for action by the GODORT community.

Preservation of born-digital information is a very real and important topic that the government documents community needs to understand and address. DttP readers should be aware, for instance, that more government information is born-digital in a single year than all the printed government information that all FDLP libraries have accumulated in over 200 years. (See Born-Digital U.S. Federal Government Information: Preservation and Access prepared by James A. Jacobs for the Center for Research Libraries.)

Digitization of print information is not a preservation solution; rather, it creates new digital preservation challenges that have not yet been adequately addressed. While digitization offers many promises of better access such as better discoverability, easy accessibility, and enhanced usability, and even a potential form of “preservation” (by protecting fragile paper documents from damage through use), the simple act of converting a paper document into a digital object does not automatically deliver any of those promises. In fact, digitization is only the first of many costly and technically challenging steps needed to ensure long-term access to content. (See Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a. and Digitization does not magically preserve paper.)

Access is not preservation. The word “access” is too often used as a buzzword that hides and obscures a number of underlying issues. It is often conflated with preservation as if the two were the same. In fact, they are two very different things that require very different actions. Like two spouses, they are very different but intimately related. So, when we hear the word “access” used, we should always remember two things: First, access without preservation is temporary, at best. Providing access does not guarantee preservation or long-term access — much less free access. Too often libraries are willing to replace public domain collections with “just in time” fee-based access that is encumbered by licensing and DRM restrictions. In our digital age we often see access promoted as a desirable goal in itself, only to see once “accessible” documents suddenly disappear from the web. “Access” without trusted, long-term, reliable preservation is more like a Kmart blue-light special (“Get it while you can! It won’t be here long!”), than a long-term library service. Second, preservation without access is an illusion. As Paul Conway said, “In the digital world, the concept of access is transformed from a convenient byproduct of the preservation process to its central motif.” See Preservation in the Digital World by Paul Conway and The value in being a depository library.

Digital preservation is an essential activity of libraries. Casper fails to recognize this fact when he describes the good work of the EDI (Electronic Documents of Illinois) project without mentioning that it is a service of the Illinois State Library (http://iledi.org/). Digital preservation takes resources and a long-term commitment, but it also takes a very specific understanding of the long-term value of information (even information that is not popular or used by many people), and a commitment to the users of information. These are the strengths of libraries. Digital preservation is not something that can be cavalierly dismissed as the responsibility of others. (See: Preservation for all: LOCKSS-USDOCS and our digital future by James R. Jacobs and Victoria Reich in Documents to the People, Volume 38:3, Fall 2010).

Relying solely on the government to preserve its information is risky. Casper almost recognizes this when he cites the defunding of the Census Bureau’s Statistical Compendia unit and the cessation of the publication of the Statistical Abstract. But this is an example of an agency ceasing to create new information, not an example of an agency failing to preserve already created information. (So far, the Bureau has preserved old digital editions of Statistical Abstract and maintained online access to them.) Worse, Casper calls the privatization of the Statistical Abstract a “happy postscript.” Privatization of public information is hardly something that government documents librarians should be happy about. And it is hard to understand how relying on for-profit companies can be considered a good way to guarantee the preservation of the information or free access to it. Casper misses the opportunity to show that, when we rely only on government to preserve the digital information it creates, it becomes very easy for economics or politics or technology or bureaucracy to result in the loss of information. (See: When we depend on pointing instead of collecting and Government Link Rot and Information is not a Service, Service is not Information and Less Access to Less Information By and About the U.S. Government and Government Documents at the Crossroads.)

Casper does ask the right question early on in his article: “Who is responsible for this preservation?” But the only answer he seems to give is that “there are no answers.” But Casper is wrong. There is an answer and it is right in front of our eyes: libraries should take this responsibility. There are many actions that libraries can take now to promote digital preservation of government information at all levels of government (this is not just a federal issue!).

Preserve Paper copies. The FDLP is preserving documents that were released in paper (and microfiche) quite successfully. We often hear that “digitizing” paper documents will “preserve” them, but we do not need to convert these documents to digital in order to preserve them. Digitization can provide better access and (if proper care and resources are invested in the digitization) increase the flexibility, usability, and re-usability of many documents. But digitization alone does not guarantee the preservation of the content. Worse, there are repeated calls for digitizing paper collections so that the paper collections can be discarded and destroyed. Such actions will endanger preservation of the content if they do not include adequate steps to ensure digital preservation of those newly created digital objects. Given that paper documents do not present a current preservation problem, and given that there is an enormous body of born-digital documents being created that do present a current preservation problem, one thing we can do is avoid creating new problems with proposals to destroy and discard paper collections before we have solved the problems of preservation of born-digital documents. (We can still digitize paper documents in order to enhance access, but we should not use digitization as an excuse to discard or destroy the paper originals.) (See Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a.)

Move FDsys forward. GPO is doing a good job of capturing born-digital Congressional information (not digitized material as Casper mistakenly states) and is doing an increasingly good job of capturing Judicial Branch documents. The FDsys system is apparently well designed for long-term preservation, too. There are, however, two things that FDLP librarians can do now: First, we can encourage GPO to get FDsys certified as a Trusted Digital Repository. This has been on GPO’s agenda for a few years, but budget uncertainties have delayed it. It would help if GPO heard from the FDLP community that this should be a high priority. Second, even if FDsys gets certified, we need more than one copy of FDsys in the hands of a single government agency in order to reduce the risk of loss of that content. There are several ways the FDLP community can further this goal: encourage more libraries to become LOCKSS-USDOCS partners; suggest to GPO that it allow the Internet Archive to crawl FDsys systematically; investigate partnerships with other government agencies such as NARA (could NARA become a LOCKSS-USDOCS partner?); explore partnerships with the Digital Preservation Network; create records for the Digital Public Library of America that point to LOCKSS-USDOCS copies when they are made publicly accessible; and follow up on the digital preservation recommendations in the NAPA report, Rebooting The Government Printing Office: Keeping America Informed in the Digital Age. (Full disclosure: James A. Jacobs has done technical consulting work for the Center for Research Libraries in its certification of digital repositories.)

Preserve More Documents of Executive Agencies. So much that is born-digital is produced by executive department agencies and is not captured by GPO. These are the new fugitive documents (those that are in scope of the FDLP but fall through the cracks; GPO PURLs are not fugitives). To be sure, this needs much more attention by GPO and depository libraries. FDLP libraries should concentrate on collecting born-digital fugitive documents and should work with GPO to develop a plan that focuses on developing programs that are attractive to agencies and that benefit agencies. This needs to be a higher priority for GPO with an increased focus and increased resources. GPO has the infrastructure in place (FDsys) to offer great benefits to agencies and this would help reduce agency fugitives.

Get Digital Deposit. FDLP libraries need to insist that GPO modify its long-outdated and counter-productive Superintendent Of Documents Policy Statement 301 (SOD 301) that limits deposit of digital information to so-called “tangible” products. This policy never made sense: it was nominally supposed to be a response to born-digital information, but instead of acknowledging that GPO could deposit born-digital information with libraries, it created a two-tier structure that authorized it to deposit some and prohibited it from depositing other digital information. SOD 301 says that it is ok for GPO to deposit digital information on “tangible” media, but not ok to deposit “online” digital information. But, worse than not making sense, this policy is actually harmful to digital preservation in two ways. First, it only allows deposit of those digital items that are least preservable and most prone to physical deterioration and file format obsolescence (floppies, CD-ROMs, DVDs, etc.). This burdens depository libraries with an almost impossible task of preservation and access. Second, it prohibits deposit of raw digital information in formats that are more easily preserved and less likely to become obsolete (digital object files in PDF, text, HTML, XML, etc.). These are the digital objects that could have been easily distributed more cheaply and more reliably than “tangible” media. These are the digital objects that FDLP libraries could have been preserving and making accessible (during government shutdowns, for example), the very kind of digital objects that GPO now enthusiastically distributes to the LOCKSS-USDOCS private network. The effect of this policy has been to delay the active participation of FDLP libraries in digital preservation. There was never a good justification for this policy, but now it is so obviously out-of-date and has failed so demonstrably that keeping it in place should be considered an act of negligence.
(See From Production to Preservation to Access to Use: OAIS, TDR, and the FDLP.)

Smart-Archive the Web. Although capturing web pages and preserving them is far from an adequate (or even accurate) form of digital preservation, it is a useful stop-gap until producers understand that depositing preservable digital objects with trusted repositories is the only way to guarantee preservation of their information. Therefore, FDLP libraries should use web archiving tools, including services such as Archive-It (as Casper points out, if in a confusing way). Every FDLP library should at least consider “smart-archiving” of web-based information. Web-archiving should not be seen as everything-or-nothing: libraries can do focused selection to build collections useful to their own users. This is smart-archiving. Selections can be large (an agency or a domain) or small (crawl a few seeds) or even one-document-at-a-time. Examples of these models exist. See, for example: the Chesapeake project, the work of The Columbia Libraries (The Integrity of Research Is at Risk: Capturing and Preserving Web Sites and Web Documents and the Implications for Resource Sharing), the California Digital Library Web Archiving Services(**), and the Stanford Libraries EEMs project (Everyday Electronic Materials in Policy and Practice).

Promote Digital Preservation. Casper’s series of articles is about “promotion” of government information and his recommendation in this article about preservation is that we should “keep promoting these online sources.” He should have stressed the most important promotion that is needed today: the promotion of the role of FDLP libraries in actively preserving digital government information. The time when FDLP libraries could be passive in digital preservation is long past. The time when FDLP libraries could look to others to take care of digital preservation of government information is long past. FDLP libraries can work with others, but we must actually work with them, not leave the work to them.

James A. Jacobs, Emeritus Data Librarian, UC San Diego
James R. Jacobs, Federal Government Information Librarian, Stanford University
Co-founders, Free Government Information

**Editor’s note: CDL recently announced that the WAS service and collections were being transitioned over to the Internet Archive’s Archive-It service.
