Agenda for 2019: Exploring Digital Deposit

As 2018 ends, it is time to start setting the agenda for the FDLP for 2019. The coming year has a lot of potential despite (or because of) the failure of Title 44 reform, the government shutdown, and the general political gridlock in Congress.

2018: The Year of "Modernizing"

The biggest FDLP news in 2018 was the progress of the FDLP Modernization Act of 2018. Although it failed to pass during the 115th Congress, there is a good chance that it will be re-introduced in the 116th Congress, possibly with some changes. We believe that the last version of the bill still had some significant flaws, but it also had some much-needed improvements, particularly in the areas of privacy, preservation, and free public access, which would have corrected the obvious flaws of the 1993 legislation.[1] The biggest leap forward in the bill was its introduction of the concept of "digital deposit" into law.

2019: The Year of Exploring Digital Deposit

The Depository Library Council (DLC) has now recommended the creation of a working group to explore digital deposit. (See announcement, description, recommendations (PDF) [Recommendation #3]). DLC says that appointing the Working Group (WG) is critically important for "reaching consensus on how federal information in digital forms should be disseminated to and amongst the FDLP community for the benefit of all our users" and we agree.

This is a potentially important development. It gives the FDLP an opportunity to explore not just "dissemination" (or what we like to call "deposit") but the future of preservation and access and service.

To be sure, this is only a potential opportunity. If the WG and the FDLP community focus only on the technical issues of delivering digital objects, the effort will fail. This, as DLC wisely says, is about benefits to users of government information.

We believe that this should be an opportunity for the whole FDLP community to participate in imagining the future of the FDLP. This is not a time to stand on the sidelines and wait for a committee report. With that in mind, we offer our own preliminary thoughts on how the FDLP community might explore digital deposit in the coming year.

Foremost, we think the process should truly explore the concept of digital deposit by looking forward, not backward. This should not be about item numbers and replicating the model of deposit of physical items. It should be thinking about functionality, outcomes, and opportunities. The exploration should not be about technical questions of how to move digital bits to libraries, but about the opportunities that digital deposit brings for enhancing and improving access, preservation, and service to users. This should not be about a small change to the status quo, but about a better future for preservation and access to government information.

These ideas are worth considering now, regardless of the status of the Title 44 reform bill. We can imagine the future today and there are things that we can and should be doing now that do not require a change in the law.

Let’s remind ourselves that, in effect, digital deposit has already existed for almost 10 years. Thirty-six LOCKSS-USDOCS libraries are already getting a copy of everything in govinfo.gov every day. We should be asking: What more can we do? What should we do next? How can we improve the FDLP for users?

Background and Context

Background: For those who haven’t been following this issue, here is a bit of background and context.

More than twenty years ago, GPO chose to explicitly exclude almost all digital content from deposit with FDLP libraries and it has continued to do so as it updated its policies.[2] While most libraries, underfunded and ill-staffed for the new digital era, were content to cede this responsibility to GPO twenty years ago, times have changed. Responses to biennial survey questions between 2005 and 2015 reveal that hundreds of FDLP libraries are interested in having digital content deposited along with print content, and that between 100 and 200 libraries are already downloading digital documents and hosting them locally.[3] After initially rejecting a proposed LOCKSS collaboration with libraries,[4] GPO relented, and now thirty-six libraries are actively, collaboratively preserving GPO’s govinfo.gov content in the LOCKSS-USDOCS program.[5] The proposed revision to Title 44 includes a provision (§1743a) that allows "optional digital deposit" but isolates it from the responsibilities of FDLs (§1744-§1746), limits it with qualifying language ("unless impracticable"), and gives GPO the authority to further limit it in unspecified ways.[6]

2019 Context: A lot has changed in the twenty-five years since the current system was devised. And we know a lot more today. Any exploration of digital deposit today should be informed by the current state of access and preservation and technology. We should use this opportunity to plan for the future needs of users, not just update an antiquated system. The recent Environmental Scan from the Preservation of Electronic Government Information (PEGI) Project provides much-needed context for any discussion of digital deposit. It says:

[T]he proliferation of born-digital government information requires new approaches to collection and preservation. The web has made access to government information faster and digital government publications more readily available for many users. However, the sheer volume of information being produced, and its decentralized distribution through hundreds of government domains, has disrupted traditional models of preservation.[7] (page 18)

It should be obvious to everyone by now that GPO cannot preserve all government information by itself. GPO acknowledged this in its 2016 National Plan, noting the need for collaborators.[8] There are metrics that make this obvious. GPO’s Catalog of U.S. Government Publications (CGP) (also known as the "National Bibliography of U.S. Government Publications") is nowhere near complete; of the many millions of digital objects published by the government over the last forty years, the CGP includes only about 500,000.[9] Even among those items that are in the CGP, many are not controlled or preserved by GPO; for those, the CGP merely points to government websites, not to govinfo.gov.[10] GPO’s recent PURL report says that almost 30% of all PURLs (persistent URLs) point to such content. The PEGI report makes clear that most federal government information is not in the control of GPO. Compare, for example, the 160 million and 350 million URLs harvested by the 2012 and 2016 End of Term Crawls to the approximately 2.5 million titles currently managed and made available by GPO through govinfo, other GPO servers, and official agreements with agency partners.

Figure 1 from the PEGI report, comparing the approximately 2.5 million titles currently managed and made available by GPO through govinfo, other GPO servers, and official agreements with agency partners, to the 160 million and 350 million URLs harvested in 2012 and 2016 End of Term Crawls, respectively.

Ideas for the Working Group

We suggest that the FDLP community should start thinking and sharing ideas now. Let’s give the Working Group a head start with some innovative ideas and wish lists. We will get this started with our suggestions for things to avoid and some ideas to explore.

What to avoid

We’ve all been on committees that went nowhere. How do we avoid that trap and use this opportunity to really explore the future? We suggest that there are several things the WG should avoid doing.

  • Do not choose specific technologies. There will be a temptation to immediately start talking about things like FTP, OAI-PMH and "push vs. pull." Lots of technologies for moving digital bits around exist and new ones will become available over time. The WG would be wasting its time if it focused on the short-term, how-do-we-do-it-today, technological issues. It would be better to acknowledge that technical approaches today will, inevitably, age and become obsolete in short order.

  • Avoid discussing costs. Yes, we all have limited resources. But if we assume that we are stuck with last year’s budget or an even smaller budget next year, we will not be able to explore or imagine new things — only smaller, cheaper things. Instead of imagining the worst budget, imagine what you’d do if you had all the funding you need. This is a time for exploring and imagining. Once we know where we want to go, we can assess costs, create budgets, and establish priorities. We might even find that our funding partners are more willing to increase our budgets if we can demonstrate a positive vision that will offer improvements that they can use.

  • Avoid replicating physical deposit. Do not start with "item numbers." Avoid the temptation to replicate physical deposit in the digital world. Concepts that we used for physical deposit were based on limitations (shelf space, box size, mailing costs, those wonderful lighted bins at GPO) that do not exist in the digital world. Yes, the digital world has its own limitations, but address those when you get to them; don’t start out with "legacy" limitations.

Ideas to Consider

  1. Improve services for users. Digital deposit is more than moving bits. It is about FDLP libraries having their own digital collections for which they can build their own digital services for their own Designated Communities. And those Communities can include people anywhere in the world. Imagine the FDLP leading a movement of shared, free, public, enhanced access to important content. So, just as a matter of attitude in starting the exploration, imagine what such digital collections and services might look like for users. We do not have to replicate govinfo.gov. We do not have to replicate agency web sites. Why would we? Those already exist! Instead, we can begin by imagining that libraries might build collections and services that directly address the needs of their communities in ways that agencies and GPO do not. Think about collections built around user needs rather than provenance. Think about how much libraries pay for “value-added” commercial services and how your users would thank you if you added value to government information. How would we structure digital deposit to facilitate better services for users?

  2. Enhance Selection for libraries. Consider how digital deposit can be much more flexible than paper deposit was. For example, instead of having libraries select from static, pre-defined, narrow categories based solely on provenance, what about a “faceted selection” tool that would allow libraries to specify what they want in several different dimensions simultaneously:

    • by format (pdf, html, AV, spreadsheets, raw data….)
    • by keyword
    • by agency
    • by subject (e.g., cataloged LCSH subjects, subjects auto-generated by topic-modeling)
    • by series
    • special categories focused on editions or generations of issues (e.g., slip laws or final laws, public or private laws, current or superseded materials, etc.)
  3. Promote flexible library technologies. Instead of starting with technologies and asking what they can do, consider starting with functions, outcomes, and services, and ask which technologies can provide those. Instead of recommending specific technical approaches, consider recommending the creation of a permanent technical committee to provide advice and support to GPO and to libraries that acquire digital content. Such a committee could provide ongoing, up-to-date recommendations of software, hardware, best practices, and solutions for issues such as:
    • Push or pull deposit
    • Technical methods (user interfaces) for specifying selection
    • Storage technologies
    • Technologies for providing services (e.g., indexing, site maps, user interfaces)
    • Cataloging and metadata issues
  4. Improve access for users. Link rot is a persistent problem. When a link has changed, moved, or disappeared, it can be difficult or impossible for a user to get access to that content. “Content drift” (when a link still works, but the content has changed) can be an even worse problem for users.[11] How can digital deposit improve accurate access? Here is one idea: Recommend a system of permanent URLs (such as DOIs) in which a single URL can point to multiple copies of the same digital object (e.g., on an agency site, in govinfo.gov, and in various FDLs’ digital repositories). This would protect against any failure at GPO or an agency by automatically failing over to an FDLP library copy (and vice versa). DOIs could also provide one piece of the puzzle for ensuring the authenticity of deposited digital objects. They could also enhance the efficiency of the delivery of content to end users by acting as a virtual content delivery network (CDN).[12]

  5. Enhance existing digital deposit. As noted above, thirty-six libraries are already collaboratively preserving the content of govinfo.gov. Currently those LOCKSS caches are essentially dark archives. Why not open those existing copies up to the public? The WG could explore how to work with the LOCKSS program to investigate making the content already in the LOCKSS-USDOCS network open to the public. This could include:

    • At a minimum, access at the digital-object (i.e., "title") level with DOIs.

    • Explore the possibility of replicating the LOCKSS-USDOCS cache in the Internet Archive.

    • Explore ways to make a public interface to LOCKSS content replicate the GPO govinfo.gov interface and ways to create new (different from GPO) interfaces.

  6. Enhance Preservation and Access. GPO has cataloged about 500,000 items in the CGP since 1976, but almost 30% of the items in CGP with PURLs are not controlled or made public by GPO. Why not make that 30% public through the FDLP today? The WG could recommend ways for FDLP libraries to acquire government information that is already cataloged by GPO in the CGP but that GPO does not ingest into govinfo.gov.

  7. Acquire Fugitives. As noted above, most born-digital government information is "fugitive": outside of GPO and the FDLP. The WG could develop plans to acquire and preserve those digital fugitives. Since libraries will be building collections under digital deposit, the WG could explore how those libraries might ingest content that is missing from govinfo.gov. It could also address how FDLP libraries could, in turn, share metadata and content with GPO and collaboratively expand the National Collection as represented by the CGP and by govinfo.gov. One way to do this would be to promote web harvesting:

    • Include web harvests as "deposit". The WG could explore how to promote ongoing web harvesting by FDLP libraries and how to label those harvests as officially part of the FDLP and the National Collection. Such harvests would be like the End-of-Term harvests, but small, focused, and ongoing.
    • Promote specialty harvests (e.g., as LoC does for hot topics).
  8. Help GPO, Help Agencies. GPO cannot do everything by itself, and preservation and long-term access are low on the agenda of most agencies. What if the WG explored ways to promote and support partnerships between FDLP libraries and government agencies? A recent report by the Library of Congress suggests that GPO needs to do more direct outreach to agencies. What if FDLP libraries did some of that outreach in conjunction with GPO? Individual FDLP libraries could develop relationships with individual agencies, offering to mirror agency content, host agency data, provide agency content backup, and so forth. Such FDLP/agency partnerships could also help agencies understand the need for creating preservable digital objects suitable for long-term life-cycle preservation.[13]

  9. Reach out to new partners. A lot of work outside of (or parallel to) the FDLP has been done recently to preserve government information. As with the LOCKSS-USDOCS project, a lot of that information is either dark or languishing behind primitive user interfaces. The WG could explore how the FDLP could collaborate with existing projects such as the EOT archives, data-rescue harvests, the Azimuth Climate Data Backup Project, Climate Mirror, and the EDGI project to make the content of these projects more easily discoverable and usable and a visible part of the National Collection.

  10. Identify start-up funding. Once we have envisioned our future, it will be time to think about costs. Many of the above ideas could benefit from one-time or start-up funding. The WG could identify good funding sources for, say, a large metadata creation project, hardware purchases, and training.
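
To make the "faceted selection" idea in item 2 more concrete, here is a minimal sketch in Python of how a library's selection profile might be matched against incoming digital objects. Everything in it is an assumption for illustration: the Publication and SelectionProfile classes, their field names, and the sample records are hypothetical and do not describe any existing GPO or FDLP system. The sketch combines facets with AND and treats the values within one facet as alternatives (OR); an empty facet means "no restriction."

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Publication:
    """A digital object described by a few illustrative facets (hypothetical)."""
    title: str
    agency: str
    fmt: str                          # e.g. "pdf", "html", "csv"
    subjects: frozenset = frozenset()
    series: str = ""


@dataclass
class SelectionProfile:
    """One library's selection profile; an empty facet places no restriction."""
    formats: set = field(default_factory=set)
    agencies: set = field(default_factory=set)
    subjects: set = field(default_factory=set)
    series: set = field(default_factory=set)
    keywords: set = field(default_factory=set)

    def matches(self, pub: Publication) -> bool:
        # Facets combine with AND; values inside a single facet combine with OR.
        if self.formats and pub.fmt not in self.formats:
            return False
        if self.agencies and pub.agency not in self.agencies:
            return False
        if self.subjects and not (self.subjects & pub.subjects):
            return False
        if self.series and pub.series not in self.series:
            return False
        if self.keywords:
            title = pub.title.lower()
            if not any(word.lower() in title for word in self.keywords):
                return False
        return True


def select(profile, publications):
    """Return the publications that satisfy every facet set in the profile."""
    return [pub for pub in publications if profile.matches(pub)]


# Hypothetical sample records, invented purely for this illustration.
catalog = [
    Publication("Drought monitoring data", "Example Agency A", "csv",
                frozenset({"Climate"}), "Data series"),
    Publication("Sample slip law", "Example Agency B", "pdf",
                frozenset({"Law"}), "Slip laws"),
    Publication("Climate assessment summary", "Example Agency A", "pdf",
                frozenset({"Climate"})),
]

profile = SelectionProfile(formats={"pdf"}, subjects={"Climate"})
print([pub.title for pub in select(profile, catalog)])
# ['Climate assessment summary']
```

A profile like this could be changed at any time without renegotiating item numbers, which is the kind of flexibility we have in mind. How such profiles would actually be stored, submitted, or applied is exactly the sort of question a technical committee could keep under review.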
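
The fail-over idea in item 4 can also be sketched in a few lines of Python. This is a sketch under stated assumptions, not a description of how DOI resolution actually works: the registry dictionary, the identifier, the .example host names, and the resolve function are all hypothetical, and the SHA-256 comparison is just one possible way to notice content drift. The fetch function is passed in as a parameter so that the example runs without network access.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: one persistent identifier mapped to several copies,
# listed in the order in which they should be tried.
REGISTRY: Dict[str, List[str]] = {
    "doi:10.99999/example-report": [
        "https://agency.example/report.pdf",    # agency site
        "https://govinfo.example/report.pdf",   # stand-in for a GPO copy
        "https://library.example/report.pdf",   # an FDLP library copy
    ],
}


def resolve(
    identifier: str,
    fetch: Callable[[str], bytes],
    expected_sha256: Optional[str] = None,
    registry: Dict[str, List[str]] = REGISTRY,
) -> Tuple[str, bytes]:
    """Return (url, content) for the first copy that is reachable and,
    if a checksum is supplied, matches that checksum."""
    for url in registry.get(identifier, []):
        try:
            content = fetch(url)
        except OSError:
            continue  # this copy is unavailable, so fail over to the next one
        if expected_sha256 is not None:
            if hashlib.sha256(content).hexdigest() != expected_sha256:
                continue  # content differs from what was expected
        return url, content
    raise LookupError(f"no usable copy found for {identifier}")


def demo_fetch(url: str) -> bytes:
    """Stand-in for an HTTP client in which the agency copy is offline."""
    if url.startswith("https://agency.example/"):
        raise ConnectionError("agency site unavailable")
    return b"full text of the report"


url, content = resolve("doi:10.99999/example-report", demo_fetch)
print(url)  # https://govinfo.example/report.pdf
```

In this toy run the agency copy is unreachable, so the user is served the next copy in the list without ever seeing a broken link. Passing a checksum makes the same loop skip any copy whose content has changed, which hints at how a shared identifier could support authenticity checks as well as availability.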

Conclusions

We believe that the WG should acknowledge that digital content offers new possibilities as well as new challenges for access, preservation, and user services. Most importantly, the goal of digital deposit solutions should be to benefit end-users of digital content and address their needs. The attitude we bring to these issues will be key to the success or failure of the Working Group. If we approach digital deposit as an "unfunded mandate" that will reduce library "flexibility," it will fail. If the FDLP community leaves this huge exploration project up to a single small committee, it will fall short of what we need. If we think of “digital deposit” as a narrow issue of moving digital bits around, we will fail our users. But, if we, as a community, approach the issues by looking for opportunities that will strengthen and expand digital preservation and access and services for information users, we will all win.

What are your ideas for the Working Group? Share them here.

Authors:
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University

Endnotes

  1. This is not a drill. The future of Title 44 and the depository library program hang in the balance by James A. Jacobs and James R. Jacobs. FGI (July 27, 2017).
  2. In the mid-1990s, two reports, The Electronic Federal Depository Library Program: Transition Plan, FY 1996 – FY 1998 and the Study to identify measures necessary for a successful transition to a more electronic Federal Depository Library Program, limited the responsibility of FDLP libraries to so-called "tangible information products" and gave to GPO alone the responsibility for so-called "remotely accessible information products." This was later instantiated in Superintendent of Documents Policy Documents, the current version of which is SOD 301 ("Dissemination/Distribution Policy for the Federal Depository Library Program"). It has been further fixed in policy by the creation of so-called "All or Mostly Online Federal Depository Libraries," a policy that defines "depository libraries" as including those that provide "access to online resources" without having any content (even "tangible" content) deposited with them.
  3. Digital Deposit And The Biennial Survey: context and actions, by James A. Jacobs, FGI (November 1, 2017).
  4. GPO LOCKSS Report: Mistakes and Irrelevancies by Dan Cornwall, FGI (April 8, 2007).
  5. ‘Issued for Gratuitous Distribution’: The History of Fugitive Documents and the FDLP. James R. Jacobs. Against the Grain 29(6) December 2017/January 2018.
  6. H.R.5305 - FDLP Modernization Act of 2018, 115th Congress (2017-2018).
  7. Environmental Scan of Government Information and Data Preservation Efforts and Challenges by Sarah K. Lippincott. Atlanta, Georgia: Educopia Institute, 2018.
  8. National Plan for Access to U.S. Government Information: A Framework for a User-Centric Service Approach to Permanent Public Access, U.S. Government Publishing Office (February 2016).
  9. About the Catalog of U.S. Government Publications.
  10. In a recent announcement about the current government shutdown, GPO notes that "The Catalog of U.S. Government Publications (CGP) will be available, however GPO cannot ensure that all PURLs pointing to other Federal agency resources will work during this time."
  11. See, for example, Thoughts on Referencing, Linking, Reference Rot by Herbert Van de Sompel, and Temporal Context in Digital Preservation, and Link Rot up to 51% for .gov domains, and Government Link Rot, and Another study of link rot and content drift.
  12. Could DOIs Solve Three Depository Challenges? by James A. Jacobs, FGI (March 24, 2010).
  13. Disseminating and Preserving Digital Public Information Products Created by the U.S. Federal Government: A Case Study Report, Prepared by the Federal Research Division, Library of Congress under an Interagency Agreement with the Library Services and Content Management Directorate, U.S. Government Publishing Office, August 2018. See also: New Report on Disseminating and Preserving Digital Government Information, by James A. Jacobs, FGI (August 30, 2018).

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

