As a follow-up to our recent post, “Some facts about the born-digital ‘National Collection,’” we want to suggest some specific actions that GPO and FDLP libraries can take to do a better job of collecting and preserving born-digital content for the “National Collection.”
For context, our starting assumption is that GPO and FDLP have two connected priorities: preservation and user services. The two go hand-in-hand. To be “preserved,” content must be discoverable, deliverable, readable, understandable, and usable by people. Broadly speaking, this can be understood as “user services.” Addressing these priorities at scale will require innovative, collaborative approaches. Old solutions that do not scale will not work.
With regard to preservation, digital objects have to be under sufficient control of the preservationist to be preservable. As we pointed out in our previous post, the vast bulk of born-digital government Public Information is not being preserved by GPO or FDLP libraries. Worse, GPO and FDLP have no active plan to address that gap in preservation. While there are lots of projects to digitize historic paper documents in FDLs, there is no active project to acquire, describe, store, manage, and preserve (i.e., curate!) the bulk of born-digital content, the End of Term crawl notwithstanding. Regardless of what minor steps GPO is taking, the results are, at best, insignificant when compared to the scale of the problem. What is needed is a recognition of the huge gap in digital preservation and a specific plan for developing active strategies to address it. Waiting for agencies to deposit with GPO doesn’t work. Simply advertising GPO’s publishing services is not enough. GPO needs new strategies.
The two most important aspects of user services are “discovery” (providing tools that enable users to find the information they need) and “usability” (providing tools that enable users to use the content they discover). The two approaches GPO uses for discoverability (catalog records in the Catalog of Government Publications and a hierarchical presentation of agencies and publication types and dates in govinfo.gov) are woefully incomplete in the 21st century. One resembles a legacy card catalog and the other resembles a 1990s Yahoo!-like directory interface. Each has some utility, but they are not sufficient. GPO needs to work with FDLP libraries to develop new user-centric tools for discovery.
As for usability, GPO’s approach is still very document-centric, being designed to deliver one document at a time for reading. It should be evident to all that there are many more potential uses of government information than simply retrieving one document at a time. 21st-century users are more sophisticated and have more use-case needs than that. We believe that GPO should continue to provide the services it does through Govinfo, but it should supplement that work by developing programs, tools, and support for FDLs to develop new uses built on the specific use-case needs of Designated Communities of users and potential users. Doing that will have the additional benefit of helping drive collection development and preservation.
GPO already has policies in place that can be read to include the broader vision we offer here. For example, GPO’s Draft Strategic Plan Fiscal Years 2023 Through 2027, while explicitly mentioning digitizing paper collections, also includes the vague phrase “focus on adding new collections and filling the gaps in existing collections.” Although, in context, it seems to imply filling in gaps of paper/digitized collections, it could be taken as a broader mission to address the real preservation gap of new, born-digital content. Nevertheless, vague phrases are not enough. Policies and projects need to specifically address the massive and growing born-digital preservation gap with action plans.
Given our assumptions and priorities, here are some suggestions for steps GPO can take now.
- Publicly and explicitly acknowledge and publicize the born-digital preservation gap;
- Develop an aggressive, active strategy for gaining agreements with executive branch agencies to deposit their born-digital content with GPO. Work with Congress to provide funding to agencies for providing those deposits and to GPO for receiving and processing them;
- Develop an aggressive, active strategy to promote and enforce existing OMB A-130 policy (“making Government publications available to depository libraries through the Government Publishing Office regardless of format”) for depositing executive branch content with GPO. The policy exists, but OMB does nothing to enforce it. The strategy could include working with NARA, the Federal CIO Council, and the Federal Web Archiving Group (consisting of GPO, NARA, Library of Congress, the National Library of Medicine, the Smithsonian Institution, Department of Education, and Department of Health and Human Services) to support OMB enforcement of that policy and set new policies and regulations for preserving federal agency publications and data;
- Develop an aggressive, active strategy for the development of new tools for harvesting and processing Public Information and metadata, and for the processing of that harvested data for the automated generation of rich metadata for the description, management, preservation, discovery, delivery, and use of harvested data and metadata. Develop tools, workflows, and policies to help FDLs preserve born-digital government information. This can include identifying and acquiring unreported documents, new methods of selection to build digital collections, metadata creation, and the development of digital repositories connected by APIs and a robust system of stable Permanent Identifiers;
- Develop a plan for active, continual harvesting of born-digital content that remains undeposited by agencies with GPO. Develop new strategies for targeting content by document and file-type, use-case, and source. Develop workflows to allow FDLs and other libraries and harvesters to feed their web archiving activities into the National Collection through ingest or cooperative metadata creation, or both;
- Develop next-generation tools and methods for extracting digital objects and metadata from existing Web archives for inclusion in the National Bibliography;
- Develop an active plan for obtaining federal funding for libraries, agencies, and GPO to do this ongoing and critical work.
Now THAT’s an “all-digital FDLP”!
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Please send this statement to every member of the Joint Committee on Printing, the Librarian of Congress, the Archivist of the US, and the head of OMB. I do not think they are reading your posts. This is advice needed by the policy makers.
Thanks for your comment Bernadine. I’m sure you know those folks far better than I do. Please feel free to forward to everyone you know who has a say in this critical policy decision about curating and preserving publications and data produced at taxpayer expense.
Thanks for this. You guys have really covered the waterfront with these comments and your earlier post of Apr 2022. What GPO needs is the funding required for the staff and contractor support to implement your recommendations. Congress recently voted to give the Director a 10-year term. Now it needs to give GPO the money it will need to get this job done.
Thanks for the comment, Andy. Nice to know that you’re still following these important policy discussions. Please feel free to send to anyone you know on the Hill. It’s important to know that, to do public information right for the long term and to assure free public access, funds need to be budgeted. The internet doesn’t have some secret button that, presto, curates and preserves information.
I couldn’t agree more. Hopefully the good folks at GPO, including its new SuDocs, will look closely at your recommendations and come up with estimates of the necessary funding that can then be shared with its oversight and appropriations committees.
Hi again Andy. I hope you’ve submitted comments to the draft report for an all-digital FDLP as we have done in our most recent post.
I submitted comments but was unable to save them to my system. I urged that JCP’s separate office and staff be restored so there is a strong voice in Congress for both digital and paper programs. GPO does not have the clout with the other branches that JCP had and could have again.
I also strongly support paper and preservation of paper and digital.