Tag Archives: digital fdlp
Public access to Congressionally mandated reports one step closer to reality!
Congress passed the Access to Congressionally Mandated Reports Act (ACMRA) as part of the 2023 defense authorization bill, and many, including FGI, cheered!
This week, the collection of these important reports came one step closer to reality as the White House Office of Management and Budget released detailed guidance for agencies to implement the ACMRA starting in October 2023. The Federal News Network has more context. In a nutshell, “starting on Oct. 16, anytime an agency is drafting a legally-required report to Congress, they’ll also need to prepare to send it to the Government Publishing Office to be hosted in a new publicly-accessible web portal GPO is building.” GPO has also announced its work on this important project for government transparency. This will also be a boon to the Federal Depository Library Program (FDLP) as thousands of Congressionally mandated reports make their way into depository library catalogs and collections.
My great hope is that this will be a template going forward for how executive agencies can work with GPO to bring their publications and data into the National Collection of U.S. Government Public Information where it can be collected, described, preserved and given broad public access via the internet and through the federal depository library network.
Some facts about the born-digital “National Collection”
We want to contribute a couple of facts and context about the born-digital “National Collection” to help inform the discussions on the priorities of GPO and FDLP libraries at the upcoming spring 2022 Depository Library Conference as well as discussions surrounding the work of the all-digital FDLP task force.
We believe these facts lead to an unavoidable conclusion: GPO and FDLP need to explicitly make dealing with unpreserved born-digital government information a strong priority.
Here are the facts.
Who produces born-digital government information?
We have been examining data from the 2020 End-of-Term crawl. We found (not surprisingly) that, by far, the most prominent types of born-digital content on the web are web pages (HTML files) and PDF files. We counted just unique web pages and PDF files from the government web in EOT20 and found more than 126 million web pages and more than 2.8 million PDF files for a total of more than 129 million born-digital items. More than 80% of that content is from the executive branch.
What is GPO preserving?
GOVINFO: There are roughly 2 million PDFs in Govinfo. These items are secure and preserved in GPO’s certified trusted digital repository. By our count, 74% of the born-digital PDF content in Govinfo is from the judicial branch, 24% from the legislative branch, and only 2% from the executive branch. In other words, GPO devotes almost 3/4 of its born-digital preservation space to the judiciary, which produces only about 2% of all born-digital government information. Conversely, GPO devotes only 2% of its born-digital preservation space to the executive branch, which produces more than 80% of born-digital government information.
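To make that mismatch concrete, here is a minimal sketch that divides each branch's share of Govinfo PDFs by its share of born-digital output. It uses only the rounded figures quoted above (74% vs. about 2% for the judiciary, 2% vs. more than 80% for the executive branch); the function name `coverage_ratio` is ours and purely illustrative, and the result is a rough indicator, not a precise measurement.

```python
def coverage_ratio(preserved_pct, produced_pct):
    """Share of preserved PDFs divided by share of content produced.

    A value near 1 would mean preservation is proportional to output.
    """
    return preserved_pct / produced_pct


# Rounded figures from the post ("by our count"); treat them as estimates.
judicial = coverage_ratio(74, 2)    # judiciary: 74% of Govinfo PDFs, ~2% of output
executive = coverage_ratio(2, 80)   # executive: 2% of Govinfo PDFs, >80% of output

print(f"judicial:  {judicial:.1f}x its share of output")
print(f"executive: {executive:.3f}x its share of output")
```

Because the executive branch's share of output is stated as "more than 80%," its true ratio would be at or below the value computed here.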
FDLP-WA: The FDLP Web Archive on the Internet Archive’s Archive-It servers had 211 “collections” or “websites” when we counted earlier this year. Most of the content of the FDLP-WA is from the executive branch (by our count, it only includes 3 congressional agencies and one judicial agency). GPO describes its web harvesting as targeted at small websites. By our count, using the EOT20 data, there are 23,666 “small” government websites and altogether they contain only 0.06% of the public information posted on the government web. By contrast, 99% of Public Information on the government web is hosted by 1,882 “large” websites, none of which GPO is targeting.
GPO also stores some copies of some cataloged web-based content on its permanent.fdlp.gov server. We do not have exact figures on the quantity of content stored, but we do know that, on average, GPO catalogs just over 19,000 titles a year. As a percentage of just the PDFs on the government web in 2020, that is less than 1% per year.
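As a quick check of that percentage, here is a minimal sketch using the two rounded figures given in this post: roughly 19,000 titles cataloged per year and the 2.8 million unique PDFs counted in EOT20. The variable names are ours for illustration, and both inputs are approximations ("just over" and "more than"), so the result is only a ballpark.

```python
# Rounded figures from the post; both are approximate.
titles_cataloged_per_year = 19_000    # "just over 19,000 titles a year"
pdfs_on_gov_web_2020 = 2_800_000      # "more than 2.8 million PDF files" in EOT20

# Annual cataloging as a percentage of the PDFs found on the government web.
pct_per_year = titles_cataloged_per_year / pdfs_on_gov_web_2020 * 100

print(f"{pct_per_year:.2f}% of EOT20 PDFs per year")  # prints "0.68% of EOT20 PDFs per year"
```

Note that this compares cataloged titles to PDF files, which are not identical units, so it should be read as an order-of-magnitude comparison.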
GPO has a few “digital access” partnerships (NASA, NLM, GAO and a couple of others), but there’s only 1 digital preservation stewardship agreement: with University of North Texas (UNT) libraries (check out the difference between a “digital access partner” and a “digital preservation steward” here).
Although we do not have data on how quickly content on the web is altered or removed, one study determined that 83% of the PDF files present in the 2008 EOT crawl were missing in the 2012 EOT crawl.
Conclusions
- GPO is doing a good (though not comprehensive) job of preserving born-digital content from the judicial and legislative branches but, by our rough estimate, this accounts for only about 15% of born-digital government information.
- GPO is preserving very, very little of the born-digital content of the executive branch, which is where about 80% of born-digital publishing is being done.
- To ensure the preservation of this executive branch born-digital government information, GPO needs an active program to acquire and preserve it. Depository Library Council (DLC) should create a strong statement recognizing this huge gap in digital preservation and recommending that GPO prioritize developing plans for addressing it.
Authors
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University
GPO Director Halpern appoints task force to study an all-digital FDLP
GPO Director Hugh Halpern has just named a task force to study the feasibility of moving to an all-digital Federal Depository Library Program (FDLP). There are 23 people named to the task force including from Depository Library Council, depository librarians, library associations, and federal agencies.
I’ll withhold comment on the feasibility of this task force until I see some of the work they’re doing. But I hope that the work of current GPO working groups on PURLs, Digital Deposit, and Collections and Discovery Services, as well as the long-running work on digital preservation and information access of the PEGI Project and FGI(!), will be leveraged to come up with solid recommendations for a positive future for public access to and preservation of federal government information supported by FDLP libraries and librarians.
The U.S. Government Publishing Office (GPO) Director Hugh Nathanial Halpern appointed a task force to study the feasibility of an all-digital Federal Depository Library Program (FDLP). As 97 percent of Federal publications are born digital, Director Halpern is charging the Task Force on a Digital Federal Depository Library Program (“Task Force”) to determine whether an all-digital FDLP is necessary, and if so, define the scope of an all-digital depository program and make recommendations as to how to implement and operate such a program. This will include an examination of the current landscape in Federal depository libraries, of FDLP-related operations at the GPO, and of the dissemination of publications by Federal agencies.
Legislative Revisions to Title 44 U.S.C., Chapter 19 re the FDLP
A new Congress has begun, and that means another shot at “modernizing” the Federal Depository Library Program (FDLP), which hasn’t seen any new and substantive legislative change since the 1993 GPO Access Act. Here at FGI, we’re busy poring over GPO’s proposed legislative revisions for Title 44 along with Depository Library Council’s feedback to GPO. We plan to submit feedback and recommendations to GPO and you can too! You have until March 5, 2021 to submit feedback via GPO’s form. Do it today!
FGI’s recommendations for creating the “all-digital FDLP”
April 11, 2022
As a follow-up to our recent post, “Some facts about the born-digital ‘National Collection’,” we want to suggest some specific actions that GPO and FDLP libraries can take to do a better job of collecting and preserving born-digital content for the “National Collection”.
For context, our starting assumption is that GPO and FDLP have two connected priorities: preservation and user services. The two go hand-in-hand. To be “preserved,” content must be discoverable, deliverable, readable, understandable, and usable by people. Broadly speaking, this can be understood as “user services.” Addressing these priorities at scale will require innovative, collaborative approaches. Old solutions that do not scale will not work.
With regard to preservation, digital objects have to be under sufficient control of the preservationist to be preservable. As we pointed out in our previous post, the vast bulk of born-digital government Public Information is not being preserved by GPO or FDLP libraries. But, worse than this, GPO and FDLP have no active plan to address that gap in preservation. While there are lots of projects to digitize historic paper documents in FDLs, there is no active project to acquire, describe, store, manage, and preserve (i.e., curate!) the bulk of born-digital content (the End of Term crawl notwithstanding). Regardless of what minor steps GPO is taking, the results are, at best, insignificant when compared to the scale of the problem. What is needed is a recognition of the problem of the huge gap in digital preservation and a specific plan for developing active strategies to address the problem. Waiting for agencies to deposit with GPO doesn’t work. Simply advertising GPO’s publishing services is not enough. GPO needs new strategies.
The two most important aspects of user services are “discovery” (providing tools that enable users to find the information they need) and “usability” (providing tools that enable users to use the content they discover). The two approaches GPO uses for discoverability (catalog records in the Catalog of Government Publications and a hierarchical presentation of agencies and publication types and dates in govinfo.gov) are woefully incomplete in the 21st century. One resembles a legacy card catalog and the other resembles a 1990s Yahoo!-like directory interface. Each has some utility, but they are not sufficient. GPO needs to work with FDLP libraries to develop new user-centric tools for discovery.
As for usability, GPO’s approach is still very document-centric, being designed to deliver one document at a time for reading. It should be evident to all that there are many more potential uses of government information than simply retrieving one document at a time. 21st century users are more sophisticated and have more use-case needs than that. We believe that GPO should continue to provide the services it does through Govinfo, but it should supplement that work by developing programs, tools, and support for FDLs to develop new uses built on the specific use-case needs of Designated Communities of users — and potential users. Doing that will have the additional benefit of helping drive collection development — and preservation.
GPO already has policies in place that can be read to include the broader vision we offer here. For example, GPO’s Draft Strategic Plan Fiscal Years 2023 Through 2027, while explicitly mentioning digitizing paper collections, also includes the vague phrase “focus on adding new collections and filling the gaps in existing collections.” Although, in context, it seems to imply filling in gaps of paper/digitized collections, it could be taken as a broader mission to address the real preservation gap of new, born-digital content. Nevertheless, vague phrases are not enough. Policies and projects need to specifically address the massive and growing born-digital preservation gap with action plans.
Given our assumptions and priorities, here are some suggestions for steps GPO can take now.
Now THAT’s an “all-digital FDLP”!
Authors
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University