
Author Archives: James A Jacobs

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Some facts about the born-digital “National Collection”

We want to contribute a few facts and some context about the born-digital “National Collection” to help inform discussions of the priorities of GPO and FDLP libraries at the upcoming spring 2022 Depository Library Conference, as well as discussions surrounding the work of the all-digital FDLP task force.

We believe these facts lead to an unavoidable conclusion: GPO and the FDLP need to make dealing with unpreserved born-digital government information an explicit, high priority.

Here are the facts.

Who produces born-digital government information?

We have been examining data from the 2020 End-of-Term (EOT20) crawl. We found (not surprisingly) that, by far, the most prominent types of born-digital content on the web are web pages (HTML files) and PDF files. Counting only unique web pages and PDF files from the government web in EOT20, we found more than 126 million web pages and more than 2.8 million PDF files, for a total of more than 129 million born-digital items. More than 80% of that content is from the executive branch.

What is GPO preserving?

Govinfo. There are roughly 2 million PDFs in Govinfo. These items are secure and preserved in GPO’s certified trusted digital repository. By our count, 74% of the born-digital PDF content in Govinfo is from the judicial branch, 24% from the legislative branch, and only 2% from the executive branch. In other words, GPO devotes almost 3/4 of its born-digital preservation space to the judiciary, which produces only about 2% of all born-digital government information. Conversely, GPO devotes only 2% of its born-digital preservation space to the executive branch, which produces more than 80% of born-digital government information.
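The size of that mismatch can be expressed as a single ratio: a branch’s share of Govinfo’s born-digital PDFs divided by its share of all born-digital government information. A value above 1 means the branch is over-represented in Govinfo; below 1, under-represented. This short Python sketch uses only the rounded percentages quoted above and treats “more than 80%” as exactly 80%, which is our simplifying assumption (a larger executive share would make its ratio even smaller).

```python
# Rounded percentages quoted in the post.
# "govinfo"    = branch's share of born-digital PDFs preserved in Govinfo
# "production" = branch's share of all born-digital government information
# Assumption: "more than 80%" for the executive branch is treated as 80%.
SHARES = {
    "judicial": {"govinfo": 74.0, "production": 2.0},
    "executive": {"govinfo": 2.0, "production": 80.0},
}


def representation_ratio(govinfo_share: float, production_share: float) -> float:
    """Return >1 if a branch is over-represented in Govinfo, <1 if under-represented."""
    return govinfo_share / production_share


ratios = {
    branch: representation_ratio(s["govinfo"], s["production"])
    for branch, s in SHARES.items()
}

for branch, ratio in ratios.items():
    print(f"{branch:9s} {ratio:6.3f}x")
```

Running it prints `judicial  37.000x` and `executive  0.025x`: by these rounded figures, the judiciary is represented in Govinfo about 37 times more heavily than its share of production, while the executive branch is represented at about one fortieth of its share.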

FDLP-WA. The FDLP Web Archive on the Internet Archive’s Archive-It servers had 211 “collections” or “websites” when we counted earlier this year. Most of the content of the FDLP-WA is from the executive branch (by our count, it includes only three congressional agencies and one judicial agency). GPO describes its web harvesting as targeted at small websites. By our count, using the EOT20 data, there are 23,666 “small” government websites, and altogether they contain only 0.06% of the public information posted on the government web. By contrast, 99% of Public Information on the government web is hosted by 1,882 “large” websites, none of which GPO is targeting.

GPO also stores copies of some cataloged web-based content on its permanent.fdlp.gov server. We do not have exact figures on the quantity of content stored there, but we do know that, on average, GPO catalogs just over 19,000 titles a year. Measured against just the PDFs on the government web in 2020, that is less than 1% per year.
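The “less than 1%” figure follows directly from two numbers quoted in this post. Here is a minimal Python check, assuming (as a simplification) that each cataloged title corresponds to one PDF and using the figures as given:

```python
# Figures quoted in the post (approximate lower bounds).
TITLES_CATALOGED_PER_YEAR = 19_000   # "just over 19,000 titles a year"
PDFS_ON_GOV_WEB_2020 = 2_800_000     # "more than 2.8 million PDF files" in EOT20


def annual_coverage_pct(cataloged: int, total: int) -> float:
    """Percent of `total` that one year of cataloging would cover.

    Simplifying assumption: one cataloged title == one PDF.
    """
    return 100 * cataloged / total


pct = annual_coverage_pct(TITLES_CATALOGED_PER_YEAR, PDFS_ON_GOV_WEB_2020)
print(f"{pct:.2f}% per year")  # prints "0.68% per year"
```

Both inputs are approximate, so treat the result as an order-of-magnitude check rather than a precise rate.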

GPO has a few “digital access” partnerships (NASA, NLM, GAO and a couple of others), but there’s only 1 digital preservation stewardship agreement: with University of North Texas (UNT) libraries (check out the difference between a “digital access partner” and a “digital preservation steward” here).

Although we do not have data on how quickly content on the web is altered or removed, one study determined that 83% of the PDF files present in the 2008 EOT crawl were missing in the 2012 EOT crawl.


  • GPO is doing a good (though not comprehensive) job of preserving born-digital content from the judicial and legislative branches but, by our rough estimate, this accounts for only about 15% of born-digital government information.

  • GPO is preserving very, very little of the born-digital content of the executive branch, which is where about 80% of born-digital publishing is being done.

  • To ensure the preservation of this executive branch born-digital government information, GPO needs an active program to acquire and preserve it. Depository Library Council (DLC) should create a strong statement recognizing this huge gap in digital preservation and recommending that GPO prioritize developing plans for addressing it.


James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University

Government recommendations to preserve government information not preserved by government

James and I are writing a book on preserving government information. In the course of researching the book, we find ourselves hunting down government publications that we need but that are not available from the government or from any FDLP library. Each of these documents has its own explanation for why it is missing and each explanation tells a story about the gaps in preservation of government information.

This is one of those stories. Think of this as a long footnote to a future book.

In 2002, Congress established the Interagency Committee on Government Information (ICGI). One of its charges (more…)

House Staff Report on Political Interference with Coronavirus Response

The staff of the House Select Subcommittee on the Coronavirus Crisis has issued a 15-page report that documents 47 separate instances of political interference with the federal response to the coronavirus crisis.

press release:

report [PDF, 15pp]:


  • The Trump Administration’s Pattern Of Political Interference In The Nation’s Coronavirus Response, Select Subcommittee On The Coronavirus Crisis, Staff Analysis (Oct 1, 2020).

    The analysis demonstrates that over the last eight months, the Administration engaged in a persistent pattern of political interference—repeatedly overruling and sidelining top scientists and undermining Americans’ health to advance the President’s partisan agenda.

    The Administration’s interference has occurred both in public view and in private, led by President Trump, Vice President Pence, White House officials, and political appointees at the Department of Health and Human Services (HHS) and other agencies. The apparent goal of this unprecedented, coordinated attack on our nation’s public health agencies during the pandemic was, in the President’s words, to “play it down.”


Threshold Concepts for Government Information

There’s an interesting thread on the govdoc-l listserv regarding “threshold concepts” for government information: core concepts which, once understood, “transform perception of a given subject, phenomenon, or experience,” or, as someone on govdoc-l put it, “concepts that are so foundational that people immersed in a discipline take them for granted.” We think some of the issues being discussed there could benefit from making explicit some of the foundational ideas about government information that we probably all hold in common but rarely articulate or discuss. Here are seven of them:

1. Public Information.

The U.S. Code and the Office of Management and Budget (OMB) define different categories of government information. Perhaps most familiar to government information specialists are the categories of "records" and "publications." But these are just two of six categories, each one narrower than the one above it. The six categories are defined in OMB Circular A-130 (pp. 26-37).

  • Federal Information
    • Records
      • Public Information
        • Information dissemination product
          • Government Publication

In discussing threshold concepts, we believe it is essential to have a clear understanding of this hierarchy of information and the difference between levels.

As we have suggested before, we believe that the most appropriate of those levels when discussing library policies is that of "Public Information," as defined in Chapter 35 of Title 44:

The term "public information" means any information, regardless of form or format, that an agency discloses, disseminates, or makes available to the public.

2. Information vs. Information Services.

While it is certainly true in the digital age that the federal government provides access online to much of its public information and "organizes" it for access and use, we should understand that there is a difference between such services and the actual content provided by those services. These days, it is common to speak of the services as "e-government."

E-government is a service. It is like the gate at a national park. The park is a resource and the gate is a service that protects the resource and provides access to it — but is not the resource itself. Agencies keep their resources in silos and decide how to organize and present that information through their websites.

Public information is a resource, like a national park, and government websites are like the gates at national parks. When the government controls the resource by keeping it in its own information silos and allowing access only through its gates, the government controls what we can use. We lose access to the information during a government shutdown or when an agency takes information offline. Agencies can alter and move information or impose fees or restrictions on access when they control the only copy of the information resource. And no agency can guarantee that its siloing or its organization and presentation of its content will meet the needs of every community of users, or of user communities in the future.

We have analyzed this issue in more detail here: Information is not a Service, Service is not Information and here: FDLP: Services and Collections.

3. Public Information is essential to democracy.

In order for a democracy to function, citizens must have an accurate record of their government, including authentic records of its actions and of the data it collects, creates, and uses.

Familiar examples include Congressional debates and hearings, laws and regulations (e.g., USC, CFR), official statistics (e.g., GDP, CPI, censuses, surveys), judicial hearings and decisions, administrative records (e.g., aggregations of state records of births, deaths, marriages, crime, health, etc.), position statements, policies, press releases, and transcripts of press conferences.

Such records need to be accessible to the public in order for citizens to be able to hold government accountable. Citizens also need these records to be preserved over time so that they can have an accurate record of the history of government actions, changes in policy, and the data the government uses to determine those policies and actions.

Such records need to be preserved and accessible in context. For example, a record of a single speech in Congress needs to be preserved in the context of all speeches in Congress; a single law must be preserved in the context of all laws and regulations; the census of the population of a city needs to be preserved in the context of its populations in previous censuses and in the context of the censuses of other cities.

"Context" also includes the methodologies used to measure, collect, aggregate, and present raw information. For example, it is important that the Congressional Record makes clear that it is only a "substantially verbatim" record and that Members may revise and extend their remarks after the fact, before they are published.

And it is essential that the methodologies used to create economic indicators such as GDP and CPI be part of the record of those indicators.

4. Accuracy.

The records of government must be accurately preserved without alteration and must be accessible in such a way as to assure users that those records are authentic and complete.

Note that government records may contain inaccuracies, and both those inaccuracies and any corrections to them must be accurately preserved. The official record of a government is the record of "what the government knew" and what officials said at any given point in time. It is only by preserving this record that citizens can hold agencies and officials accountable.

There are many controversies over the methodologies used to create government statistics. Such controversies (and all methodologies) reflect political assumptions and political goals. Preserving the methodologies, raw data, and published statistical indicators provides the potential for creating better policies with better data and better indicators.

5. Gray areas of information distribution.

In the digital age, "government information" is available from many sources, including non-official sources. While the Congressional Record is the official record of Congress, it is easy to find video recordings of what actually happened on the floor of Congress (without revisions or "extensions of remarks") on C-SPAN and Twitter.

The most notable example of this gray area in the current administration is, of course, the President's tweets from his personal Twitter account.

The digital age thus often presents citizens with confusing and contradictory information and presents libraries with complex and difficult policy and collection choices.

6. Misinformation, Disinformation, and Propaganda.

It is essential to have accurate and authentic preservation of Public Information in order to counter misinformation, disinformation, and propaganda whether distributed by government or non-government sources.

We do not have to agree on the accuracy or mendacity of official (and un-official) statements, speeches, comments, tweets, etc. to agree that it is essential that "Public Information" produced and created by government agencies and individual officials be accurately preserved. It is only by doing so that citizens will have an accurate record of policy debates and decisions and can contextualize that record within the wider historical record.

It is only by having an accurate and authentic record of Public Information that the truthfulness and accuracy of the content of those records can be determined. It is only by having such a record of Public Information that agencies and officials can be held accountable for their claims, policies, and actions.

7. Role of libraries.

Libraries should accept, continue, and maintain their traditional role of ensuring long-term access to Public Information for their own user communities, because government agencies cannot do so and are not doing so.

Preservation. Although the federal government makes Public Information available through e-government information services, very few government agencies have legal mandates, or budgets, or policies for preserving that information or for providing long-term, free public access to it. Most Public Information is not being adequately preserved or curated. Libraries need to continue their critical societal role in curating government information. They can do this by building their own digital collections of government Public Information.

Access. As noted above, no agency can guarantee that its e-government information service will meet the needs of every community of users. Libraries can address the access and use-needs of specific communities-of-interest, while government agencies attempt to address only a broad, monolithic public.

Service. When libraries build services for their user communities based on digital collections that they acquire and control, they will be able to combine non-government information with Public Information from different agency silos to create unique user experiences that no government agency can. By addressing the needs of their specific user communities, libraries can provide more services and better services than any government agency can provide. These services can include both traditional reference services provided by subject specialists and online digital services. We have written more about this idea here: Building a Collaborative FDLP.

James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University

New year’s resolutions for 2020: setting a new agenda for a new FDLP

Happy 2020! Now that we’re starting a new decade(!) — and GPO has set up a working group to study and consider digital deposit and Depository Library Council (DLC) will soon announce its PURL working group! — it is time for FGI to make its new year’s resolutions and envision a new agenda for a new Federal Depository Library Program (FDLP). This new digital FDLP will focus on the digital needs of users by building digital services based on digital collections. It will lead the way for libraries of all kinds, showing the value of digital libraries in the twenty-first century.


We recognize that (more…)