
Tag Archives: digital preservation

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Happy 16th Birthday to the LOCKSS-USDOCS network!

Wow, it’s hard to believe that the LOCKSS-USDOCS network of over 30 libraries has been up and running for 16 years! We can finally get our driver’s permit 🙂 LOCKSS-USDOCS harvests all of the content currently published on the Government Publishing Office (GPO)’s GOVINFO content management system, including the most recently created collection of Congressionally Mandated Reports.

If you and your library are interested in participating in this collaborative digital preservation project of the digital FDLP(!), please contact James R. Jacobs at jrjacobs AT Stanford DOT edu.

The USDocs Private LOCKSS Network (PLN) was launched in 2008 as a digital repository for the U.S. Federal Depository Library Program (FDLP), a network of over 1,100 participating libraries across the United States. The FDLP has for over 200 years ensured the safeguarding of documents published by the U.S. Federal Government through the same “lots of copies keep stuff safe” principle that now drives LOCKSS digital preservation networks. FDLP libraries select a basic subset of federal reports and documents, with individual libraries providing a wide range of additional specialized services and collections…

…At the heart of the FDLP lies the idea that libraries across the United States are not just passive repositories but active stewards of these vital materials. They dedicate themselves to preserving content for the good of society and to safeguarding democracy. FDLP libraries are driven by a mission: to provide free public access to official information, thereby empowering citizens with knowledge and fostering transparency. As the FDLP becomes increasingly digital, the GPO will print and distribute a decreasing number of titles in hard copy to just a few libraries in each of the four National Collection Service Areas (NCSAs), and the USDocs PLN will be collecting and preserving a growing percentage of the U.S. Federal Government’s output.

DttP student article re SIGAR and the tenuous nature of born-digital preservation

The Fall, 2023 issue of Documents to the People (DttP) just came out. This issue is always interesting because it includes a section of MLIS student submissions. This time around was no different. An article by Miguel Beltran, a grad student at University of IL at Urbana Champaign (which also happens to be my alma mater!) caught my attention because it was on a subject that FGI has long written about: the exigency of born-digital preservation of government information.

Citation: Lessons Learned in Born-Digital Preservation. Miguel Beltran. Documents to the People (DttP), Fall, 2023. DOI: https://doi.org/10.5860/dttp.v51i3.8124.

Beltran’s insightful analysis revolves around the documents of the Special Inspector General for Afghanistan Reconstruction (SIGAR) and an investigative report by the Washington Post entitled “At war with the truth.” Beltran points to the BIG ELEPHANT in the FDLP room: the processes, workflows, and infrastructures needed to curate (collect, preserve, and give long-term access to) government information are not currently in place, and “clear strategies and widespread collaboration are necessary to preserve government information on these mediums.”

As more government documents are created in digital mediums, it is increasingly important that agencies could preserve and make them available to the public. This article discusses one group of government documents related to the war in Afghanistan and the landscape that would potentially preserve them. Based on the current conditions, there is a possibility that these documents and those of a similar nature may be overlooked and lost to future generations.

I checked the Catalog of Government Publications (CGP) for author: “United States. Office of the Special Inspector General for Afghanistan Reconstruction” and the newest SIGAR report there is from May of 2022. Herein lies the problem, as Beltran notes. Without an agreement in place between SIGAR and GPO, many of this agency’s reports will fall through the cracks and not be cataloged for the National Collection or actively preserved. The main SIGAR site has been harvested by the Internet Archive many times since 2009 (but the reports page and its corresponding RSS feed have been collected far fewer times, only since 2015, at very random intervals, and NOT by GPO!). That means that, though the SIGAR site is in the Wayback Machine, the reports from this agency are not necessarily even in the Wayback Machine and certainly NOT in GPO’s FDLP web archive.

Therefore, the ONLY way to assure that these born-digital documents are curated is to go through the list one-by-one in a brute force kind of way to check to see if they’ve been cataloged in CGP and then report them as “unreported” documents to GPO. So that’s what I’m going to do 🙂
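For anyone who wants to try the same kind of check, the comparison step can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it presumes you have already copied the agency's report titles and the titles found in CGP into two plain lists by hand, and the function names `normalize` and `find_unreported`, along with all of the sample titles, are hypothetical. It does not query CGP or the SIGAR site.

```python
import re


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and spacing so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_unreported(agency_titles, cataloged_titles):
    """Return the agency titles that have no normalized match among the cataloged titles."""
    cataloged = {normalize(t) for t in cataloged_titles}
    return [t for t in agency_titles if normalize(t) not in cataloged]


# Hypothetical sample data, NOT real SIGAR or CGP records.
agency_titles = [
    "Quarterly Report to Congress: Example A",
    "Audit Report: Example B",
    "Lessons Learned: Example C",
]
cataloged_titles = ["quarterly report to congress - example a"]

unreported = find_unreported(agency_titles, cataloged_titles)
for title in unreported:
    print("Candidate unreported document:", title)
```

Title matching like this is crude, so anything it flags is just a candidate to verify by hand before reporting it to GPO.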

Thanks again to Miguel Beltran for again raising the important issue of born-digital preservation. Have you reported a document to GPO today? I challenge all of my FDLP colleagues around the country to report 5 documents per week to GPO. Together we can fill some of the cracks that are currently in the National Collection.

Happy 2023! The state of government information in 2023

Happy new year 2023! We hope all our readers had a relaxing holiday break and are ready to get back to the important work of preserving government information and assuring its long-term access!

In the latest First Branch Forecast (you really should subscribe to this important newsletter if you haven’t already!), a side comment about the findings of the January 6th Committee caught our attention. In discussing the release of the Committee’s final reports, as well as the many witness transcripts, Daniel Schuman noted “We’re linking to the PDF on the Wayback Machine because the Committee’s website will be toast in early January.”

This is the troubling reality we find ourselves in. Digital government information turns out to be extremely fragile and reliant on the political winds of Washington DC. The Government Publishing Office (GPO) captured the committee’s final report and various hearings, though NOT the witness testimony transcripts that the committee has released to its website (which I’m also linking to on the Wayback Machine!). The final report has already been published commercially (in this case by the New Yorker and Celadon Books), and the report will no doubt be saved by the Library of Congress, NARA, and various libraries around the country. But each of those copies will have its own URL rather than the official URL from the actual committee that did the work. It would be amazing if there were a system of permanent URLs (called PIDs) that point to all the copies in the same way that DOIs work for journal articles. I and many of my depository library colleagues are working hard on putting a system like this in place for US government information. It was one of FGI’s resolutions for 2020 and I’ve been busy working on the Depository Library Council (DLC) Working Group Exploring the Durability of PURLs and Their Alternatives (charge). The working group is finishing up its work and will soon release its final report and recommendations.

Let’s hope that 2023 is the year that electronic government information is collected, preserved, and made easily accessible for the public!

Government recommendations to preserve government information not preserved by government

James and I are writing a book on preserving government information. In the course of researching the book, we find ourselves hunting down government publications that we need but that are not available from the government or from any FDLP library. Each of these documents has its own explanation for why it is missing and each explanation tells a story about the gaps in preservation of government information.

This is one of those stories. Think of this as a long footnote to a future book.

In 2002, Congress established the Interagency Committee on Government Information (ICGI). One of its charges (more…)

EDGI Releases Dataset of Federal Environmental Website Changes Under Trump

Thanks to the Environmental Data and Governance Initiative (EDGI) for releasing the Federal Environmental Web Tracker. This tool is a public dataset of searchable records of approximately 1,500 significant changes to federal agency environmental webpages under the Trump administration. These changes were almost always precursors or responses to policy changes, and they came from a “list of 25,000 federal Web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA.” Here’s the Tracker’s explanatory page for more context and background.

EDGI continues to do important work in tracking the federal .gov Web domain. EDGI’s work goes hand in hand with the work of the End of Term Web Archive which has harvested the .gov/.mil Web space every 4 years since 2008 and is now deep into its 2020 harvest. And we’re still accepting nominations, so go to the End of Term Nomination Tool hosted by the University of North Texas (UNT) library. Help us collect a snapshot of the federal Web domain!

Today, the Environmental Data & Governance Initiative (EDGI) publishes searchable records of approximately 1,500 changes to federal agency environmental webpages under the Trump administration. For four years, EDGI’s website monitoring team has identified and catalogued significant changes to federal websites using their open source monitoring software. EDGI’s Federal Environmental Web Tracker makes records of significant changes publicly available.

The information that’s available on federal websites can have important policy implications. As EDGI has often reported over the past four years, changes to the information that’s available on federal websites are almost always precursors or responses to policy changes. Federal websites provide information that the public is likely to access before commenting on a proposed rule to learn about current regulatory efforts, the science underlying a new policy decision, or likely impacts of a proposed rule. The information found (or not found) on a federal website can impact public participation in regulatory processes.

In the weeks after Trump’s election in November 2016, newly-formed EDGI compiled a list of 25,000 federal web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA. First using proprietary software and then building and using novel open source software, EDGI has compared versions of these web pages weekly since January 2017. This new dataset represents the documented changes that EDGI’s website monitoring team flagged as significant in some way over the past four years.

EDGI’s Federal Environmental Web Tracker gives journalists, academic researchers, and the public data that can be used to provide insight, documentation, and analysis of the information policies and priorities of the Trump administration.

The Federal Environmental Web Tracker will be updated quarterly as EDGI continues to monitor federal environmental websites.
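For readers curious what comparing two versions of a web page looks like in practice, here is a minimal sketch using Python's standard `difflib` module. It is not EDGI's software or method. The function name `changed_lines` and the sample page text are hypothetical, and the sketch assumes the two snapshots have already been saved as plain text.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two text snapshots."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical snapshots of the same page captured a week apart.
last_week = "Our program covers topic A.\nContact us."
this_week = "Our program covers topic B.\nContact us."

changes = changed_lines(last_week, this_week)
for line in changes:
    print(line)
```

A line-level diff like this flags every edit, including trivial ones, so deciding which changes are significant would still take human review.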

HT to InfoDocket!
