Tag Archives: Digital collection development
- Four easy steps to reporting “unreported” publications
- Strategies for finding “unreported” documents (more tips and tricks!)
- Historically “Unreported” materials of particular interest
- History of the problem
- Appendix: how to fill out the askGPO form
“Unreported” publications (which were, until recently, called “fugitive” publications) are those that are within scope of the Federal Depository Library Program (FDLP) but for various reasons have slipped through the cracks and not been collected and cataloged by the Government Publishing Office (GPO), distributed to FDLP libraries, or included in the “National Collection” (See a partial list of historically “unreported” publications below).
We here at FGI consider “unreported” publications to be the paramount problem facing the FDLP today. FDLP librarians, with their critical information skills and expertise about the structure and publishing activities of the federal government, are a vital part of the solution to this vexing problem. The National Collection is at the core of what FDLP libraries have done for the last 200+ years, so “unreported” publications erode that very foundation. During the spring 2021 virtual Depository Library Conference, I challenged every FDLP librarian to search for, find, and report to GPO five “unreported” documents every month. I’d like to reiterate that challenge here on FGI. If every one of the 1,100+ FDLP librarians were to find and report five documents each month (more than 66,000 documents a year), we’d soon put a dent in this existential “unreported” documents problem.
To that end, we’d like to share some simple steps for how to find and report “unreported” documents to GPO:
- Find an interesting federal document or information product like a report, data set, video, or slide deck (see the “strategies” section below for tips and tricks for finding documents);
- Search the Catalog of Government Publications (CGP) to see if GPO has cataloged it;
- If it’s NOT in the CGP, go to askGPO and fill in the “unreported document” form. See appendix for how to fill out the askGPO form;
- Rinse and repeat!
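The steps above boil down to a check-then-report loop. For librarians who like to keep a running log, here is a minimal sketch in Python of a personal tracking list. The field names and CSV layout are invented for illustration only; this is not a GPO or FDLP tool, and GPO’s own Excel template for batch submissions may use different columns.

```python
import csv
import io

# Columns for a personal tracking log of candidate "unreported" documents.
# These field names are invented for this sketch.
FIELDS = ["title", "agency", "url", "in_cgp", "reported"]

def add_candidate(log, title, agency, url):
    """Record a document found in the wild (step 1)."""
    log.append({"title": title, "agency": agency, "url": url,
                "in_cgp": None, "reported": False})

def mark_cgp_result(log, url, found_in_cgp):
    """Record the outcome of a manual CGP search (step 2)."""
    for row in log:
        if row["url"] == url:
            row["in_cgp"] = found_in_cgp

def to_report(log):
    """Documents not in the CGP and not yet sent via askGPO (step 3)."""
    return [r for r in log if r["in_cgp"] is False and not r["reported"]]

def export_csv(log):
    """Dump the whole log as CSV text, e.g. for a batch submission."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(log)
    return buf.getvalue()
```

A typical month’s use would be a handful of `add_candidate` calls, a CGP search for each, and then reporting whatever `to_report` returns.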
- Read the news with an eye toward news items and sources that cover federal policies (see, for example, https://federalnewsnetwork.com, https://www.govexec.com, https://www.washingtonpost.com, etc.);
- Set up Google search and news alerts for publications from your favorite agency(ies), especially those agencies’ Offices of Inspector General (Inspector General reports are an especially critical and long-standing type of “unreported” document! Only a portion are even posted publicly on Oversight.gov);
- Find and report documents you use to answer reference/research consultations;
- Bookmark and regularly visit the publications and/or press release pages of your favorite agency(ies);
- Follow your favorite agency(ies), heads of agencies, your state’s Congressional delegation, known people within the executive branch, and federal watchdog groups on social media. New publications are often announced on government social media accounts.
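Some of these strategies can be partially automated when an agency publishes an RSS feed of press releases or reports. Below is a rough sketch, using only the Python standard library, of pulling title/link pairs out of an RSS 2.0 document. The sample feed is made up; real agency feed URLs and structures vary, so treat this as a starting point only.

```python
import xml.etree.ElementTree as ET

def extract_items(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 feed document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items

# A made-up sample feed; in practice you would fetch an agency's real
# RSS URL (e.g. with urllib.request) and pass the response body here.
SAMPLE = """<rss version="2.0"><channel>
  <title>Example Agency OIG Reports</title>
  <item><title>Audit of Program X</title>
        <link>https://example.gov/oig/audit-x.pdf</link></item>
  <item><title>Semiannual Report</title>
        <link>https://example.gov/oig/sar.pdf</link></item>
</channel></rss>"""
```

Each new (title, link) pair is a candidate for a CGP search and, if absent, an askGPO report.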
- Agency Inspector General reports;
- Executive branch agency publications. See the LostDocs project for examples of documents that have been reported to GPO;
- Communications/letters from members of Congress to executive branch agencies;
- Communications/letters from federal officials to a Presidential administration;
- Public datasets;
- Congressional Research Service (CRS) reports* (*CRS reports were, until 2018, considered “privileged communication” between Congress and the Library of Congress and were therefore never released via the FDLP. Here’s the back story).
Since the FDLP’s start in 1813, there have always been “unreported” documents that slipped through the cracks and were lost to the sands of time (until very recently, these were termed “fugitive” documents) [Footnote 1]. The problem has grown exponentially as executive agencies’ publishing operations have exploded: now that agencies can easily and freely distribute content online, very few if any of them follow Title 44 and send their documents to GPO as the law requires. Only a minuscule fraction of born-digital executive branch information is cataloged in the Catalog of Government Publications (CGP) or makes it into the “National Collection.” This means that every year, thousands — if not hundreds of thousands! — of federal documents, datasets, maps, and other born-digital materials [Footnote 2] are never preserved and are lost to the fog of history as websites are updated and historical content removed [Footnote 3].
Depository librarians reporting found publications are a critical part of a holistic solution to the “unreported” documents problem. By identifying federal information resources that are important to their local constituents, librarians are making sure that these documents will be cataloged, captured, and made accessible to a wider audience. Reporting documents also adds to a National Collection pipeline for long-term access and helps to make sure that what is collected and preserved reflects the needs and interests of the wide-ranging communities and the public which libraries serve.
Many hands make light work. Won’t you join in the effort? Please contact us if you have questions or comments at freegovinfo AT gmail DOT com.
1. See James R. Jacobs, “‘Issued for Gratuitous Distribution:’ The History of Fugitive Documents and the FDLP,” Against the Grain 29(6), special issue “Ensuring Access to Government Information,” December 2017/January 2018.
2. My back-of-the-napkin estimate is that well over half of the “National Collection” is unreported! The executive branch is far and away the largest portion of the National Collection, and it is almost completely “unreported.” See slide 5 of my 2018 Canadian Govinfo presentation for some context. Jim Jacobs’ chart cites the 2008 End of Term crawl for context on how many born-digital government publications are on the Web. The 2016 End of Term crawl nearly doubled the 2008 crawl, going from 160 million to 310 million URLs harvested. I expect the 2020 End of Term crawl, happening at the time of this post’s publication, to far surpass 310 million!
3. FGI has written about “link rot,” “content drift,” and other issues which make it difficult to collect and preserve born-digital information.
The askGPO form can be used to report a single document or multiple documents, for example those listed on an agency’s publications index page. See below for the steps to filling out the askGPO form. If a site is extremely large and/or complex (e.g., the Office of the Director of National Intelligence (ODNI) reports site), send the URL and a description of the site to the GPO Web archiving team at FDLPwebarchiving AT gpo DOT gov.
- Log in to ask.gpo.gov (This will automatically fill in your contact information and depository library number in the form if you have used the system before);
- Click on “Federal Depository Library Program”;
- Select category “Fugitive Publications” (which will soon be changed to “unreported publications”);
- Choose single publication or multiple publications (there’s an Excel template if you prefer to collect multiple documents and submit them all at once!);
- Enter title, publishing agency, publication URL, format (other fields are not required). Use your best guess if you are not sure;
- Upload PDF file as attachment (not required but helpful for GPO staff to have the document “in hand” when cataloging);
- Add any additional context that you think may aid GPO staff;
- Do the reCAPTCHA “I’m not a robot” test;
- Submit the document(s)!
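If you batch documents before submitting, it can help to verify that each entry carries the fields the form asks for: title, publishing agency, publication URL, and format. Here is a tiny sketch of such a check; the field names are my own choosing, not the form’s actual internal labels.

```python
# Required fields per the askGPO "unreported publication" form: title,
# publishing agency, publication URL, and format. Other fields are optional.
# The short names below are invented for this sketch.
REQUIRED = ("title", "agency", "url", "fmt")

def missing_fields(entry):
    """Return the names of required fields that are empty or absent."""
    return [f for f in REQUIRED if not entry.get(f, "").strip()]
```

Running this over a spreadsheet export before submission catches entries that would otherwise need a follow-up with GPO staff.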
The Government Publishing Office (GPO) recently released its updated document entitled GPO’s System of Online Access: Collection Development Plan (here are the 2016 and 2018 Plans for comparison), which is “revised annually to reflect content added to govinfo in the preceding fiscal year, in-process titles, and current priorities.” The Plan explains GPO’s designated communities for govinfo, the broad content areas that fall within scope of govinfo, and the authorities — basically Title 44 of the US Code and Superintendent of Documents policies (SODs) — that undergird GPO’s collection development activities. While the document makes no mention of the “National Collection,” it describes the three major pillars of GPO’s permanent public access efforts as govinfo, the FDLP, and the Cataloging & Indexing program (which produces the bibliographic records for the Catalog of Government Publications (CGP)).
The central part of the Plan is where GPO assigns a collection depth level to the various public information products of the legislative, executive, and judicial branches of the US government. Appendix A of the Plan defines these levels, modified from the Research Libraries Group (RLG) Conspectus collection depth levels, as Comprehensive, Research, Study or Instructional Support, Basic, Minimal, and Out of Scope.
MSU scholars find $21 trillion in unauthorized government spending. Agencies disable links to key documents
I visited the MSU Today news site for the headline about massive unauthorized spending at the Department of Defense and the Department of Housing and Urban Development. That in and of itself was troubling. But what really drew my attention was the second paragraph, which stated that the agencies’ Inspectors General(!) — which are supposed to be the watchdogs of their agencies! — had “disabl[ed] the links to all key documents showing the unsupported spending,” along with the parenthetical note that the researchers had downloaded and saved the documents locally. Read more of the story at USAWatchdog.
This is why libraries need to get on the ball and become active in digital collection development. Professor Skidmore luckily downloaded the documents and made them available. But as long as government publications are only available on .gov websites, and the Title 44 requirement that executive agencies make their documents available to the FDLP is ignored by agencies and the OMB, this kind of thing will continue to happen. Whether it’s one document or 100TB of data, FDLP libraries owe it to themselves and their local communities to do this kind of work. I’m now mulling over how best to provide space for the documents that my library’s researchers download to do their research.
Earlier this year, a Michigan State University economist, working with graduate students and a former government official, found $21 trillion in unauthorized spending in the departments of Defense and Housing and Urban Development for the years 1998-2015.
The work of Mark Skidmore and his team, which included digging into government websites and repeated queries to U.S. agencies that went unanswered, coincided with the Office of Inspector General, at one point, disabling the links to all key documents showing the unsupported spending. (Luckily, the researchers downloaded and stored the documents.)
We here at FGI have long argued against the destruction of physical collections in connection with digitization efforts (see, e.g., Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a and What You Need to Know about the New Discard Policy). So it’s nice to hear the same argument from Jeff MacKie-Mason, recently hired University Librarian and Chief Digital Scholarship Officer at UC Berkeley, on his blog madLibbing: Muddling Along in the Information Age. MacKie-Mason clearly and succinctly lays out the reasons that libraries still need physical collections: many digitized works are still in copyright, so their digital surrogates cannot be shared online; print copies are easier to read, with higher comprehension rates; there is “little or no confidence that we can guarantee long-term digital preservation” (emphasis his!); and current digital surrogates from large digitization projects are less than complete (we’ve pointed this out repeatedly, e.g., in “‘An alarmingly casual indifference to accuracy and authenticity.’ What we know about digital surrogates.”). So we hope the next time your library weeds a government document under the assumption that it’s online, you’ll check the digital surrogate for completeness and at least start the discussion with your administrators about the need for a local digital archive to preserve the digital surrogate of what you’re about to weed. It could mean the difference between access and frustration for your user community.
One huge misconception we face is that digitizing our collections means we don’t need the print anymore. For example, we are participants in the Google Books / HathiTrust project, and most of our 11 million regular volumes have been digitized. Why not burn our print copies?
- For starters, about half of the collection is still in copyright. The HathiTrust collection can be searched, full-text, to find the existence of books, but we are not allowed to let people use the digital copy (with limited exceptions, e.g., for the blind, who can listen to a text-to-voice conversion). It will be decades before this need for our print copies goes away.
- Second, we are here not to build collections for their own sake, but to serve our faculty and students. And many of them vastly prefer doing their work from print copies. Those who read long monographs find it easier and their comprehension higher. Those who need to study large images or maps, in high resolution, or who want to see side-by-side page comparisons, need the print. And for many rare and historical documents, the materiality of the original document itself is of enormous importance for scholarship, from the marginal annotations to the construction of the volume.
- Next, we can have little or no confidence that we can guarantee long-term digital preservation. Digital storage has been around a relatively short time. In that time, formats change frequently. Hardware and software to render digital formats change. Bits on storage media rot. Keeping bits and being able to find and access them in the future requires large annual expenditures, and those expenditures are getting larger as the amount of content we want to preserve grows enormously fast. Further, much scholarly content is currently held on servers of for-profit companies, and we have no guarantee those companies will survive, or that they will take care to ensure that their archives of scholarly publications survive.
- The Google project has been very good, but it is not complete. It does not scan fold-out pages, for example, which are in many scholarly books (maps, charts, tables). We have discovered that sometimes they miss pages, or the quality is not readable.
So, for now, there is pretty much consensus among research scholars and librarians that we must keep print copies for preservation in all cases, and for continuing use in many cases.
Here’s a good piece in the Boston Globe, “The race to preserve disappearing data”. While primarily focusing on the film industry, it also mentions link rot, disappearing government information in the form of Supreme Court decisions, and other issues on which government information librarians should be working. I’ve said it often and I’ll say it again: when documents librarians focus on digitizing historic government publications, they ignore the far greater danger of the disappearance of born-digital government information. We need the entire documents community to step up and work on born-digital collection development lest our era become a “digital dark age.”
The problem of preservation is not unique to the film industry. It spans the digital artifacts of our age — from photos to music to scientific research data. One study of more than 500 biology papers published over a 20-year span found that as time passes, less original research data can be found; it suggested that up to 80 percent of raw data collected for studies in the early 1990s is lost. A crucial virtue of science is that researchers can reproduce findings or correct them over time by reevaluating original data. Fields from epidemiology to education to climate change require records that span decades or longer.
Lost data also plagues the legal world. A 2013 study of Supreme Court decisions by Harvard University Law School professors found that so-called link rot is eroding intellectual foundations of legal scholarship: Nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed…
…What was once a race to rescue information from going-extinct media (think of old files trapped on floppy disks) has morphed into a mounting need to copy and curate massive troves of data, says Dr. David Rosenthal, the founder of a library-led digital preservation network run out of the Stanford University libraries. Digital information decays over time and files grow corrupt from “bit rot,” which Rosenthal says is best fended off by creating copies of data in multiple virtual and physical locations…
…“Digital preservation is essentially a hot potato problem, where everyone wants to pass responsibility onward,” said Berman, also a professor of computer science at Rensselaer Polytechnic Institute. She notes that in the private sector, companies invest in preserving data that give them a competitive advantage. The larger challenge is preserving those digital artifacts that have broad societal relevance for the future, but no urgent private interest.
Publicly funded archives such as the National Archives and those supported by federal R&D agencies fulfill only a fraction of the preservation needed to pass on society’s knowledge to the future. Less than 1 percent of the Library of Congress’s 1.4 million archived videos and film reels were born digital. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, less than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film.