Tag Archives: future of government information in libraries
Twitter and newspapers are buzzing with complaints about widespread problems with access to government information and data (see, for example, the Wall Street Journal (paywall 😐), ZDNet News, Pew Center, Washington Post, Scientific American, TheVerge, and FedScoop, to name but a few).
Maybe when/if the government opens again, we should scrape the NIST and CSRC websites, put all those publications somewhere public. It’s worrying that *every single US cryptography standard* is now unavailable to practitioners.
— Matthew Green (@matthew_d_green) January 12, 2019
Matthew Green, a professor at Johns Hopkins, said “It’s worrying that every single US cryptography standard is now unavailable to practitioners.” He was responding to the fact that he could not get the documents he needed from the National Institute of Standards and Technology (NIST) or its branch, the Computer Security Resource Center (CSRC). The government shutdown is the direct cause of these problems.
Others who noticed the same problem chimed in to the discussion Green started, noting that they couldn’t find the standards they needed in Google’s cache or the Wayback Machine either. One person suggested that “Such documents should be distributed to multiple free and public repositories,” adding that these documents are “Too important to have subject to a single point of failure.” Another said that he downloads personal copies of the documents he needs every month, but had missed one that he uses “somewhat often.” One lone voice wondered about “Federal Depository Libraries, of which I believe there is at least one in every state.” (James responded to that one, letting people know about the FDLP and the End of Term crawl!)
There are at least two reasons why users cannot get the documents they need from government servers during the shutdown. In some cases, agencies have apparently shut off access to their documents. (This is the case for both NIST and CSRC.) In other cases, the security certificates of websites have expired — with no agency employees to renew them! — leaving whole websites either insecure or unavailable or both.
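To make the expired-certificate failure concrete, here is a minimal Python sketch (standard library only) that reads the expiry date of the certificate a site presents and counts the days left before it lapses. The function names are hypothetical. With Python's default verification, connecting to a site whose certificate has already expired fails during the handshake with `ssl.SSLCertVerificationError`, which is the programmatic counterpart of the warning users run into.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the 'notAfter' field of the certificate a host presents.

    With the default (verifying) context, an already-expired certificate
    makes the handshake raise ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a 'notAfter' timestamp such as 'Jan 12 00:00:00 2019 GMT'.

    A negative result means the certificate has already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY
```

Calling `days_until_expiry(fetch_not_after(host))` for each agency host a library's users rely on would flag a lapse before it happens. It would not, of course, restore access during a shutdown the way a locally held copy does.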
Regardless of who you (or your user communities) blame for the shutdown itself, this loss of access was entirely foreseeable and avoidable. It was foreseeable because it has happened before. It was avoidable because libraries can select, acquire, organize, and preserve these documents and provide access to them and services for them whether the government is open or shut down.
Some libraries probably do have some of these documents. But too many libraries have chosen to adopt a new model of “services without collections.” GPO proudly promotes this model as “All or Mostly Online Federal Depository Libraries.” GPO itself is affected by this model: almost 20% of the “Permanent URLs” (PURLs) in its Catalog of Government Publications (CGP) point to content on non-GPO government servers. So, even though GPO’s govinfo database and the CGP may still be up and running, during the shutdown GPO cannot ensure that all its PURLs will work.
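The 20% figure is a simple ratio: of the PURLs in the catalog, how many redirect to a host GPO does not run? A minimal Python sketch of that calculation follows. It assumes you already have each PURL's resolved target URL; the set of "GPO-controlled" domains is an illustrative assumption, not GPO's own definition, and the function names are hypothetical.

```python
from typing import Iterable, Tuple
from urllib.parse import urlsplit

# Illustrative assumption: domains treated as GPO-controlled.
GPO_DOMAINS: Tuple[str, ...] = ("gpo.gov", "govinfo.gov", "fdlp.gov")


def is_gpo_host(url: str, domains: Tuple[str, ...] = GPO_DOMAINS) -> bool:
    """True if the URL's host is one of the domains or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match "gpo.gov" or "*.gpo.gov", but not look-alikes such as "evilgpo.gov".
    return any(host == d or host.endswith("." + d) for d in domains)


def share_off_gpo(target_urls: Iterable[str]) -> float:
    """Fraction of resolved PURL targets that live on non-GPO servers."""
    urls = list(target_urls)
    if not urls:
        return 0.0
    off_gpo = sum(1 for u in urls if not is_gpo_host(u))
    return off_gpo / len(urls)
```

For instance, if 1,000 of 5,000 resolved targets sat on other agencies' servers, `share_off_gpo` would return 0.2, and each of those 1,000 links would be only as available as the agency server behind it.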
This no-collections model means that libraries are too often choosing simply to point to collections over which they have no control — and we’ve known what happens “When we depend on pointing instead of collecting” for quite some time. When those collections go offline and users lose access, users begin to wonder why someone hasn’t foreseen this problem and put “all those publications somewhere public.”
The gap between what libraries could do to prevent the kind of loss of access the shutdown is causing and what they are doing is particularly stark in the area of government information. Most federal government information is in the public domain and is available without technical or copyright restrictions or fees. There is nothing preventing libraries from building collections to support users except the will to do so.
Many library administrators are eager to proclaim that pointing to collections they do not control is the new role of libraries in the digital age. Those who promote this new model of services without collections then struggle to demonstrate the value of libraries to their user communities. This is difficult when those communities go directly to collections of information, bypassing libraries and, perhaps, wondering why libraries still exist at all.
This represents a failure by libraries to fulfill their role in society and in the digital information ecosystem.
When the shutdown ends, access will, presumably, be restored. In the wake of the many other problems caused by the shutdown (many of them immediate and even dangerous), this temporary loss of access to some government information may not seem pressing. But librarians should see this as another wake-up call. Hopefully, Depository Library Council’s recent recommendation regarding digital deposit will answer that call. Libraries should not focus on bemoaning the short-term problem. We should, instead, focus on making the next crisis impossible. We can do this by focusing on the long-term problems of digital collection development, preservation and access. The current crisis may be temporary, but when we rely only on the government to provide access to these important resources, access will remain vulnerable to the next crisis or misstep or conscious decision to cut off access. We need to recognize that government agencies do not always have the same priorities as our users.
Today, libraries cannot ensure long-term access to government information because they do not control it. But, if libraries select, acquire, organize, and preserve the government information that is vital to their user communities, then they can ensure long-term access to it. You will not have to persuade your users of the value of your library when you do what they value.
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University
[UPDATE 1:30pm 09122018: The bill going forward in the Senate is S. 2944, NOT S. 2673. And S. 2944 includes a reference to the depository library program! I’ve updated the link below to the correct Senate bill. JRJ]
Heads up! There’s a bill at the beginning of the legislative process called the “Preventing Additional Printing of Electronic Records Act of 2018” or the PAPER Act of 2018. Don’t you just love how Congress has to acronymize their bill titles?! This bill seeks to limit the printing of the Congressional Record, one of our most important Congressional publications and the official record of the proceedings and debates of the US Congress. It is important to the Federal Depository Library Program that the CR continue to be published in print for research utility and preservation purposes.
The House version mentions the FDLP, but the Senate version does not:
(d) Depository libraries
The Director of the Government Publishing Office shall furnish to the Superintendent of Documents as many daily and bound copies of the Congressional Record as may be required for distribution to depository libraries.
This bill is at the very beginning of the process, so it’s not time to get nervous. But the depository community ought to keep an eye on this bill in case it gathers momentum in the House and/or Senate.
In a recent post on the blog of the Web Science and Digital Libraries Research Group, Shawn Jones reports on research that is vital to all those interested in long term access to government information.
- How well are the National Guideline Clearinghouse and the National Quality Measures Clearinghouse Archived? Shawn M. Jones, Web Science and Digital Libraries Research Group (July 15, 2018).
In the post, Jones reports on his research into how much of the content of two sites, the National Guideline Clearinghouse and the National Quality Measures Clearinghouse, has been archived.
Here’s a thought-provoking tweet forwarded to me by Jonathan Petters. Random Twitter user Kenny Jacoby found a giant 1,200-page document that he wanted to use (not read!). So what did he do? He banged on it for 8 hours to extract all the data from the PDF and convert it into a structured dataset. Good on him for doing that, but how many people have the skills and time to do that?
And this, kind readers, is a perfect example of what we’ve been writing about here at FGI for some time. In the age of near-ubiquitous online access, it’s not enough for governments to publish PDFs; they need to provide more and better access for both humans and machines.
Information must be:
not just preserved, but discoverable [2.2.2]
not just discoverable, but deliverable [2.3.3]
not just deliverable as bits, but readable [2.2.1]
not just readable, but understandable [2.2.1]
not just understandable, but usable [184.108.40.206]
(Numbers in brackets refer to sections of the OAIS standard.)
This is a nut that our friends at the Congressional Data Coalition are trying to crack. And some federal agencies are wading into this space as well; see, for example, Data.gov. But there are still far too many examples of this data/publication divide. It’s going to take a concerted effort by the public, by watchdog groups like the Sunlight Foundation and OpenTheGovernment, and by librarians to push for this change in the way we think about government information.
Spent about 8 hours writing and debugging code to scrape a hideous 1,200-page PDF into a structured dataset because a public agency refused to give up its raw data.
Don’t mess with me. pic.twitter.com/Eiz1SZl3Xx
— Kenny Jacoby (@kennyjacoby) May 28, 2018
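We don't know how Jacoby's code worked or what his PDF looked like, but the general shape of the job can be sketched. Assuming the PDF's text has already been extracted to plain lines (with whatever extraction tool is at hand), the Python below keeps only lines that match a hypothetical row layout and writes them out as CSV. The layout, field names, and functions are all invented for illustration. The point is that every PDF needs its own bespoke pattern, which is exactly the work that publishing the raw data would spare users.

```python
import csv
import io
import re
from typing import Dict, Iterable, Iterator

# Hypothetical row layout: "<id> <name ...> <mm/dd/yyyy> <amount with cents>".
ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<name>.+?)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<amount>[\d,]+\.\d{2})$"
)
FIELDS = ["id", "name", "date", "amount"]


def rows_from_text(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one record per line that matches ROW; everything else is skipped."""
    for line in lines:
        match = ROW.match(line.strip())
        if match:  # page headers, footers, and blank lines simply don't match
            record: Dict[str, object] = dict(match.groupdict())
            record["amount"] = float(str(record["amount"]).replace(",", ""))
            yield record


def to_csv(records: Iterable[Dict[str, object]]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Run over a sample such as `["Page 1 of 1200", "101 Jane Q. Public 05/28/2018 1,250.00"]`, only the second line survives, becoming the CSV row `101,Jane Q. Public,05/28/2018,1250.0`. A different agency's PDF would need a different `ROW`.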
Please join the PEGI Project for their May webinar. There’s a great list of speakers who will be talking about various efforts and projects to identify, collect, and preserve born-digital government information. Please RSVP and forward on to any of your colleagues and networks who may be interested. See you there!
Please join the PEGI project for a webinar on Monday, May 14th, 2018 at 12:00pm EDT to hear directly from trailblazing organizations about projects underway to identify, collect, and preserve born-digital government information. Leading figures from these organizations will be on hand to discuss the advocacy and coordination necessary to make an impact, and they can answer your questions about more ways to contribute to national efforts at a local level.
To hear about the current state of preservation efforts and contribute your ideas and priorities, please RSVP at the following link: http://bit.ly/PEGIMayWebinarRSVP.
Heather Joseph, Executive Director, SPARC
Brandon Locke, Director of LEADR at Michigan State University & Founder & co-organizer of Endangered Data Week
Rachel Mattson, Curator of the Tretter Collection for GLBT Studies at the University of Minnesota Libraries & Founder/co-leader of the Digital Library Federation’s interest group on Government Records Transparency & Accountability
Bernard F. Reilly, President, Center for Research Libraries
Justin Schell, Director, Shapiro Design Lab & Member of EDGI (Environmental Data & Governance Initiative)
Bethany Wiggin, Founding Director, Penn Program in Environmental Humanities (PPEH)
Shari Laster, PEGI Project Steering Committee
If you have any questions or comments, please direct them to [email protected]