Tag Archives: Digital collection development
The Government Publishing Office (GPO) recently released its updated document entitled GPO’s System of Online Access: Collection Development Plan (here are the 2016 and 2018 Plans for comparison) which is “revised annually to reflect content added to govinfo in the preceding fiscal year, in-process titles, and current priorities.” The Plan explains GPO’s designated communities for govinfo, the broad content areas that fall within scope of govinfo, and the various codes — basically Title 44 of the US Code and Superintendent of Documents policies (SODs) — which undergird GPO’s collection development activities. While there is no mention in this document of the “National Collection”, it describes the three major pillars of GPO’s permanent public access efforts as govinfo, the FDLP, and the Cataloging & Indexing program (which produces the bibliographic records for the Catalog of Government Publications (CGP)).
The heart of the Plan is where GPO assigns collection depth levels to the various public information products of the legislative, executive, and judicial branches of the US government. Appendix A of the Plan defines these levels — adapted from the Research Libraries Group (RLG) Conspectus collection depth levels — as ranging from Comprehensive, Research, Study or Instructional Support, Basic, and Minimal to Out of Scope.
MSU scholars find $21 trillion in unauthorized government spending. Agencies disable links to key documents
I visited the MSU Today news site because of the headline about massive unauthorized spending at the Department of Defense and the Department of Housing and Urban Development. That in and of itself was troubling. But what really drew my attention was the 2nd paragraph, which stated that the agencies’ Inspectors General(!) — who are supposed to be the watchdogs of their agencies! — had “disabl[ed] the links to all key documents showing the unsupported spending,” along with the parenthetical note about the researchers having downloaded and saved the documents locally. Read more of the story at USAWatchdog.
This is why libraries need to get on the ball and become active in digital collection development. Professor Skidmore luckily downloaded the documents and made them available. But as long as government publications are only available on .gov websites, and the Title 44 requirements for executive agencies to make their documents available to the FDLP are ignored by agencies and the OMB, this kind of thing will continue to happen. Whether it’s one document or 100TB of data, FDLP libraries owe it to themselves and their local communities to do this kind of work. I’m now mulling over how best to provide space for the documents that my library’s researchers download to do their research.
Earlier this year, a Michigan State University economist, working with graduate students and a former government official, found $21 trillion in unauthorized spending in the departments of Defense and Housing and Urban Development for the years 1998-2015.
The work of Mark Skidmore and his team, which included digging into government websites and repeated queries to U.S. agencies that went unanswered, coincided with the Office of Inspector General, at one point, disabling the links to all key documents showing the unsupported spending. (Luckily, the researchers downloaded and stored the documents.)
We here at FGI have long argued against the destruction of physical collections in connection with digitization efforts (see e.g., Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a and What You Need to Know about the New Discard Policy). So it’s nice to hear the same argument from Jeff MacKie-Mason, the recently hired University Librarian and Chief Digital Scholarship Officer at UC Berkeley, on his blog madLibbing: Muddling Along in the Information Age. MacKie-Mason clearly and succinctly lays out the reasons that libraries still need physical collections: many digitized works are still in copyright and their digital surrogates are therefore not shareable online; print copies are easier to read, with higher comprehension rates; there is “little or no confidence that we can guarantee long-term digital preservation” (emphasis his!); and current digital surrogates from large digitization projects are less than complete (we’ve pointed this out repeatedly, e.g., in “‘An alarmingly casual indifference to accuracy and authenticity.’ What we know about digital surrogates.”).

So we hope that the next time your library weeds a government document under the assumption that it’s online, you’ll check the digital surrogate for completeness and at least start the discussion with your administrators about the need for a local digital archive to assure the preservation of the digital surrogate you’re about to weed. It could mean the difference between access and frustration for your user community.
One huge misconception we face is that digitizing our collections means we don’t need the print anymore. For example, we are participants in the Google Books / HathiTrust project, and most of our 11 million regular volumes have been digitized. Why not burn our print copies?
- For starters, about half of the collection is still in copyright. The HathiTrust collection can be searched, full-text, to find the existence of books, but we are not allowed to let people use the digital copy (with limited exceptions, e.g., for the blind, who can listen to a text-to-voice conversion). It will be decades before this need for our print copies goes away.
- Second, we are here not to build collections for their own sake, but to serve our faculty and students. And many of them vastly prefer doing their work from print copies. Those who read long monographs find it easier and their comprehension higher. Those who need to study large images or maps, in high resolution, or who want to see side-by-side page comparisons, need the print. And for many rare and historical documents, the materiality of the original document itself is of enormous importance for scholarship, from the marginal annotations to the construction of the volume.
- Next, we can have little or no confidence that we can guarantee long-term digital preservation. Digital storage has been around a relatively short time. In that time, formats change frequently. Hardware and software to render digital formats change. Bits on storage media rot. Keeping bits and being able to find and access them in the future requires large annual expenditures, and those expenditures are getting larger as the amount of content we want to preserve grows enormously fast. Further, much of scholarly content currently is held on servers of for-profit companies, and we have no guarantee those companies will survive, or that they will take care to ensure that their archives of scholarly publications survive.
- The Google project has been very good, but it is not complete. It does not scan fold-out pages, for example, which are in many scholarly books (maps, charts, tables). We have discovered that sometimes they miss pages, or the quality is not readable.
So, for now, there is pretty much consensus among research scholars and librarians that we must keep print copies for preservation in all cases, and for continuing use in many cases.
Here’s a good piece in the Boston Globe, “The race to preserve disappearing data”. While it focuses primarily on the film industry, it also touches on link rot, disappearing government information in the form of Supreme Court decisions, and other issues on which government information librarians should be working. I’ve said it often and I’ll say it again: when documents librarians focus on digitizing historic government publications, they ignore the far greater danger of the disappearance of born-digital government information. We need the entire documents community to step up and work on born-digital collection development lest we risk a “digital dark age.”
The problem of preservation is not unique to the film industry. It spans the digital artifacts of our age — from photos to music to scientific research data. One study of more than 500 biology papers published from a 20-year span found that as time passes, less original research data can be found; it suggested that up to 80 percent of raw data collected for studies in the early 1990s is lost. A crucial virtue of science is that researchers can reproduce findings or correct them over time by reevaluating original data. Fields from epidemiology to education to climate change require records that span decades or longer.
Lost data also plagues the legal world. A 2013 study of Supreme Court decisions by Harvard University Law School professors found that so-called link rot is eroding intellectual foundations of legal scholarship: Nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed…
…What was once a race to rescue information from going-extinct media (think of old files trapped on floppy disks) has morphed into a mounting need to copy and curate massive troves of data, says Dr. David Rosenthal, the founder of a library-led digital preservation network run out of the Stanford University libraries. Digital information decays over time and files grow corrupt from “bit rot,” which Rosenthal says is best fended off by creating copies of data in multiple virtual and physical locations…
…“Digital preservation is essentially a hot potato problem, where everyone wants to pass responsibility onward,” said Berman, also a professor of computer science at Rensselaer Polytechnic Institute. She notes that in the private sector, companies invest in preserving data that give them a competitive advantage. The larger challenge is preserving those digital artifacts that have broad societal relevance for the future, but no urgent private interest.
Publicly funded archives such as the National Archives and those supported by federal R&D agencies fulfill only a fraction of the preservation needed to pass on society’s knowledge to the future. Less than 1 percent of the Library of Congress’s 1.4 million archived videos and film reels were born digital. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, less than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film.
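The link-rot finding quoted above is easy to reproduce on a small scale against one’s own citation lists. Here is a rough sketch of the simplest possible audit, using only Python’s standard library. The `fetch` parameter is injectable so the logic can be exercised without network access, and treating anything outside the 2xx–3xx range as “rotten” is a simplifying assumption (a page can also survive at its URL while its content changes entirely).

```python
# Simplest-possible link-rot audit over a list of cited URLs.
import urllib.error
import urllib.request

def http_status(url: str, timeout: float = 10.0) -> int:
    """Probe `url` with a HEAD request and return the HTTP status code
    (0 for network-level failures such as DNS errors)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0

def find_rotten(urls, fetch=http_status):
    """Return the cited URLs that no longer resolve to a 2xx/3xx response."""
    return [url for url in urls if not 200 <= fetch(url) < 400]
```

In practice one would also rate-limit requests, retry transient failures, and check the Wayback Machine for a preserved copy of each dead link; this sketch deliberately omits all of that.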
Docs of the week: Ferguson Grand Jury, 100 years of INS annual reports, and the historic Moynihan Report
Here at Stanford libraries, my colleague Kris Kasianovitz and I are busy putting context to the *massive* haystack that is the Internet — and we could use some help (want to be a lostdocs collector?!). Below are just a few of the documents we’ve collected in the last week, stored in our Stanford Digital Repository and made accessible through our library catalog.
1) The Negro Family: The Case for National Action, AKA the Moynihan Report. This document came to me via a recent New Yorker article, “Don’t Be Like That: Does black culture need to be reformed?” by Kelefa Sanneh. The article, a book review of a new anthology called “The Cultural Matrix: Understanding Black Youth,” contextualized the sociology and cultural history of being black in America, describing in detail the ground-breaking work of Daniel Patrick Moynihan, trained as a sociologist and later well known as the liberal Senator from New York. As Sanneh notes, the Moynihan Report — originally printed in a run of 100 copies, 99 of them locked in a vault — was leaked to the press, prompting the Johnson administration to release the entire document. Moynihan’s overarching theme was “the deterioration of the Negro family,” and he called for a national program to “strengthen the Negro family.”
2) Annual Report of the Immigration and Naturalization Service. This one started out as a research consultation: a student wanted to analyze this report over the 100+ years it has been published. She found that the Immigration and Naturalization Service had digitized its historic run, but for some reason had taken the link down from its site and not restored it for over two weeks. I contacted INS and got the digitized documents restored, then downloaded them, deposited them in the SDR, and had the PURL added to our bibliographic record. The added benefit of collecting this digital annual report — chock full of important statistics — is easier access for future users: our paper collection is shelved in several different areas of the US documents collection, because INS has shifted among different agencies over the years (causing its call number to change), from Treasury (call# T21.1:) to Labor (call# L3.1: and L6.1:) to Justice (call# J21.1:) to Homeland Security (call# HS4.200).
3) Documents from the Ferguson Grand Jury. Ferguson has been in the news over the last year because of the fatal shooting of African American youth Michael Brown by police officer Darren Wilson and the ensuing protests it sparked. This important historic series of 105 Missouri state documents from the Grand Jury was released via Freedom of Information requests from CNN. Some of our government information colleagues around the country wondered online how to collect and preserve these documents for posterity and future researchers. Luckily, SUL is one library able to collect and preserve historically important born-digital government documents.
The overwhelming majority of state, local, US and international government documents these days are born-digital. Here at Stanford libraries, we continue to look for ways to maintain and expand both our historic and born-digital documents collections. Self-deposit will no doubt be one strategy among several (including Web archiving, LOCKSS and future initiatives) as we look to serve the information needs of citizens, faculty, students and researchers.