Tag Archives: born-digital
This week is Endangered Data Week, a new effort to raise awareness about publicly available data and the threats to its creation, sharing and retention. Follow along with the conversation at the Twitter hashtag #EndangeredData, check out the Endangered Data events near you, tune in on Friday for the webinar hosted by the Digital Library Federation (DLF), “Endangered Accountability: A DLF-Sponsored Webinar on FOIA, Government Data, and Transparency,” and definitely sign up for the new DLF Interest Group on Records Transparency/Accountability.
There’s never been such an open window of opportunity for govt information librarians to prove their mettle and work together to ensure the preservation of born-digital govt information in all its guises. So jump in and get involved today!
Political events in the United States have shed new light on the fragility of publicly administered data. In just the first few weeks of the Trump administration and 115th Congress, the Environmental Protection Agency was allegedly ordered to remove climate change information from its website, the USDA removed animal welfare data from its website, and the House passed H.Res.5, specifically excluding changes to the Affordable Care Act from mandatory long-term cost data analysis. The Senate and House of Representatives have both received proposed bills (S.103 and H.R.482) prohibiting funding from being used “to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” While researchers, archivists, librarians, and watchdog groups work hard to create and preserve open data, there’s little guarantee that information under federal control will always survive changes to federal agencies.
Here at FGI, we have been raising concerns about the many issues surrounding the preservation of born-digital govt information for quite some time. Over the last year, there have been some fruitful discussions about digital preservation. Out of those discussions and meetings has grown a new collaborative project called the Preservation of Electronic Government Information (PEGI) project (pronounced PEGGY). Over the next 2 years, this group will hold public meetings and begin work to scope out the problems, do an environmental scan of the govt information landscape and explore possible solutions surrounding the preservation of electronic government information by cultural memory organizations for long term use by the citizens of the United States.
We are pleased to announce a new project: Preserving Electronic Government Information (PEGI). Librarians, technologists, and other information professionals from the Center for Research Libraries, the Government Publishing Office (GPO), the University of North Texas, the University of California at Santa Barbara, the University of Missouri, the University of North Carolina at Greensboro, and Stanford University are undertaking a two-year project to address national concerns regarding the preservation of electronic government information (PEGI) by cultural memory organizations for long term use by the citizens of the United States.
The PEGI project has been informed by a series of meetings between university librarians, information professionals, and representatives of federal agencies, including the Government Publishing Office and the National Archives and Records Administration. The focus of the PEGI proposal is at-risk government digital information of long term historical significance which is not being adequately harvested from the Web or by other automated means. The project website is located at the Center for Research Libraries.
Public PEGI project meetings are being scheduled in conjunction with selected upcoming conferences, including the OA Symposium, ALA Annual, and the 2017 Federal Depository Library Conference in October. If you would like to contact the project team for more information, or to ask to attend one or more of the meetings, please post an inquiry on the project’s Google group.
Dr. Martin Halbert, University of North Texas, Project Steering Committee Chair
Roberta Sittel, University of North Texas
Marie Concannon, University of Missouri
James R. Jacobs, Stanford University
Lynda Kellam, University of North Carolina at Greensboro
Shari Laster, University of California, Santa Barbara
Scott Matheson, Yale University
Bernard Reilly, Center for Research Libraries
David E. Walls, Government Publishing Office (GPO)
Marie Waltz, Center for Research Libraries
Here’s a good piece in the Boston Globe, “The race to preserve disappearing data.” While it focuses primarily on the film industry, it also touches on link rot, disappearing government information such as the web sources cited in Supreme Court decisions, and other issues that government information librarians should be working on. I’ve said it often and I’ll say it again: when documents librarians focus on digitizing historic government publications, they ignore the far greater danger of the disappearance of born-digital government information. We need the entire documents community to step up and work on born-digital collection development, lest we slide into a “digital dark age.”
The problem of preservation is not unique to the film industry. It spans the digital artifacts of our age — from photos to music to scientific research data. One study of more than 500 biology papers published from a 20-year span found that as time passes, less original research data can be found; it suggested that up to 80 percent of raw data collected for studies in the early 1990s is lost. A crucial virtue of science is that researchers can reproduce findings or correct them over time by reevaluating original data. Fields from epidemiology to education to climate change require records that span decades or longer.
Lost data also plagues the legal world. A 2013 study of Supreme Court decisions by Harvard University Law School professors found that so-called link rot is eroding intellectual foundations of legal scholarship: Nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed…
…What was once a race to rescue information from going-extinct media (think of old files trapped on floppy disks) has morphed into a mounting need to copy and curate massive troves of data, says Dr. David Rosenthal, the founder of a library-led digital preservation network run out of the Stanford University libraries. Digital information decays over time and files grow corrupt from “bit rot,” which Rosenthal says is best fended off by creating copies of data in multiple virtual and physical locations…
…“Digital preservation is essentially a hot potato problem, where everyone wants to pass responsibility onward,” said Berman, also a professor of computer science at Rensselaer Polytechnic Institute. She notes that in the private sector, companies invest in preserving data that give them a competitive advantage. The larger challenge is preserving those digital artifacts that have broad societal relevance for the future, but no urgent private interest.
Publicly funded archives such as the National Archives and those supported by federal R&D agencies fulfill only a fraction of the preservation needed to pass on society’s knowledge to the future. Less than 1 percent of the Library of Congress’s 1.4 million archived videos and film reels were born digital. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, less than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film.
Docs of the week: Ferguson Grand Jury, 100 years of INS annual reports, and the historic Moynihan Report
Here at Stanford libraries, my colleague Kris Kasianovitz and I are busy putting context to the *massive* haystack that is the Internet, and we could use some help (want to be a lostdocs collector?!). Below are just a few of the documents we’ve collected in the last week, stored in our Stanford Digital Repository (SDR) and made accessible through our library catalog.
1) The Negro Family: The Case for National Action, AKA the Moynihan Report. I came across this document in a recent New Yorker article, “Don’t Be Like That: Does black culture need to be reformed?” by Kelefa Sanneh. The article, a review of a new anthology called “The Cultural Matrix: Understanding Black Youth,” contextualized the sociology and cultural history of being black in America, describing in detail the groundbreaking work of Daniel Patrick Moynihan, who was trained as a sociologist and later became well known as the liberal Senator from New York. As Sanneh notes, the Moynihan Report, originally printed in a run of 100 copies with 99 of them locked in a vault, was leaked to the press, prompting the Johnson administration to release the entire document. Moynihan’s overarching theme was “the deterioration of the Negro family,” and he called for a national program to “strengthen the Negro family.”
2) Annual Report of the Immigration and Naturalization Service. This one started out as a research consultation. A student wanted to analyze this report across the 100+ years it has been published. She found that the Immigration and Naturalization Service had digitized its historic run but, for some reason, had taken the link down from its site and not restored it for over 2 weeks. I contacted INS and got the digitized documents restored, then downloaded them, deposited them in SDR and had the PURL added to our bibliographic record. An added benefit of collecting this digital annual report is that it makes this important report, chock full of statistics, easier for future users to access. Our paper collection is shelved in several different areas of the US documents collection because INS has shifted among agencies over the years (causing its call# to change over time), from Treasury (call# T21.1:) to Labor (call# L3.1: and L6.1:) to Justice (call# J21.1:) to Homeland Security (call# HS4.200).
3) Documents from the Ferguson Grand Jury. Ferguson has been in the news over the last year because of the fatal shooting of African American youth Michael Brown by police officer Darren Wilson and the protests it sparked. This important historic series of 105 Missouri state documents from the Grand Jury was released via Freedom of Information requests from CNN. Some of our government information colleagues around the country wondered online how to collect and preserve these documents for posterity and future researchers. Luckily, Stanford University Libraries (SUL) is one library able to collect and preserve historically important born-digital government documents.
The overwhelming majority of state, local, US and international government documents these days are born-digital. Here at Stanford libraries, we continue to look for ways to maintain and expand both our historic and born-digital documents collections. Self-deposit will no doubt be one strategy among several (including Web archiving, LOCKSS and future initiatives) as we look to serve the information needs of citizens, faculty, students and researchers.
Last week, GPO announced that Sitting Bull College Library had joined the Federal Depository Library Program (FDLP) and would be the first digital-only member of the FDLP. Welcome Sitting Bull College to the FDLP!
But one sentence in GPO’s press release piqued my interest: “…opting to meet their community’s needs by developing an online Government information collection.” I haven’t been able to find any information about *how* GPO is helping Sitting Bull College develop an “online government information collection.” To me, a library collection is one that the library actually manages, preserves and makes accessible to its local community. GPO is not offering digital deposit to Sitting Bull College (or to any FDLP library, for that matter).
I think that for the FDLP to continue to flourish, there has to be more than PR. There’s so much to be done to ensure the preservation of and access to born-digital government information. There need to be ways for depositories to help collect, preserve, describe and give access (e.g., digital deposit, fugitive hunting, collaborative cataloging, building an FDLP knowledge base, etc.). Otherwise, they’re “depositories” in name only.
The U.S. Government Printing Office (GPO) designates Sitting Bull College Library as the first digital-only member of the Federal Depository Library Program (FDLP). The library is opting to meet their community’s needs by developing an online Government information collection. In choosing this format, the library will not receive print materials from GPO. Sitting Bull College is a Native American tribally-managed college that was granted land-grant status under an act of Congress with the mission to serve their community through higher education programs. Sitting Bull College Library serves the information needs of its students, faculty, and staff, the Standing Rock Sioux Reservation community, and the American public in North Dakota and South Dakota. Through the FDLP, GPO works with approximately 1,200 libraries nationwide to provide the public with access to authentic, published information from all three branches of the Federal Government in print and electronic formats.
“Libraries have always been the cornerstone in helping GPO carry out its mission of Keeping America Informed on the three branches of the Federal Government,” said Public Printer Davita Vance-Cooks. “GPO welcomes Sitting Bull College Library into the FDLP. As GPO continues to transform by providing information in digital formats, we are pleased to partner with the library community to expand access to Government information in their communities.”