Tag Archives: born-digital
The Fall 2023 issue of Documents to the People (DttP) just came out. This issue is always interesting because it includes a section of MLIS student submissions, and this time around was no different. An article by Miguel Beltran, a grad student at the University of Illinois at Urbana-Champaign (which also happens to be my alma mater!), caught my attention because it was on a subject that FGI has long written about: the exigency of born-digital preservation of government information.
Citation: Lessons Learned in Born-Digital Preservation. Miguel Beltran. Documents to the People (DttP), Fall, 2023. DOI: https://doi.org/10.5860/dttp.v51i3.8124.
Beltran’s insightful analysis revolves around the documents of the Special Inspector General for Afghanistan Reconstruction (SIGAR) and an investigative report by the Washington Post entitled “At war with the truth.” Beltran points to the BIG ELEPHANT in the FDLP room: the processes, workflows, and infrastructures needed to curate government information (collect it, preserve it, and give long-term access to it) are not currently in place, and “clear strategies and widespread collaboration are necessary to preserve government information on these mediums.”
As more government documents are created in digital mediums, it is increasingly important that agencies could preserve and make them available to the public. This article discusses one group of government documents related to the war in Afghanistan and the landscape that would potentially preserve them. Based on the current conditions, there is a possibility that these documents and those of a similar nature may be overlooked and lost to future generations.
I checked the Catalog of Government Publications (CGP) for author: “United States. Office of the Special Inspector General for Afghanistan Reconstruction” and the newest SIGAR report there is from May of 2022. Herein lies the problem, as Beltran notes. Without an agreement in place between SIGAR and GPO, many of this agency’s reports will fall through the cracks and not be cataloged for the National Collection or actively preserved. The main SIGAR site has been harvested by the Internet Archive many times since 2009 (but the reports page and its corresponding RSS feed have been collected far fewer times, only since 2015, at irregular intervals, and NOT by GPO!). That means that, though the SIGAR site is in the Wayback Machine, the reports from this agency are not necessarily even in the Wayback Machine and certainly NOT in GPO’s FDLP web archive.
Therefore, the ONLY way to assure that these born-digital documents are curated is to go through the list one by one, brute-force style, checking whether each report has been cataloged in CGP and then reporting the missing ones to GPO as “unreported” documents. So that’s what I’m going to do 🙂
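For anyone who wants to speed up that brute-force check, here is a minimal sketch in Python. It assumes you have already copied the agency's report titles and the titles found in CGP into two plain lists by hand; the function names `normalize` and `find_unreported` and the sample titles are made up for illustration, and nothing here talks to CGP or to any GPO system.

```python
import re


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace,
    so that small formatting differences do not block a match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_unreported(agency_titles, cataloged_titles):
    """Return the agency titles that have no normalized match
    among the cataloged titles, in their original order."""
    cataloged = {normalize(t) for t in cataloged_titles}
    return [t for t in agency_titles if normalize(t) not in cataloged]


# Hypothetical sample data, not real SIGAR or CGP records.
agency = ["Example Quarterly Report, July 2023", "Example Audit 23-01"]
cataloged = ["EXAMPLE QUARTERLY REPORT - July 2023"]

for title in find_unreported(agency, cataloged):
    print("Candidate unreported document:", title)
```

Title matching like this is only a first pass. A report cataloged under a different title will show up as a false "unreported" hit, so each candidate still deserves a manual look in CGP before it is reported.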
Thanks to Miguel Beltran for once again raising the important issue of born-digital preservation. Have you reported a document to GPO today? I challenge all of my FDLP colleagues around the country to report 5 documents per week to GPO. Together we can fill some of the cracks that are currently in the National Collection.
This week is Endangered Data Week, a new effort to raise awareness about publicly available data and the threats to its creation, sharing and retention. Follow along with the conversation at the Twitter hashtag #EndangeredData, check out the Endangered Data events near you, tune in on Friday for the webinar hosted by the Digital Library Federation (DLF), “Endangered Accountability: A DLF-Sponsored Webinar on FOIA, Government Data, and Transparency,” and definitely sign up for the new DLF Interest Group on Records Transparency/Accountability.
There’s never been such an open window of opportunity for govt information librarians to prove their mettle and work together to assure the preservation of born-digital govt information in all its guises. So jump in and get involved today!
Political events in the United States have shed new light on the fragility of publicly administered data. In just the first few weeks of the Trump administration and 115th Congress, the Environmental Protection Agency was allegedly ordered to remove climate change information from its website, the USDA removed animal welfare data from its website, and the House passed H.Res.5, specifically excluding changes to the Affordable Care Act from mandatory long-term cost data analysis. The Senate and House of Representatives have both received proposed bills (S.103 and H.R.482) prohibiting funding from being used “to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” While researchers, archivists, librarians, and watchdog groups work hard to create and preserve open data, there’s little guarantee that information under federal control will always survive changes to federal agencies.
Here at FGI, we have been raising concerns about the many issues surrounding the preservation of born-digital govt information for quite some time. Over the last year, there have been some fruitful discussions about digital preservation. Out of those discussions and meetings has grown a new collaborative project called the Preservation of Electronic Government Information (PEGI) project (pronounced PEGGY). Over the next 2 years, this group will hold public meetings and begin work to scope out the problems, do an environmental scan of the govt information landscape, and explore possible solutions surrounding the preservation of electronic government information by cultural memory organizations for long-term use by the citizens of the United States.
We are pleased to announce a new project: Preserving Electronic Government Information (PEGI). Librarians, technologists, and other information professionals from the Center for Research Libraries, the Government Publishing Office (GPO), the University of North Texas, the University of California at Santa Barbara, the University of Missouri, University of North Carolina at Greensboro, and Stanford University are undertaking a two year project to address national concerns regarding the preservation of electronic government information (PEGI) by cultural memory organizations for long term use by the citizens of the United States.
The PEGI project has been informed by a series of meetings between university librarians, information professionals, and representatives of federal agencies, including the Government Publishing Office and the National Archives and Records Administration. The focus of the PEGI proposal is at-risk government digital information of long term historical significance which is not being adequately harvested from the Web or by other automated means. The project website is located at the Center for Research Libraries.
Public PEGI project meetings are being scheduled in conjunction with selected upcoming conferences, including the OA Symposium, ALA Annual, and the 2017 Federal Depository Library Conference in October. If you would like to contact the project team for more information, or to ask to attend one or more of the meetings, please post an inquiry on the project’s Google group.
Dr. Martin Halbert, University of North Texas, Project Steering Committee Chair
Roberta Sittel, University of North Texas
Marie Concannon, University of Missouri
James R. Jacobs, Stanford University
Lynda Kellam, University of North Carolina at Greensboro
Shari Laster, University of California, Santa Barbara
Scott Matheson, Yale University
Bernard Reilly, Center for Research Libraries
David E. Walls, Government Publishing Office (GPO)
Marie Waltz, Center for Research Libraries
Here’s a good piece in the Boston Globe, “The race to preserve disappearing data”. While primarily focusing on the film industry, it also mentions link rot, disappearing government information in the form of Supreme Court decisions, and other issues on which government information librarians should be working. I’ve said it often and I’ll say it again: when documents librarians focus on digitizing historic government publications, they ignore the far greater danger of the disappearance of born-digital government information. We need the entire documents community to step up and work on the issue of born-digital collection development lest we risk entering a “digital dark age.”
The problem of preservation is not unique to the film industry. It spans the digital artifacts of our age — from photos to music to scientific research data. One study of more than 500 biology papers published from a 20-year span found that as time passes, less original research data can be found; it suggested that up to 80 percent of raw data collected for studies in the early 1990s is lost. A crucial virtue of science is that researchers can reproduce findings or correct them over time by reevaluating original data. Fields from epidemiology to education to climate change require records that span decades or longer.
Lost data also plagues the legal world. A 2013 study of Supreme Court decisions by Harvard University Law School professors found that so-called link rot is eroding intellectual foundations of legal scholarship: Nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed…
…What was once a race to rescue information from going-extinct media (think of old files trapped on floppy disks) has morphed into a mounting need to copy and curate massive troves of data, says Dr. David Rosenthal, the founder of a library-led digital preservation network run out of the Stanford University libraries. Digital information decays over time and files grow corrupt from “bit rot,” which Rosenthal says is best fended off by creating copies of data in multiple virtual and physical locations…
…“Digital preservation is essentially a hot potato problem, where everyone wants to pass responsibility onward,” said Berman, also a professor of computer science at Rensselaer Polytechnic Institute. She notes that in the private sector, companies invest in preserving data that give them a competitive advantage. The larger challenge is preserving those digital artifacts that have broad societal relevance for the future, but no urgent private interest.
Publicly funded archives such as the National Archives and those supported by federal R&D agencies fulfill only a fraction of the preservation needed to pass on society’s knowledge to the future. Less than 1 percent of the Library of Congress’s 1.4 million archived videos and film reels were born digital. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, less than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film.
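Rosenthal's point in the excerpt above about keeping copies in multiple locations only pays off if the copies are compared from time to time. Here is a minimal sketch of that kind of fixity check in Python. It assumes the copies have already been read into memory and that a trusted checksum was recorded at deposit time; the names `sha256_hex` and `damaged_copies` and the sample locations are hypothetical, and this does not describe how any particular preservation network does it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def damaged_copies(copies: dict[str, bytes], expected_digest: str) -> list[str]:
    """Return the sorted names of locations whose copy no longer
    matches the digest recorded when the document was deposited."""
    return sorted(
        location
        for location, data in copies.items()
        if sha256_hex(data) != expected_digest
    )


# Hypothetical example: three copies, one of which has a flipped byte.
original = b"born-digital report"
recorded = sha256_hex(original)  # assumed to be stored at deposit time
copies = {
    "site-a": original,
    "site-b": b"born-digital repor\x00",  # simulated bit rot
    "site-c": original,
}

print(damaged_copies(copies, recorded))
```

In practice the files would be read from disk or fetched from each location in chunks instead of being held in memory, and a damaged copy would be replaced from one that still matches the recorded checksum.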
Docs of the week: Ferguson Grand Jury, 100 years of INS annual reports, and the historic Moynihan Report
Here at Stanford libraries, my colleague Kris Kasianovitz and I are busy putting context to the *massive* haystack that is the Internet — and we could use some help (want to be a lostdocs collector?!)! Below are just a few of the documents we’ve collected in the last week, stored in our Stanford Digital Repository and made accessible through our library catalog.
1) The Negro family, the case for national action, AKA the Moynihan Report. This document came to me from a recent New Yorker article “Don’t Be Like That: Does black culture need to be reformed?” by Kelefa Sanneh. The article, a book review of a new anthology called “The Cultural Matrix: Understanding Black Youth,” contextualized the sociology and cultural history of being black in America, describing in detail the ground-breaking work of Daniel Patrick Moynihan, trained as a sociologist and well known later as the liberal Senator from NY. As Sanneh notes, the Moynihan Report (which was originally printed in a run of 100 with 99 of them locked in a vault) was leaked to the press, causing the Johnson administration to release the entire document. Moynihan’s overarching theme was “the deterioration of the Negro family” and he called for a national program to “strengthen the Negro family.”
2) Annual Report of the Immigration and Naturalization Service. This one started out as a research consultation. A student wanted to analyze this report over the 100+ years that it’s been published. She found that the Immigration and Naturalization Service had digitized their historic run, but for some reason had taken the link down from their site and not restored it for over 2 weeks. I contacted INS and got the digitized documents restored, then downloaded them, deposited them in SDR and had the PURL added to our bibliographic record. The added benefit of collecting this digital annual report is that it makes it easier for future users to access a key publication chock full of important statistics. Our paper collection is shelved in several different areas of the US documents collection because INS has shifted around over the years among different agencies (causing its call# to change over time), from Treasury (call# T21.1:) to Labor (call# L3.1: and L6.1:) to Justice (call# J21.1:) to Homeland Security (call# HS4.200).
3) Documents from the Ferguson Grand Jury. Ferguson has been in the news over the last year because of the fatal shooting of African American youth Michael Brown by police officer Darren Wilson and the ensuing protests it sparked. This important historic series of 105 Missouri state documents from the Grand Jury was released via Freedom of Information requests from CNN. Some of our government information colleagues around the country wondered online how to collect and preserve these documents for posterity and future researchers. Luckily, SUL is one library able to collect and preserve historically important born-digital government documents.
The overwhelming majority of state, local, US and international government documents these days are born-digital. Here at Stanford libraries, we continue to look for ways to maintain and expand both our historic and born-digital documents collections. Self-deposit will no doubt be one strategy among several (including Web archiving, LOCKSS and future initiatives) as we look to serve the information needs of citizens, faculty, students and researchers.