Tag Archives: DttP
DttP student article re SIGAR and the tenuous nature of born-digital preservation
The Fall 2023 issue of Documents to the People (DttP) just came out. This issue is always interesting because it includes a section of MLIS student submissions, and this time around was no different. An article by Miguel Beltran, a grad student at the University of Illinois at Urbana-Champaign (which also happens to be my alma mater!), caught my attention because it was on a subject that FGI has long written about: the exigency of born-digital preservation of government information.
Citation: Lessons Learned in Born-Digital Preservation. Miguel Beltran. Documents to the People (DttP), Fall 2023. DOI: https://doi.org/10.5860/dttp.v51i3.8124.
Beltran’s insightful analysis revolves around the documents of the Special Inspector General for Afghanistan Reconstruction (SIGAR) and an investigative report by the Washington Post entitled “At war with the truth.” Beltran points to the BIG ELEPHANT in the FDLP room: the processes, workflows, and infrastructures needed to curate government information (that is, to collect, preserve, and provide long-term access to it) are not currently in place, and “clear strategies and widespread collaboration are necessary to preserve government information on these mediums.”
As more government documents are created in digital mediums, it is increasingly important that agencies could preserve and make them available to the public. This article discusses one group of government documents related to the war in Afghanistan and the landscape that would potentially preserve them. Based on the current conditions, there is a possibility that these documents and those of a similar nature may be overlooked and lost to future generations.
I checked the Catalog of Government Publications (CGP) for author “United States. Office of the Special Inspector General for Afghanistan Reconstruction,” and the newest SIGAR report there is from May 2022. Herein lies the problem, as Beltran notes. Without an agreement in place between SIGAR and GPO, many of this agency’s reports will fall through the cracks and will not be cataloged for the National Collection or actively preserved. The main SIGAR site has been harvested by the Internet Archive many times since 2009 (but the reports page and its corresponding RSS feed have been collected far fewer times, only since 2015, at very random intervals, and NOT by GPO!). That means that, though the SIGAR site is in the Wayback Machine, the reports from this agency are not necessarily even in Wayback and are certainly NOT in GPO’s FDLP web archive.
Therefore, the ONLY way to ensure that these born-digital documents are curated is to go through the list one by one, brute-force style, check whether each report has been cataloged in CGP, and then report any that haven’t been to GPO as “unreported” documents. So that’s what I’m going to do 🙂
Thanks again to Miguel Beltran for raising the important issue of born-digital preservation. Have you reported a document to GPO today? I challenge all of my FDLP colleagues around the country to report 5 documents per week to GPO. Together we can fill some of the cracks that are currently in the National Collection.
Freegovinfo receives 2015 GODORT “Documents to the People (DttP)” award
Last night, Daniel Cornwall and James R. Jacobs were honored to be on hand to receive the 2015 GODORT “Documents to the People (DttP)” award for Free Government Information.
Free Government Information has been chosen as the recipient of the 2015 ProQuest/GODORT/ALA “Documents to the People” Award. This award is a tribute to an individual, library, institution, or other non-commercial group that has most effectively encouraged the use of government documents in support of library service. FGI epitomizes the spirit of the DttP Award by creating an open, public dialogue and building a diverse community. By moving the conversation to more social technologies, FGI has changed the way we think about preserving access to government information. As one letter of support noted, “FGI fills a gap between the specialized and frequently technical discussions taking place on the listserv and the more public conversations that are taking place with librarians of other specializations, professionals and advocates from other disciplines and backgrounds, and even the wider public.” The DttP Award recognizes the work of the many FGI volunteers who, for ten years, have dedicated themselves to advocating for permanent no-fee public access to government information.
FGI volunteers shown here (clockwise from upper left) Daniel Cornwall, Jim Jacobs, James Jacobs, Shinjoung Yeo, Rebecca Stockbridge, James Staub.
We were given the opportunity to say a few words, so we thought we’d share our statement in its entirety below. We were so surprised and honored to receive this award. It’s a good feeling that 11 years of work with FGI have had some positive effect on the Documents community. Thanks!
(James)
Eleven years ago this October, Jim Jacobs, James Jacobs, Shinjoung Yeo, and an unnamed person who wishes to remain anonymous were having dinner. We were discussing the future of govinfo and brainstorming about how to change the conversation in the community. We had begun to notice some disturbing trends.
Under the pervasive myth that the new digital environment transcended and rendered moot the roles of libraries in access to govinfo, some libraries had begun questioning and dismantling the very fibers of the FDLP and public access to govt information by abandoning their traditional govinfo roles and repurposing their documents experts’ billets. To challenge the myth and to engage in a broader dialogue with both the wider library community and other stakeholder communities, the three of us had just written an article in the Journal of Academic Librarianship about the once and future FDLP, in response to the trends we were seeing. We were really trying to push back against what we saw as libraries’ complicity in the erosion of the FDLP and public access to govt information.
I wish we could say that we’ve accomplished our goals and moved on to other projects. However, after 11 years, we’re still fighting for these same issues, dispelling the myths of what communication scholar Vincent Mosco called the “digital sublime”: the almost religious fervor that technology would magically deliver democracy to the masses and that libraries no longer needed to work at collecting, describing, giving access to, and preserving govt info but could simply rely on GPO, commercial vendors, and others besides libraries.
After that evening of discussion, FGI was born and the blog was started soon after. We quickly doubled our “staff” when Daniel Cornwall, Rebecca Troy Horton and James Staub joined us.
The bad news is that we are still facing tremendous challenges and tasks as the issues surrounding govinfo become ever more complicated. The good news is that you all still have plenty of opportunities to participate in assuring publicly controlled, long-term access to govinfo.
We may have charred a few bridges over the years, but if this award is any testament, we’ve also made a lot of friends and comrades to the cause of freegovinfo, and for that we thank you!
(Daniel)
Like James, I’m very grateful that GODORT has honored FGI for the past decade’s worth of work. This is a good time to look ahead to the next ten years. We believe that the work of maintaining a system of permanently accessible government information at no cost to the user will require an active partnership among libraries and other nonprofit institutions of good will. Otherwise, only that which has tangible market value will be preserved, with access at a price.
What can you do to help ensure a positive future in government information? Some things we think are within the reach of libraries and other nonprofits are:
- Advocate for users. Being user-centric is more than throwing a user icon in the center of a chart. How do your users look at government information? What are they missing if they’re not looking at it? Help put the right information in front of the right people.
- Participate in finding/reporting fugitive documents. Host the stuff you find on your own servers or through an archiving service.
- Think about joining the LOCKSS-USDOCS group, which is currently the only archive of FDsys outside Federal hands.
- Build local digital collections under your administrative control. Pointing is not maintaining access. We learned this during the last government shutdown.
- Work with other librarians in your state to plan how you’ll serve up federal information when the next government shutdown happens. Let’s not be taken by surprise again.
If we’re willing to pull together and do what we can, the next ten years will look bright for government information. Thank you.
Thoughts on the National Collection (DttP, spring 2015)
A few weeks ago, I posted on FGI my part of a collaborative feature article in the Spring 2015 issue of Documents to the People (DttP) (What are we to keep? thoughts on the National Collection). The other writers (Shari Laster, Aimée C. Quinn, and Barbie Selby) have given me permission to post their segments of our piece. We hope that this will spur some positive discussion and move the community toward a sustainable future for the Federal Depository Library Program (FDLP) and for government information in libraries.
Thoughts on the National Collection
In August 2014, at the request of outgoing Federal Documents Task Force Chair, Jill Vassilakos-Long, several GODORT members met via conference calls and e-mail to discuss the GPO proposal to enable regional depository libraries to discard tangible material by substituting digital documents if they met specific criteria. Shortly after this group’s work was completed (FGI Editor’s note: see the GODORT letter in re this proposal as well as those of other library associations), a general call for articles was announced on GOVDOC-L by the editors of DttP, and we thought it would be interesting to offer our personal opinions on one of the questions in the announcement which related to our work from the task force. We agreed to each limit our contributions to 2-3 pages. The following pieces are our individual perspectives about who is responsible for the preservation of government information and the feasibility of setting a target for an optimal number of tangible copies for preservation purposes.
James R. Jacobs, Stanford University
Shari Laster, University of California Santa Barbara (UCSB)
Aimée C. Quinn, University of New Mexico
Barbie Selby, University of Virginia
- What are we to keep? thoughts on the National Collection. James R. Jacobs
- Segmenting the Government Information Corpus. Shari Laster
- Who Is Responsible for Permanent Public Access? Aimée C. Quinn
- Where Do We Go From Here?: Some Thoughts. Barbie Selby
What Are We To Keep? (FAQ)
This document is meant to accompany the article “What are we to Keep?” by James R. Jacobs, Documents to the People (Spring 2015), pp. 13-19.
FAQ
- What is a Preservation Copy?
Research prompted by JSTOR’s desire to determine how to guarantee that all of the printed material within its journals would remain available defined preservation copies as “clean copies that retain full information accuracy from the vantage point of the researcher” (Yano). Thus, when we think about “preservation copies,” we want to ensure that copies are available for the long term and that those copies are complete and accurate. “Informational accuracy” implies a “perfect copy,” a copy that is as good as new. A preservation copy is, therefore, a “clean” copy that is quality-checked and repaired, if necessary, on a page-by-page basis.
- Why do we need Preservation Copies?
Even if we had perfect digital copies of paper documents, we would still need preservation paper copies for two reasons. First, there is evidence that digital documents degrade more rapidly than print material (Rosenthal), so it is necessary to have a paper copy that could be used to re-digitize. Second, digitization does not magically preserve paper; or, to put it another way, digital copies are not the same as print copies and may inherently lose information by dint of being reformatted into a new presentation.
- Why do we need Access-Copies?
Unless we have perfect, page-verified digitizations that are as complete, as accurate, and as easily usable as the original paper copies (Jacobs and Jacobs), users will inevitably need to go back to the original paper copy in order to get either the complete and accurate content or the functional usability of the original paper medium. Some libraries have already reported that digitization of paper copies has increased the demand for access to the paper copies. Additionally, some users and uses will require access to physical copies via Interlibrary Loan (ILL). ILL can only happen if there is a surplus of copies: as the number of copies approaches zero, libraries will no longer be willing to lend them through ILL. Therefore, it is imperative that we not allow a dearth of geographically distributed copies.
- Why do we need re-digitization copies?
Unless we create perfect copies that adequately anticipate the future needs of users, we will need to create new digitizations in order to meet those future needs. (See “An alarmingly casual indifference to accuracy and authenticity”: What we know about digital surrogates.)
Checklist:
What should I think about before discarding government documents?
1. In General
- Does the document have long-term historical value? If it is a recently published document, *will* it have historical value?
- Does the document include tabular data and statistics?
- Does the document include maps, fold-outs, color illustrations, and other non-textual content?
- Does the library have adequate metadata representation in the library’s catalog for the document?
- Is the document discoverable and accessible?
- How many other libraries are listed in the OCLC record as having a copy?
- Are there other copies in nearby FDLs?
- Are there MOUs for shared collections with nearby libraries or consortia in place?
- Does the digitization meet the requirements of the Digital Surrogate Seal of Approval (DSSOA)?
- Is the digital copy adequately cataloged?
- Does the digitization include digital full-text (aka OCR)?
- Is the full-text searchable for item-level discovery?
- Is the full-text searchable within an item?
- Can the digital text be accurately copied or extracted?
- How accurate is the digital text, particularly with regard to tabular numeric data, dates, and named people, places, and things?
- Does the digitized text preserve the original layout of the print text, particularly with regard to tables, footnotes, sidebars, and headers and footers?
- Is the document freely and publicly available in a trusted digital repository?
- Does your community have complete access and use rights to the digital copy?
- Has anyone checked the digital document page by page to ensure its accuracy, legibility, usability, and searchability?
- Does your library have any control over the long-term availability of the document?
2. About Paper Copies
3. About Digital Copies
Selected Bibliography
Ames, Eric. 2012. “So We Can Throw These Out Now, Right?”: What We Learned From Microfilming Newspapers and How It Shapes Our Digitization Strategy. The Baylor University Libraries Digital Collections Blog (August 23, 2012).
Center for Research Libraries. 2011. Certification Report on the HathiTrust Digital Repository (March 2011).
Conway, Paul. 2013. Preserving Imperfection: Assessing the Incidence of Digital Imaging Error in HathiTrust.
HathiTrust. 2012. Update on February 2012 Activities: HathiTrust Research Center (HTRC): quantifying OCR errors.
Jacobs, James A., and James R. Jacobs. 2013. “The Digital-Surrogate Seal of Approval: A Consumer-Oriented Standard.” D-Lib Magazine 19, no. 3/4 (March 2013). doi:10.1045/march2013-jacobs.
Kichuk, Diana. 2015. “Loose, Falling Characters and Sentences: The Persistence of the OCR Problem in Digital Repository E-Books.” Portal: Libraries and the Academy 15, no. 1 (2015): 59–91. doi:10.1353/pla.2015.0005.
Ladd, Ken. 2010. An Examination of the Failure Rate and Content Equivalency of Electronic Surrogates and the Implications for Print Equivalent Preservation. Evidence Based Library and Information Practice 5, no. 4 (2010).
McEathron, Scott R. 2011. An Assessment of Image Quality in Geology Works from the HathiTrust Digital Library. Proceedings, Geoscience Information Society, Volume 41 (October 27, 2011).
Nadal, Jacob, and Annie Peterson. 2009. Scarce and Endangered Works: Using Network-Level Holdings Data in Preservation Decision Making and Stewardship of the Printed Record. Preprint, accepted for publication in ALCTS Monographs.
Schonfeld, Roger C., and Ross Housewright. 2009. Documents for a Digital Democracy: A Model for the Federal Depository Library Program in the 21st Century. Ithaka S+R (December 17, 2009).
Schonfeld, Roger C., and Ross Housewright. 2009. What to Withdraw: Print Collections Management in the Wake of Digitization. Ithaka S+R (September 29, 2009).
Yano, Candace Arai, Z.J. Max Shen, and Stephen Chan. 2008. Optimizing the Number of Copies for Print Preservation of Research Journals. Berkeley, CA: University of California Berkeley, Industrial Engineering & Operations Research (October 2008). [originally published at http://www.ieor.berkeley.edu/~shen/webpapers/V.8.pdf]
What are we to keep? thoughts on the National Collection (DttP Spring 2015 feature article)
The Spring 2015 issue of Documents to the People (DttP) just arrived at my door. The feature article in this issue is titled “Thoughts on the National Collection” and was collaboratively written by me (James R. Jacobs), along with Shari Laster, Aimee C. Quinn, and Barbie Selby. I’m posting my segment, titled “What Are We to Keep?”, since it was published under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. The other pieces include “Segmenting the Government Information Corpus” by Shari Laster; “Who is Responsible for Permanent Public Access?” by Aimee C. Quinn; and “Where Do We Go From Here?: Some Thoughts” by Barbie Selby. I’ll post the other segments if I get permission from my collaborators.
The question of “how many copies” of print documents the FDLP should collectively keep is the wrong question, asked for the wrong reasons, and trying to answer it will only lead to the wrong answers and irreparable loss of information. For me, even thinking about answering it raises more questions. How can we know how many copies to keep unless we specify the purposes for which we wish to keep them? What are those purposes? How will we know if we are meeting our goals? How will discarding paper benefit users? How can we be sure that we are not losing information when we discard paper copies if we do not have an inventory of the paper copies that exist? How can we implement a policy that is so vague that it doesn’t define things like “a requisite number of copies” or how decisions will be made, and which apparently treats a born-digital XML document created by GPO and an indifferent digitization without OCR text and missing its maps and foldouts as of equal value?
Let’s be clear. We are talking about the records of our democracy. Loss of even a single page could damage the ability of historians, journalists, economists, and citizens to understand our history and hold our government accountable for its successes and its failures. We have those documents now in our libraries; there are not hundreds or even dozens of copies of these documents floating around in used bookstores or elsewhere. They are in our charge.
Keep reading “What are we to keep?”…
Also see the What Are We To Keep FAQ for further context and bibliography.