
Tag Archives: hathitrust

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

HathiTrust Federal Documents Registry Progresses to Beta

This is big news indeed. According to a press release, HathiTrust’s Federal Documents Registry — pulled together from the records of over 40 libraries — is now available as a beta release! Mike Furlough and Valerie Glenn gave a very good presentation of the project, methodology, etc. at IFLA last week.

A HUGE thanks to HathiTrust and especially to Valerie Glenn, who took on this immense project. Hopefully, it’ll help not only HathiTrust, in identifying materials not yet digitized in its corpus, but also the 1100+ depository libraries and public information users. Hopefully the FDLP will be able to use the registry to put together a union list and holdings of all documents in order to further FDLP’s goal of preserving all historic FDLP documents. I know I’ll be comparing Stanford Library’s federal documents collection to the registry in order to find and fill gaps in our collections, and also, hopefully, using it as a source of copy for documents not yet cataloged.

The Registry is intended to be a comprehensive source of metadata for the US federal documents corpus – material produced at government expense since 1789. While many potential use cases exist, an important use will be the identification of materials that have not yet been digitized and/or deposited into the HathiTrust repository. The Registry was conceived in 2012 as a mechanism to determine how far HathiTrust had progressed in meeting its goal of a comprehensive digital corpus, as outlined in the ballot initiative from the 2011 Constitutional Convention. In the fall of 2013, we issued a broad call for records, and thanks to the more than 40 libraries who responded we received more than 25 million records. With such a large aggregation of records, the project team needed to develop multiple approaches for detecting and grouping duplicate records (records describing the same work).

via Federal Documents Registry Progresses to Beta | HathiTrust Digital Library.
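To make "detecting and grouping duplicate records" a little more concrete, here is a minimal Python sketch of one common approach: build a normalized match key for each record and group the records that share a key. This is not HathiTrust's actual method, which the press release does not describe. The field names (`id`, `title`, `year`, `sudoc`) and the sample records are invented for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Build a comparison key for a record (hypothetical field names).

    Prefer a classification number when one is present; otherwise fall
    back to normalized title plus year.
    """
    sudoc = record.get("sudoc")
    if sudoc:
        return ("sudoc", re.sub(r"\s+", "", sudoc).upper())
    return ("title-year", normalize(record["title"]), record.get("year"))


def group_duplicates(records):
    """Return lists of record ids that share a match key."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented sample records, for illustration only.
records = [
    {"id": "a1", "title": "Commercial Handbook of China", "year": 1919},
    {"id": "b7", "title": "Commercial handbook of China.", "year": 1919},
    {"id": "c3", "title": "Statistical Abstract of the United States", "year": 1919},
    {"id": "d4", "title": "Bulletin", "sudoc": "X 1.2:34"},
    {"id": "e5", "title": "Bulletin no. 34", "sudoc": "x1.2:34"},
]

duplicate_groups = group_duplicates(records)
```

A single key like this misses a lot: a record with a classification number will never match a record for the same work that lacks one, and reprints, multi-volume sets, and cataloging variations all blur the picture. That is presumably why the project team needed multiple approaches rather than one.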

What Are We To Keep? (FAQ)

This document is meant to accompany the article, “What are we to Keep?” by James R. Jacobs, Documents to the People (Spring 2015) p 13-19.


  • What is a Preservation Copy?

    Research prompted by JSTOR’s desire to determine how to guarantee that all of the printed material within its journals would remain available defined preservation copies as “clean copies that retain full information accuracy from the vantage point of the researcher” (Yano). Thus, when we think about “preservation copies,” we want to ensure that copies are available for the long term and that those copies are complete and accurate. “Informational accuracy” implies a “perfect copy,” that is, a copy that is as good as new. A preservation copy is, therefore, a “clean” copy that is quality-checked and repaired, if necessary, on a page-by-page basis.

  • Why do we need Preservation Copies?

    Even if we had perfect digital copies of paper documents, we would still need preservation paper copies for two reasons. First, there is evidence that digital documents degrade more rapidly than print material (Rosenthal), so it is necessary to have a paper copy that could be used to re-digitize. Second, digitization does not magically preserve paper; or, to put it another way, digital copies are not the same as print copies and may inherently lose information by the very act of reformatting to a new presentation.

  • Why do we need Access-Copies?

    Unless we have perfect, page-verified digitizations that are as complete, as accurate, and as easily usable as the original paper copies (Jacobs and Jacobs), users will inevitably need to go back to the original paper copy in order to get either the complete and accurate content or the functional usability of the original paper medium. Some libraries have already reported that digitization of paper copies has increased the demand for access to the paper copies. Additionally, some users and uses will require access to physical copies via interlibrary loan (ILL). ILL can only happen if there is a surplus of copies. As the number of copies approaches zero (scarcity), libraries will no longer be willing to lend via ILL. Therefore, it is imperative that there not be a dearth of geographically distributed copies.

  • Why do we need re-digitization copies?

    Unless we create perfect copies that adequately anticipate the future needs of users, we will need to create new digitizations in order to meet those future needs. (See “An alarmingly casual indifference to accuracy and authenticity”: What we know about digital surrogates.)

What should I think about before discarding government documents?

What are we to keep? thoughts on the National Collection (DttP Spring 2015 feature article)

The Spring 2015 issue of Documents to the People (DttP) just arrived at my door. The feature article in this issue is titled “Thoughts on the National Collection” and was collaboratively written by myself, James R. Jacobs, along with Shari Laster, Aimee C. Quinn, and Barbie Selby. I’m posting my segment titled “What Are We to Keep?” as it was written under a Creative Commons Attribution-NonCommercial-Share-Alike CC BY-NC-SA license. The other pieces include: “Segmenting the Government Information Corpus” by Shari Laster; “Who is Responsible for Permanent Public Access?” by Aimee C. Quinn; and “Where Do We Go From Here?: Some Thoughts” by Barbie Selby. I’ll post the other segments if I get permission from my collaborators.

The question of “how many copies” of print documents the FDLP should collectively keep is the wrong question asked for the wrong reasons and trying to answer it will only lead to the wrong answers and irreparable loss of information. For me, even thinking about answering it raises more questions. How can we know how many copies to keep unless we specify the purposes for which we wish to keep them? What are those purposes? How will we know if we are meeting our goals? How will discarding paper benefit users? How can we be sure that we are not losing information when we discard paper copies if we do not have an inventory of the paper copies that exist? How can we implement a policy that is so vague that it doesn’t define things like “a requisite number of copies,” and how decisions will be made, and which apparently treats a born-digital XML document created by GPO and an indifferent digitization without OCR text and missing its maps and foldouts as of equal value?

Let’s be clear. We are talking about the records of our democracy. Loss of even a single page could damage the ability of historians, journalists, economists, and citizens to understand our history and hold our government accountable for its successes and its failures. We have those documents now in our libraries; there are not hundreds or even dozens of copies of these documents floating around in used bookstores or elsewhere. They are in our charge.

Keep reading “What are we to keep?”…

Also see the What Are We To Keep FAQ for further context and bibliography.

Document of the day. Or why a paper document may be better than a digitized version

I just received an old (historic NOT legacy) Department of Commerce publication off the needs and offers list called “Commercial handbook of China” by Julean Arnold, commercial attaché (WorldCat record). It’s actually a 1975 reprint of a 1919 publication. It’s chock full of statistics relating to provinces, cities, and consular districts: agriculture, minerals and mining, populations, exports and imports, revenues, transportation, ports and shipping facilities, etc. In short, this is a gold mine of historic information and statistics from the Republic of China (pre-Communist China). The document was digitized and is available in HathiTrust as well as the Internet Archive.

However, in comparing the digitized version with the paper version in hand, I came upon several issues:

  1. There are 3 foldout maps that were not digitized. These maps contain critical information on railway lines and treaty ports in China. The bibliographic record has a physical description including “2 v. fronts., plates, fold. map, tables, diagrs., fold. charts” but no content note mentioning that the maps were not digitized.
  2. As I mentioned, the document is chock full of statistical tables. Have you ever tried copying and pasting tabular data from a PDF? It’s even worse when the tables are displayed in landscape rather than portrait. I’ve verified that the OCR fails on those pages.
  3. Lots of readability/usability issues: The table of contents is partially obscured in one copy and the tables are often blurred or faint. Also, HathiTrust is now using an OCR process that lets you search the text but not copy or paste it.
  4. Lastly, I find it … uh… interesting that this book says here “Copyright: Public Domain, Google-digitized.” But, if you want to download the whole book, you have to be an HT partner.

Does this digitized version increase access to this important historic material? Yes, indeed, it does. But I’m rather glad to have a bibliographic record in my catalog that links to the digital version AND points to the paper copy in our collection.

Comparing HathiTrust and Google Books as repositories of government documents

Here are 2 recent items analyzing HathiTrust and Google Books for their efficacy in giving access to Federal government documents. The first is an article by Laura Sare (Texas A&M) that compares HathiTrust with Google Books. The second is a presentation by Brian Vetruba (Washington U in St Louis) at “Leveraging Your Strengths: Regional Government Documents Conference” at the Federal Reserve Bank of St. Louis on May 4, 2012.

A Comparison of HathiTrust and Google Books Using Federal Publications. Laura Sare. Practical Academic Librarianship: The International Journal of the SLA Academic Division. 2(1) 2012 p. 1-25. (attached below. Fair use claim)