Document of the day. Or why a paper document may be better than a digitized version

I just received an old (historic NOT legacy) Department of Commerce publication off of the needs and offers list: “Commercial handbook of China” by Julean Arnold, commercial attaché (WorldCat record). It’s actually a 1975 reprint of a 1919 publication. It’s chock full of statistics relating to provinces, cities, and consular districts: agriculture, minerals and mining, populations, exports and imports, revenues, transportation, ports and shipping facilities, etc. In short, this is a gold mine of historic information and statistics from the Republic of China (pre-Communist China). The document was digitized and is available in HathiTrust as well as the Internet Archive.

However, in comparing the digitized version with the paper version in hand, I came upon several issues:

  1. There are 3 foldout maps that were not digitized. These maps contain critical information on railway lines and treaty ports in China. The bibliographic record has a physical description including “2 v. fronts., plates, fold. map, tables, diagrs., fold. charts” but no content note mentioning that the maps were not digitized.
  2. As I mentioned, the document is chock full of statistical tables. Have you ever tried copying and pasting tabular data from a PDF? It’s even worse when the tables are displayed in landscape rather than portrait. I’ve verified that the OCR fails on those pages.
  3. Lots of readability/usability issues: the table of contents is partially obscured in one copy, and the tables are often blurred or faint. Also, HT is now using an OCR process where you can search but not copy or paste.
  4. Lastly, I find it … uh… interesting that this book says here “Copyright: Public Domain, Google-digitized.” But, if you want to download the whole book, you have to be an HT partner.

Does this digitized version increase access to this important historic material? Yes, indeed, it does. But I’m rather glad to have a bibliographic record in my catalog that links to the digital version AND points to the paper copy in our collection.


CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

