


Google Books/Fed Docs: Google Books Statistics–The Bigger Picture

Now that I had some statistics, it dawned on me that I had no idea whether or not this was a lot of documents. So I was off to the FDLP Desktop and the Catalog of U.S. Government Publications (CGP).

I looked around the desktop to see if GPO listed any statistics. On the "about" page for the CGP, GPO says merely that there are more than 500,000 records in the database. So I gave some thought to how I might get a better figure, and off I went to OCLC and the GPO database in FirstSearch. On the database info page, OCLC lists the number of records as 507,000+ and notes that the database had its last monthly update on August 8, 2007.

So I went back to the CGP and its advanced search page. Searching for GPO in the publisher field is not terribly effective. Of course, in this database everything is a government document, so that is not a problem.

But how to get a real number out of the database? I tried searching for the most common of words, "a" and "the", but to no great effect. "A" brings up 359,875 records and "the" brings up 411,493. Neither result comes close enough to the supposed 507,000.

I then realized that the CGP now includes records for electronic titles, which would not be fodder for the Google Book Project. Using the New Electronic Titles page is not really an option for counting them, as it only goes back to April 2005, and since early 2006 the monthly lists have not been numbered (leaving me to do a lot of counting).

So back to the advanced search page in the CGP. Happily, here you can search for terms in the URL/PURL. I searched for every record whose URL/PURL listed .gov, .mil, .us, .org, or .com, and came up with a total of 64,504 records. So approximately 13% of the roughly 507,000 records in the CGP are electronic titles or titles with an electronic counterpart.
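For anyone who wants to check that percentage, here is a minimal sketch of the arithmetic. It assumes the OCLC figure of roughly 507,000 is the right denominator and that the five URL/PURL searches did not count any record twice; the variable names are only illustrative.

```python
# Approximate CGP record count as reported on the OCLC database info page.
cgp_records = 507_000

# Combined hits for .gov, .mil, .us, .org, and .com in the URL/PURL field.
# Assumption: no record was matched by more than one of the five searches.
electronic_records = 64_504

# Share of CGP records that are electronic or have an electronic counterpart.
electronic_share = electronic_records / cgp_records

print(f"Electronic share of CGP records: {electronic_share:.1%}")
```

This works out to a little under 13%, which is where the rounded figure comes from. If the searches did overlap, the true share would be somewhat lower.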

Unfortunately, I then realized that these figures really only represent documents published from 1976 on. This is a really big problem, in that most of the documents I found in Google Books date to before 1923. My only hope of getting good numbers was askGPO. So late on August 8th I shot off a query to GPO asking for statistics on the number of documents GPO has distributed both before and after 1976.

Surprisingly enough, GPO called me first thing the next morning. askGPO is notoriously slow in providing answers to queries, so I was very surprised! I spoke with Nancy Faget at GPO and she was very pleasant, though not exactly forthcoming with numbers. It struck me that I got the quick call back because GPO viewed my query as the first step in getting out of the program. As far as I know my director has no real intention of doing that, but I don’t think I convinced her on that point. But aside from that, she told me that GPO really didn’t know how many documents have gone to depositories since the beginning. Alas!!

I honestly don’t know if it would be fair to take the view that probably as much was distributed from 1813 to 1976 as was published after 1976. But if you did, doubling the roughly 507,000 records in the CGP would lead one to believe that over one million documents have been produced.

So the bigger picture suggests that the 167,878 titles in Google Books are only about 17% of all the documents that could be digitized. At a guess…
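The guess above can be sketched as a short calculation. It rests entirely on the unverified assumption that as many documents were distributed from 1813 to 1976 as after 1976, which is expressed here as a multiplier of 2; the function name and its parameters are made up for illustration.

```python
def google_books_share(google_titles, post_1976_records, era_multiplier=2.0):
    """Estimate the share of all documents that Google Books holds.

    era_multiplier scales the post-1976 record count up to an all-time
    total. The default of 2.0 encodes the guess that the 1813-1976 output
    equals the post-1976 output; it is an assumption, not a GPO figure.
    """
    estimated_total = post_1976_records * era_multiplier
    return google_titles / estimated_total, estimated_total


# Figures taken from the post: 167,878 titles found in Google Books and
# roughly 507,000 records in the GPO database.
share, total = google_books_share(167_878, 507_000)

print(f"Estimated total documents: {total:,.0f}")
print(f"Google Books share: {share:.0%}")
```

Changing `era_multiplier` shows how sensitive the percentage is to the guess: if the earlier period produced more than the later one, the Google Books share shrinks accordingly.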

So I put a call out to everyone in GovDoc Land.  If you are a full depository and have been one since 1813 and have kept really good records, could you please send me the statistics?  Thank you very kindly in advance!



CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
