Now that I had some statistics it dawned on me I had no idea whether or not this was a lot of documents. So I was off to the FDLP desktop and the Catalog of U.S. Government Publications.
I looked around the desktop to see if GPO listed any statistics. On the "about" page for the CGP, GPO says merely that there are more than 500,000 records in the database. So I gave some thought to how I might get a better figure, and off I went to OCLC and the GPO database in FirstSearch. On the database info page, OCLC lists 507,000+ as the number of records and notes that the database had its last monthly update on August 8, 2007.
So I went back to the CGP and its advanced search page. Searching for GPO in the publisher field is not terribly effective. Of course, in this database everything is a government document so that is not a problem.
But how to get a real number out of the database? I tried using the most common of words, "a" and "the", but to no great effect. Searching for "a" brings up 359,875 records and "the" brings up 411,493. Neither result comes close enough to the supposed 507,000.
I had another realization that the CGP now includes records for electronic titles–titles that would not be fodder for the Google Book Project. Using the New Electronic Titles page is not really an option to count them as it only goes back to April of 2005 and since early 2006 the monthly lists are not numbered (leaving me to do a lot of counting).
So back to the advanced searching page in the CGP. Happily here you can search for terms in the URL/PURL. I proceeded to search for every record that listed .gov, .mil, .us, .org, and .com. I came up with a total of 64,504 records. So approximately 13% of the records in the CGP are electronic titles or are titles with an electronic counterpart.
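For anyone who wants to check my 13%, here is a minimal sketch of the arithmetic. It assumes OCLC's "507,000+" can be treated as a flat 507,000, which is really only a lower bound, so the true share is probably a touch smaller. The variable names are my own.

```python
# Share of CGP records that carry a URL/PURL.
# Assumption: 507,000 is used as the database size, although OCLC
# reports "507,000+", so this is a rough upper estimate of the share.
cgp_records = 507_000
records_with_url = 64_504  # combined hits for .gov, .mil, .us, .org, .com

electronic_share = records_with_url / cgp_records
print(f"Electronic share of the CGP: {electronic_share:.1%}")
```

The result rounds to the 13% quoted above.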
Unfortunately I had another realization that these figures really only represent documents published from 1976 on. This is a really big problem in that most of the documents I found in Google Books dated to before 1923. My only hope to get good numbers was to askGPO. So late on August 8th I shot off a query to GPO asking for statistics on the number of documents GPO has distributed both before and after 1976.
Surprisingly enough, GPO called me first thing the next morning. askGPO is notoriously slow in providing answers to queries so I was very surprised! I spoke with Nancy Faget at GPO and she was very pleasant though not exactly forthcoming with numbers. It struck me that I got the quick call back as GPO viewed my query as the first step in getting out of the program. As far as I know my director has no real intentions of doing that, but I don’t think I convinced her on that point. But aside from that she told me that GPO really didn’t know how many documents went to depositories since the beginning. Alas!!
I honestly don’t know if it would be fair to take the view that probably as much was distributed from 1813 to 1976 as was published after 1976. But if you did, that would lead one to believe that over one million documents have been produced.
So the bigger picture suggests that the 167,878 titles in Google Books are only about 17% of all the documents that could be digitized. At a guess…
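The guess can be written out as a small sketch. Everything here rests on the untested assumption that the years before 1976 produced as many documents as the years since, and that 507,000 stands in for the post-1976 count. The variable names are mine, not anything official.

```python
# Back-of-the-envelope estimate of how much of the total is in Google Books.
# Assumption (unverified): pre-1976 output equals post-1976 output,
# so the total is simply double the CGP record count.
post_1976_records = 507_000
estimated_total = 2 * post_1976_records  # "over one million"

google_books_titles = 167_878
digitized_share = google_books_titles / estimated_total
print(f"Estimated total documents: {estimated_total:,}")
print(f"Share found in Google Books: {digitized_share:.1%}")
```

If the pre-1976 output was actually larger or smaller than the post-1976 output, the percentage shifts accordingly, which is why I call it a guess.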
So I put a call out to everyone in GovDoc Land. If you are a full depository and have been one since 1813 and have kept really good records, could you please send me the statistics? Thank you very kindly in advance!
Having found no published statistics for numbers of digitized books in Google Books, and especially nothing about digitized government publications, I was left with coming up with them on my own.
So I went to the Advanced Book Search screen for Google Books. Looking at the search options provided there I decided that the only way I could get reasonably useful statistics was to search for books published by GPO. As you are all aware not all government documents are actually published by GPO. Many are merely distributed by them. So I knew that my numbers would not be exact. Another problem was that over the years GPO listed themselves as publishers using a variety of abbreviations and phrases.
My first try was to use GPO in publisher and on August 8th I retrieved 141,600 hits. However just now when I ran it again, I only got 117,600. Hmmm.
Next search was for Government Printing Office, which retrieved both today and on the 8th, 43,600 hits. This was followed by gov’t, which on the 8th retrieved 2,322 titles but today only retrieved 2,258.
The grand total for using these three searches on August 8th was 187,522.
Today as I was double checking my results, I also tried gov. print. off. and got 4,420 hits. So as of this morning the grand total is 167,878. I find it rather disconcerting that the number has dropped so much in nine days!
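Here is a minimal sketch of the tallies, using only the hit counts reported above. It assumes the publisher searches do not overlap (I did not verify that a title cannot match more than one of them), and the dictionary names are my own.

```python
# Google Books hit counts by publisher search term, as reported above.
# Assumption: no title is counted under more than one search term.
hits_aug_8 = {
    "GPO": 141_600,
    "Government Printing Office": 43_600,
    "gov't": 2_322,
}
hits_today = {
    "GPO": 117_600,
    "Government Printing Office": 43_600,
    "gov't": 2_258,
    "gov. print. off.": 4_420,  # search added today only
}

total_aug_8 = sum(hits_aug_8.values())
total_today = sum(hits_today.values())

# Like-for-like comparison: only the three searches run on both days.
same_searches_today = sum(hits_today[term] for term in hits_aug_8)
like_for_like_drop = total_aug_8 - same_searches_today

print(f"August 8 total: {total_aug_8:,}")
print(f"Today's total: {total_today:,}")
print(f"Drop across the three repeated searches: {like_for_like_drop:,}")
```

Because today's total includes a fourth search that I did not run on the 8th, the drop across the three repeated searches is larger than the difference between the two grand totals.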
I have a hard time trying to figure out where to begin this blog, so I have decided to start at the beginning even though I have written a bit about this in a message posted to GOVDOC-L on August 8th. So here goes…
I was asked to find out how much government information is available in the various Googles. Over the past few months I had saved posts from GOVDOC-L that had Google in the subject line; so I thought this would be an easy assignment. Turns out that the messages did not give statistics, instead they were questions about Google’s practice of making the full text of all books published after 1923 unavailable.
Well I was a bit disappointed but I still thought that I would find the information on Google’s website. I figured that Google would be tooting its own horn about the growth of this infamous project. Not so. There are no statistics anywhere, and there was very little that described the scope of the project.
Next I went into research mode. I checked for articles in EBSCO’s Academic Search Premier and Lexis-Nexis. I found some interesting news articles on the project but again no statistics. I then tried to search the web pages of a few library partners. I looked at the University … and I had a little luck on Stanford’s web site on Robotic Book Scanning. There was a page with a few statistics listed there, but alas they dated from June 2004.
I even Googled keywords that I hoped would bring up statistics. But considering how many different ways one might refer to statistical information, it was frustrating to do. I didn’t find any statistics this way but I did find some interesting blog entries about the full-text copyright issue.
So I was on my own.