Google begins to offer full-text scanned government documents

The Associated Press reports that the Google Print beta today started offering the entire contents of books and government documents that aren’t entangled in copyright.

I did a quick search for “government printing office” and got a result of 246 books “with 294000 pages.” Of course, not all of those are government documents — Google presumably returns any book with the phrase anywhere in the full text, including bibliographies.

Google provides a page that illustrates how to interpret your search results and shows the difference between how Google treats “Library books still in copyright” and “Public domain books” and “Books submitted by a publisher.”

Interestingly, I found one book that appears to me to be a government publication but that Google seems to treat as a “book still in copyright”: S. 1383, Children’s Protection from Violent Programming Act of 1993 (Author(s): Science, and Transportation United States. Congress. Senate. Committee on Commerce, Publisher: For sale by the U.S. G.P.O., Supt. of Docs., Congressional Sales Office, Publication Date: 1994, Pages: 133, ISBN: 0160463270).

I also found a copy of the 1905 GPO edition of Constitution, Jefferson’s Manual, the Rules of the House of Representatives of the … Congress, and a Digest and Manual of the Rules of Practice of the House of Representatives of the United States. It appears you can browse this book one page at a time — no download, no ability to copy text.

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


2 Comments

  1. Any idea where Google Print got these govdocs? It’s my impression that the contributing library gets its own digital copy that it can use as it wishes, including providing a fully functional digital file over the Internet, assuming public domain or copyright control. I would hope that, if this is really the case, the owning library will soon place fully usable copies of these files on its servers.

    And if owning libraries are only getting this kind of crippled copy, then I’ll have to rethink my cheerleading for Google Print. Open Content Alliance, anyone?

    ————————————
    “And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them.” — Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.

  2. Daniel,

    I didn’t notice any scanning provenance information on the Google library project, so, presumably, the documents could have been scanned at any of the participating libraries. I did hear on govdoc-l that the documents at the University of Michigan were being scanned, so these might have come from there.

    We don’t know much about the contracts between the libraries and Google. I think that the contract with Michigan is the only one that has been made public (http://www.lib.umich.edu/mdp/). It says that U. Michigan does get digital copies: “U of M shall have the right to use the U of M Digital Copy….” But it also says that Michigan must “…restrict automated access to any portion of the U of M Digital Copy…” and prevent “third parties” from downloading copies “for commercial purposes,” from redistributing them, and even from “systematic downloading.” It further says that U of M must “restrict access to … those persons having a need to access such materials…” and must ensure that “substantial portions” are not “disseminated to the public at large.”

    There’s more, too, and other libraries may have different agreements. My impression is that Michigan could make files available to its own students and faculty but not to anyone else, but I’d welcome a more definitive reading of the contract from anyone who knows.

    Evidently, Google is using a proprietary image format in the processing stream, but JPEGs on its site. Michigan (according to its FAQ on the project) is receiving TIFFs, JPEG2000 and OCR files “that conform to library community standards.” That seems wise and good news.
