Michigan marks a milestone in book digitization

The librarians at the University of Michigan/Google book digitization project, called the "MBook Project," had cause to celebrate on Friday as they digitized the millionth book in their collection, leaving just 6.5 million more to go. Michigan is one of the few institutions partnering with Google that has agreed to scan every one of its holdings, even those still covered by copyright. The MBooks project provides the full text of works that are in the public domain, creating new ways for users to search and access U-M Library content (anyone can access this digital content). Materials that are currently in copyright are available for searching online, allowing users to assess the contents of a book before deciding whether to purchase it or borrow it from the library.

Are there any govdocs in the collection? A quick search in their catalog did turn up some govdocs, beautifully digitized. I went to the Mirlyn library catalog (very nice OPAC, btw) and did an Advanced Search using keyword = Subcommittee AND titleword = Congress AND titleword = Hearing*, limited to format = Electronic Resource (according to their FAQ, that's the way to find the MBook content). I got 3064 hits ranging in date from 1896 to 2007. One example is this 1935 digitized hearing before "a subcommittee of the Committee on military affairs, United States Senate, 74th Congress, on S. 1404, a bill to promote the efficiency of national defense." We don't even have the microfiche in my law library.

So the good news is that there are digitized Congressional hearings freely available to the public in the MBook project. For some reason, though, I have found that a number of them, like this 1924 hearing, come up as "Search Only:  Page images and full text of this item are not available due to copyright restrictions."  I thought there weren't any copyright restrictions on government documents.

The issue here is that the MARC record probably does not properly indicate that this is a goverment document.  There is a form on the site that allows you to report that you are being restricted access to items you believe should be in the public domain.  Fill it out! Real humans read it, and real catalogers assess the situation and amend the records as necessary.

GPub, catalogers, and Google

To back up what Chris says: the GPub byte of the fixed field in the copy-wronged 1924 hearing is not set to "f" which would indicate a federal government publication, whereas the GPub byte in the properly identified 1935 public document is set to "f".
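For anyone who wants to see what that check looks like, here is a minimal Python sketch. It assumes the record follows the MARC 21 books layout, where the GPub byte is position 28 of the 008 fixed field. The `make_008` helper and the sample values are made up for illustration; they are not the actual records for the 1924 and 1935 hearings.

```python
def gpub_code(field_008: str) -> str:
    """Return the government-publication byte (008 position 28), or '' if the field is too short."""
    return field_008[28] if len(field_008) > 28 else ""


def is_federal(field_008: str) -> bool:
    """True when the GPub byte is 'f' (federal/national government publication)."""
    return gpub_code(field_008) == "f"


def make_008(year: str, place: str, gpub: str) -> str:
    """Build a made-up 40-character 008 field for a book (illustration only)."""
    return (
        "750101"        # 00-05 date entered on file (invented)
        + "s" + year    # 06 type of date, 07-10 date 1
        + "    "        # 11-14 date 2 (blank)
        + place         # 15-17 place of publication code
        + " " * 10      # 18-27 illustrations, audience, form, nature of contents
        + gpub          # 28 government publication
        + "000 0 "      # 29-34 conference, festschrift, index, undefined, literary form, biography
        + "eng d"       # 35-39 language, modified record, cataloging source
    )


flagged = make_008("1935", "dcu", "f")    # GPub set, like the 1935 hearing
unflagged = make_008("1924", "dcu", " ")  # GPub blank, like the 1924 hearing

print(is_federal(flagged), is_federal(unflagged))
```

A blank in that position means "not a government publication" as far as the record is concerned, which is exactly how a public document ends up treated as if it were in copyright.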

I would encourage anyone to use the feedback function to help identify public domain materials, and I'm glad to hear that real catalogers are reading this feedback. My previous attempts to use the feedback function on books.google.com to do exactly the same thing appear to have fallen on deaf ears.

'Course, I'd also like to see a batch sweep for "U.S. Govt. print. off." and its variants in publisher fields flag items as public domain...
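A rough idea of what such a sweep could look like, in Python. This is only a sketch: the record layout (dicts with `title` and `publisher` keys) and the sample records are invented, the pattern covers just a few spellings of the GPO name, and the output is meant as a list of candidates for a cataloger to look at, not a rights decision.

```python
import re

# Matches a handful of spellings such as "U.S. Govt. print. off.",
# "Govt. Print. Off." and "United States. Government Printing Office."
GPO_PATTERN = re.compile(
    r"\b(?:u\.?\s*s\.?|united\s+states\.?)?\s*"
    r"gov(?:ernmen)?t\.?\s+print(?:ing)?\.?\s+off(?:ice)?\b",
    re.IGNORECASE,
)


def looks_like_gpo(publisher: str) -> bool:
    """True when the publisher statement contains a recognizable GPO variant."""
    return GPO_PATTERN.search(publisher) is not None


def candidates_for_review(records):
    """Return the records whose publisher field looks like the GPO."""
    return [rec for rec in records if looks_like_gpo(rec["publisher"])]


# Invented sample records, for illustration only.
sample = [
    {"title": "Hearing A", "publisher": "Washington, U.S. Govt. print. off., 1924"},
    {"title": "Hearing B", "publisher": "United States. Government Printing Office."},
    {"title": "Novel C", "publisher": "New York, Some Publishing Company"},
]

for rec in candidates_for_review(sample):
    print(rec["title"])
```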

 

There really are people back there!

So I submitted feedback about the 1924 hearing - and almost immediately after that got an automated trouble ticket email - and almost immediately after that a personalized note from the folks working on the ticket! Yay humans!

Yay humans, redux!

From my morning email:

We've determined that this document is in the public domain, and it is
now freely available. Thanks for notifying us about this. Best wishes,

Perry Willett
Head, Digital Library Production Service

So now the link in Susannaleers' original post to the Bureau of prohibition (Cramton bill) Hearings before a subcommittee of the Committee on the judiciary, United States Senate, sixty eighth Congress, second session, on H.R. 6641 leads to an unrestricted online copy! Yay humans!

OCLC Connexion finds nearly 3000 records for the following query in their collection

li:EYM and pb=Govt. Print. Off. and pl:Washington not mt:ngp

Many if not most of the pre-1923 items have already been made fully available, but the post-1923 items appear at first glance to be more restricted...

Batch sweeps of variants

Batch sweeps of variants within the publisher/date (for instance) could be used to confirm things declared in the fixed field, but they'd never be used to make an automated rights assessment.  Too variable in content, too unreliable in presence, for a start.

Also, from above, I really do know how to spell "government."

Half of one, six dozen of the other

Chris, apologies in advance for rambling...

Although there is variation in the 260 (Publication, Distribution, Etc.) field of library records for US Government publications, we can still identify large tracts of... tracts in the public domain with a finite number of variations on GPO's name, e.g.,

pb="United States. Government Printing Office." (3248) [The one from the LC Authority file.]
pb="U.S. Govt. Print. Off." (123130)
or more clever queries like
pb:Government Printing Office and pl:Washington (18046)
pb:Govt. Print. Off. and pl:Washington (208799)
[these are OCLC Connexion queries and the number of results they yield. FYI:
pb:Govt. Print. Off. and pl:Washington not mt:ngp -> 59167 hits, nearly 60K records for things likely to be in the public domain but lacking the GPub "f" byte ("not mt:ngp" means lacking that flag). Narrow the query to yr:1924, and guess which Bureau of prohibition (Cramton bill) Hearings doc you can find :) ]

The problem with these queries is not that they're too inclusive - that they would steal the hard work of publishing companies that for some unknown reason named themselves "Govt. Print. Off." Instead the problem is that these queries only take a bite at identifying the stuff that already belongs to us all. Many government publication records simply don't identify the US GPO in the publisher field, instead naming one of thousands of federal agencies.

I don't know whether Mirlyn is indexing the fixed field, but it could :) A "GPub doesn't equal f" condition, combined with the CCL query
WPL = dcu AND WPU = (Govt. Print. Off.)
could probably ID a pretty healthy handful of things that need to be released to the public.
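Here is how that combination might be sketched in Python, outside the catalog. The assumptions are mine: that "dcu" is checked against the place code in 008 positions 15-17, that GPub sits at 008 position 28 (the books layout), and that the `Record` class, the `sample_008` helper, and the sample data are all hypothetical. The result is a review list for humans, nothing more.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    field_008: str   # MARC 008 fixed field as a 40-character string
    publisher: str   # publisher statement, e.g. from the 260 field


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def needs_review(rec: Record) -> bool:
    """Published in Washington by 'Govt. Print. Off.' but not flagged GPub = 'f'."""
    place = rec.field_008[15:18]
    gpub = rec.field_008[28:29]
    return place == "dcu" and gpub != "f" and "govt print off" in normalize(rec.publisher)


def sample_008(place: str, gpub: str) -> str:
    """Made-up 008 field: only the place code and the GPub byte vary."""
    return "750101s1924    " + place + " " * 10 + gpub + "000 0 eng d"


# Invented records, for illustration only.
records = [
    Record("Unflagged hearing", sample_008("dcu", " "), "Washington, Govt. Print. Off., 1924"),
    Record("Flagged hearing", sample_008("dcu", "f"), "Washington, U.S. Govt. Print. Off., 1924"),
    Record("Commercial title", sample_008("nyu", " "), "New York, Some Publishing Company"),
]

for rec in records:
    if needs_review(rec):
        print(rec.title)
```

As noted above, this only takes a bite: records that name an agency instead of the GPO as publisher would slip past it.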

Another tangent - I imagine some of the serial records we might find would cover hundreds of volumes...

 

Thanks for this info...I'll be sure to use the feedback function. I've started to notice some documents from the 1970s are now available as full text and it must be because someone notified them. A batch sweep sure would be ideal though...

Yay humans!

I second that emotion! And I'm inspired to work on finding more.
Great job at the U. of Michigan libraries.

I see a collaborative project!

It'd be great to be proactive on UMich's govt pubs. Rather than having to submit a form each time someone finds an item that should be accessible as public domain, wouldn't it be cool if UMich put up a list of all their documents (in a wiki?) and let the community/public have at it to verify the "public domainness" of each one? Documents classes could assign the reviewing as well.

There is precedent for this kind of collaborative project. In 2006, the federal government set up a Web site to make public a vast archive of Iraqi documents captured during the war (which was later shut down because detailed accounts of Iraq’s secret nuclear research were available publicly! oops!!). A site called LibriVox has volunteers who read chapters of public domain books, many of which have been digitized by Project Gutenberg.

The point is, let's leverage the power of the internet to help get govt information out to the public!
