Carl Malamud posed this question over on Twitter: "What if our national cultural institutions all worked together on a common problem, attracted White House support?" In his post on the O'Reilly blog, "A National Scan Center: A Public Works Project", Malamud scopes out the issues and calls for the Library of Congress, the Smithsonian Institution, the Government Printing Office, the National Archives and Records Administration, and the National Technical Information Service to come together and make the compelling case for funding a 5-year, $500 million effort to create a National Scan Center. Hear, hear, Carl!
In the U.S., we face a similar deluge of paperwork that we faced in the 1930s. A huge backlog of paper, microfiche, audio, video, and other materials is located throughout the federal government. Little money has gone from Congress for digitization, and bureaucracies have resorted to a series of questionable private-public partnerships as a way of digitizing their materials. For example, the Government Accountability Office shipped 60 million pages of our Federal Legislative Histories (the record of each law from the initial bill through the hearings and conference reports) off to Thomson West, but didn't even get digital copies back. Another example is the recent failed effort by the Government Printing Office to digitize 60 million pages of the Federal Depository Library Program, an effort they tried to get through as a "zero dollar cost to the government" effort with the private sector.
There are no free lunches and there are no "no cost to the government" deals. The costs involve the government effort to supervise the contract, prepare the materials, and ship them, and in both the GAO and GPO cases, the government wasn't getting much back for its effort. What the government and the people usually get is a lien on the public domain, preventing the public from accessing these vital materials. Similar efforts are sprinkled throughout the government. I testified to Congress that I had learned that the National Archives was contemplating a scan of congressional hearings with LexisNexis under similar circumstances, and many may be aware of the questionable deal the Archives cut with Amazon where my favorite online superstore got de facto exclusive rights to 1,899 wonderful pieces of video.
[UPDATE: I spoke too soon. Seems that these are "early access" documents that "will be removed from this database, to be replaced by the fully edited version in the appropriate digital edition in the Rotunda American Founding Era collection."]
More than 200 years after they were written, some 5,000 previously unpublished documents of the founders of the United States — including Thomas Jefferson, John Adams and James Madison — are at long last available to the public at no cost.
The Documents Compass group of the Virginia Foundation for the Humanities at the University of Virginia has spent much of the last year proofreading and transcribing thousands of pages of letters and other papers.
The documents are now available online for free at the University of Virginia Press’ digital imprint called Rotunda...
...The online project is a federal pilot study that aims to expand public access to the papers of America’s founders. It is funded by a $250,000 grant from the National Historical Publications and Records Commission, which is a division of the National Archives.
[Thanks Resource Shelf!]
Building on our previous post about today's House hearing on digital books, it appears that Marybeth Peters, head of the US Copyright Office, does not support the Google Book Settlement. In written testimony (PDF) before the House Judiciary Committee, she wrote that the settlement...
“...inappropriately creates something similar to a compulsory license for works, unfairly alters the property interests of millions of rights-holders of out-of-print works without any Congressional oversight and has the capacity to create diplomatic stress for the United States.”
For more, see today's Wall Street Journal blog: "Copyright Office No Fan of Google Books Settlement."
The House Judiciary Committee will hold a hearing on "Competition and Commerce in Digital Books" at 10 a.m. tomorrow, September 10. The hearing will be webcast; the link is on the committee's hearings calendar page.
I had known that the Internet Archive had submitted a response to the GPO's RFP for mass digitization. A friend just sent me the link to the proposal submitted to GPO (embedded below and here's the link to the proposal and supporting documents).
As you can probably guess, we've been pulling for the Archive to get the bid, not least because the Archive is a 501(c)(3) non-profit library and we've stated on more than one occasion that privatization of public domain government information is a very bad idea. But we've also been heartened by the quality of the Archive's scans to date, and by their openness and willingness to collaborate on processes, data access, and sharing. Those qualities certainly come through in their proposal for mass digitization -- not to mention the fact that they've actually made their proposal public!
While the award has not been officially announced, we really hope that the Archive wins the award. Perhaps GPO will name them as an official depository library and work with them not only on the "legacy" collection (there needs to be a better description of the deep and rich collections of depository libraries than the somewhat pejorative "legacy" :-| ) but on digital deposit of government documents going forward.
--that is all.
A post just now on code4lib about recommendations for book scanners reminded me of a comment from a Council member last week at Spring '09 DLC. The Council member said that his relatively small academic library might not have the technical or monetary means to gear up a large-scale digitization project, but that he was more than willing to pitch in with small projects or one-off digitizations if there was, for example, a list of items of importance from which he could pick and choose.
I commented then and will repeat now that digitization doesn't necessarily mean a library has to purchase a high-end digitization unit (aka the Scribe) from the Internet Archive for $15k -- although I *love* the work that the Open Content Alliance (OCA) is doing!
A small project could easily be done with off-the-shelf hardware and open source software (the Scribe's software is in fact freely available under the GPL on SourceForge!). One such project that I'd recommend you look into is the Book Ripper project (bkrpr for short!). (Disclosure: my friend Karl Fogel is involved in bkrpr.) They've even got instructions for building the camera mount. All the hardware is cheap and/or easy to build, and the software is free and open source (they're experimenting now with OCRopus for character recognition). Check it out!
The National Academies (The National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and National Research Council) have a long history of advising the government. Now, they have announced "the completion of the first phase of a partnership with Google to digitize the library's collection of reports from 1863 to 1997, making them available – free, searchable, and in full text – through Google Book Search. The Academies plan to have their entire collection of nearly 11,000 reports digitized by 2011."
- More Than 9,000 National Academies Reports Now Available in Open Access, press release, April 10, 2009.
Some publications of the Academies are already available through Google Book Search, but not in full text. (See, for example, Realizing the Information Future by the National Research Council.) The announcement does not make clear whether these will become available in full text.
The FDA Notices of Judgment Collection is a digital archive of the published notices of judgment for products seized under authority of the 1906 Pure Food and Drug Act. The NJs are resources in themselves but also lead users to the over 2,000 linear foot collection of evidence files used to prosecute each case. The evidence files are a rich documentary resource filled with legal correspondence, lab reports and data, photographs, and product labeling and containers. This digital library, created using the SPER system, allows for browsing the collection as well as searching the collection's metadata and full-text.
Currently only the Drugs and Devices portion of the collection is available in the digital library. As we complete work on other portions those will be released on an ongoing basis. Users are welcome to visit NLM to use the hard copies at any time.
The collection uses DSpace and provides technical information about the project: System For Preservation of Electronic Resources (SPER).
FLYP online magazine published an interview with the Internet Archive's founder, Brewster Kahle, entitled "Know It All". There is a text version of the article, but the interactive multimedia version is much more fun! Plus, it contains a nice video showing Brewster explaining the mission of the Internet Archive.
Brewster Kahle wants to give you digital access to every book, film, video, song, TV show and periodical ever published. If he succeeds, the world will be a different place.
A special report from CNN.com states that Obama plans to digitize health records within the next five years. The plan is one of the administration's efforts to restore the economy: the government estimates that the program will create around 212,000 jobs. However, there are some concerns about it:
1) The Commonwealth Fund, RAND, and Harvard have conducted independent studies which reveal that this program would cost between $75 billion and $100 billion over the implementation period. The major cost will be incurred in training the workforce.
2) At present, only "about 8% of the nation's 5,000 hospitals and 17% of its 800,000 physicians currently use the kind of common computerized record-keeping systems that Obama envisions for the whole nation."
3) The privacy of patients must be protected, as a nationalized system could be vulnerable to system failures and hackers.
Obama asserts that this program will create new jobs, cut medical costs, and save $200-300 billion per year for the health industry.
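To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python that uses only the figures quoted above. The variable and function names are my own, and the last calculation is an assumption for illustration only: it takes the claimed annual savings at face value and compares them with the high end of the cost estimates.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the post.
# All names here are illustrative; none come from the CNN report.

HOSPITALS = 5_000
PHYSICIANS = 800_000
HOSPITAL_ADOPTION_PCT = 8      # "about 8% of the nation's 5,000 hospitals"
PHYSICIAN_ADOPTION_PCT = 17    # "17% of its 800,000 physicians"

COST_RANGE_BN = (75, 100)               # total cost over the implementation period
SAVINGS_RANGE_BN_PER_YEAR = (200, 300)  # savings per year, as asserted


def adopters(total: int, pct: int) -> int:
    """Number of adopters given a total and a whole-number percentage."""
    return total * pct // 100


hospitals_using = adopters(HOSPITALS, HOSPITAL_ADOPTION_PCT)
physicians_using = adopters(PHYSICIANS, PHYSICIAN_ADOPTION_PCT)

# Everyone not yet on such a system would still need to be brought on board.
hospitals_remaining = HOSPITALS - hospitals_using
physicians_remaining = PHYSICIANS - physicians_using

# Assumption: the claimed savings materialize in full. Under that assumption,
# compare the highest cost estimate with the lowest annual savings claim.
worst_case_cost_to_savings = max(COST_RANGE_BN) / min(SAVINGS_RANGE_BN_PER_YEAR)

print(f"Hospitals already using such systems:  {hospitals_using:,}")
print(f"Hospitals still to convert:            {hospitals_remaining:,}")
print(f"Physicians already using such systems: {physicians_using:,}")
print(f"Physicians still to convert:           {physicians_remaining:,}")
print(f"Highest cost / lowest annual savings:  {worst_case_cost_to_savings}")
```

In other words, on the quoted numbers the vast majority of hospitals and physicians would have to be converted, which helps explain why workforce training dominates the cost estimates.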