digitization

International Amateur Scanning League (IASL) to the rescue!

Carl Malamud announced yesterday the inaugural meeting of the International Amateur Scanning League (IASL) (I'm already imagining cool swag!). Malamud is taking the FedFlix program to the streets! FedFlix, a joint venture between the National Technical Information Service and Public.Resource.Org, digitizes NTIS videos and makes them available on YouTube, the Internet Archive, and the public.resource.org Stock Footage Library.

Well, now a gang of volunteers, including members of DC CopyNight and Smithsonian employees working on their own time, are going to the National Archives and Records Administration (NARA) and copying over 1,500 DVDs to be uploaded to the net.

Malamud said:

What makes this grassroots digitization effort so remarkable is that it has the full support of the government. Indeed, David Ferriero, the U.S. Archivist, joined me in the initial meeting where we taught volunteers how to rip DVDs!

Kudos to Malamud and the IASL!

And this makes me think that more libraries and librarians should be doing the same thing for government documents. Why not set up your own scanning operations in your depository library (Book Liberator or DIY Book Scanner can show you how to digitize on the cheap!) and then deposit those scans into the Internet Archive's US Documents Collection (don't forget to follow FDLP digitization standards!). Scans could also be ingested into FDsys (when they've got that capability working ;-)). So get to it; what are you waiting for?!

Malamud calls for a national scan center public works project

Carl Malamud posed this question over on Twitter: "What if our national cultural institutions all worked together on a common problem, attracted White House support?" In his post on the O'Reilly blog, "A National Scan Center: A Public Works Project", Malamud scopes out the issues and calls for the Library of Congress, the Smithsonian Institution, the Government Printing Office, the National Archives and Records Administration, and the National Technical Information Service to come together and make the compelling case for funding a 5-year, $500 million effort to create a National Scan Center. Hear, hear, Carl!

In the U.S., we face a similar deluge of paperwork that we faced in the 1930s. A huge backlog of paper, microfiche, audio, video, and other materials is located throughout the federal government. Little money has gone from Congress for digitization, and bureaucracies have resorted to a series of questionable private-public partnerships as a way of digitizing their materials. For example, the Government Accountability Office shipped 60 million pages of our Federal Legislative Histories (the record of each law from the initial bill through the hearings and conference reports) off to Thomson West, but didn't even get digital copies back. Another example is the recent failed effort by the Government Printing Office to digitize 60 million pages of the Federal Depository Library Program, an effort they tried to get through as a "zero dollar cost to the government" effort with the private sector.

There are no free lunches and there are no "no cost to the government" deals. The costs involve the government effort to supervise the contract, prepare the materials, and ship them, and in both the GAO and GPO cases, the government wasn't getting much back for its effort. What the government and the people usually get is a lien on the public domain, preventing the public from accessing these vital materials. Similar efforts are sprinkled throughout the government. I testified to Congress that I had learned that the National Archives was contemplating a scan of congressional hearings with LexisNexis under similar circumstances, and many may be aware of the questionable deal the Archives cut with Amazon where my favorite online superstore got de facto exclusive rights to 1,899 wonderful pieces of video.

UVA puts founding fathers' papers online! (temporarily :-| )

[UPDATE: I spoke too soon. Seems that these are "early access" documents that "will be removed from this database, to be replaced by the fully edited version in the appropriate digital edition in the Rotunda American Founding Era collection."]

Nice work U of Virginia! You can access the papers here. I hope this makes it into the FDLP digitization registry.

More than 200 years after they were written, some 5,000 previously unpublished documents of the founders of the United States — including Thomas Jefferson, John Adams and James Madison — are at long last available to the public at no cost.

The Documents Compass group of the Virginia Foundation for the Humanities at the University of Virginia has spent much of the last year proofreading and transcribing thousands of pages of letters and other papers.

The documents are now available online for free at the University of Virginia Press’ digital imprint called Rotunda...

...The online project is a federal pilot study that aims to expand public access to the papers of America’s founders. It is funded by a $250,000 grant from the National Historical Publications and Records Commission, which is a division of the National Archives.

[Thanks Resource Shelf!]

US Copyright Office no fan of Google Book Settlement

Building on our previous post about today's House hearing on digital books, it appears that Marybeth Peters, head of the US Copyright Office, is not supportive of the Google Book Settlement. In written testimony (PDF) before the House Judiciary Committee, she wrote that the settlement...

“...inappropriately creates something similar to a compulsory license for works, unfairly alters the property interests of millions of rights-holders of out-of-print works without any Congressional oversight and has the capacity to create diplomatic stress for the United States.”

For more, see today's Wall Street Journal blog: "Copyright Office No Fan of Google Books Settlement."

September 10: House Hearing on Digital Books

The House Judiciary Committee will hold a hearing on "Competition and Commerce in Digital Books" at 10 a.m. tomorrow, September 10. The hearing will be webcast; the link is on the committee's hearings calendar page.

ALA has some information available on its site. See:
Library Associations submit testimony... and the ALA Google Book Settlement page.

Internet Archive proposal for mass digitization

I had known that the Internet Archive had submitted a response to the GPO's RFP for mass digitization. A friend just sent me the link to the proposal submitted to GPO (embedded below and here's the link to the proposal and supporting documents).

As you can probably guess, we've been pulling for the Archive to get the bid, not least because the Archive is a 501(c)(3) non-profit library and we've stated on more than one occasion that privatization of public domain government information is a very bad idea. We've also been heartened by the quality of the Archive's scans to date and by their openness and willingness to collaborate on processes, data access, and sharing. Those qualities certainly come through in their proposal for mass digitization, not to mention the fact that they've actually made their proposal public!

While the award has not been officially announced, we really hope that the Archive wins the award. Perhaps GPO will name them as an official depository library and work with them not only on the "legacy" collection (there needs to be a better description of the deep and rich collections of depository libraries than the somewhat pejorative "legacy" :-| ) but on digital deposit of government documents going forward.

--that is all.


Book Ripper (bkrpr) could facilitate small digitization projects

A post just now about recommendations for book scanners on code4lib reminded me of a comment from a Council member last week at Spring '09 DLC. The Council member said that his relatively small academic library might not have the technical or monetary means to gear up a large scale digitization project, but that he was more than willing to pitch in with small projects or one-off digitizations if there was, for example, a list of items of importance from which he could pick and choose.

I commented then and will repeat now that digitization doesn't necessarily mean a library has to purchase a high end digitization unit (aka the Scribe) from the Internet Archive for $15k -- although I *love* the work that the Open Content Alliance (OCA) is doing!

A small project could easily be done with off-the-shelf hardware and open source software (The Scribe's software is in fact freely available under a GPL license on SourceForge!). One such project that I'd recommend you look into is the Book Ripper project (bkrpr for short!). (Disclosure: my friend Karl Fogel is involved in bkrpr). They've even got instructions for building the camera mount. All the hardware is cheap and/or easy to build and the software is free and open source (they're experimenting now with OCRopus for character recognition processing). Check it out!

National Academies Reports (1863 to 1997) Now Available in Open Access

The National Academies (The National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and National Research Council) have a long history of advising the government. Now, they have announced "the completion of the first phase of a partnership with Google to digitize the library's collection of reports from 1863 to 1997, making them available – free, searchable, and in full text – through Google Book Search. The Academies plan to have their entire collection of nearly 11,000 reports digitized by 2011."

Some publications of the Academies are already available through Google Book Search, but not full text. (See for example: Realizing the information future By National Research Council). The announcement does not make clear whether some of these will become available full text or not.

FDA 1906-1963 Documents Digitized

The Food and Drug Administration (FDA) has announced the availability of the FDA Notices of Judgment Collection, 1906-1963.

The FDA Notices of Judgment Collection is a digital archive of the published notices of judgment for products seized under authority of the 1906 Pure Food and Drug Act. The NJs are resources in themselves but also lead users to the over 2,000 linear foot collection of evidence files used to prosecute each case. The evidence files are a rich documentary resource filled with legal correspondence, lab reports and data, photographs, and product labeling and containers. This digital library, created using the SPER system, allows for browsing the collection as well as searching the collection's metadata and full-text.

Currently only the Drugs and Devices portion of the collection is available in the digital library. As we complete work on other portions those will be released on an ongoing basis. Users are welcome to visit NLM to use the hard copies at any time.

The collection uses DSpace and provides technical information about the project: System For Preservation of Electronic Resources (SPER).

Interview with Internet Archive Founder

FLYP online magazine published an interview with Internet Archive's founder, Brewster Kahle, entitled "Know It All". There is a text version of the article, but the interactive multi-media version is much more fun! Plus, it contains a nice video showing Brewster explaining the mission of Internet Archive.

Brewster Kahle wants to give you digital access to every book, film, video, song, TV show and periodical ever published. If he succeeds, the world will be a different place.

Obama Plans to Digitize Health Records

A special report from CNN.com states that Obama plans to digitize health records within the next five years. This is one of the endeavors to restore the economy, as the government estimates that this program will create around 212,000 jobs. However, there are some concerns about it because:

1) Commonwealth Fund, RAND, and Harvard have conducted independent studies which reveal that this program would cost between $75-100 billion over the implementation period. The major cost will be incurred in training the workforce.

2) At present, only "about 8% of the nation's 5,000 hospitals and 17% of its 800,000 physicians currently use the kind of common computerized record-keeping systems that Obama envisions for the whole nation."

3) The privacy of patients must be protected as the nationalized system may be affected by system failures and hackers.

Obama asserts that this program will create new jobs, cut medical costs, and save $200-300 billion per year for the health industry.

Federal Agencies Digitization Guidelines Initiative

I've been reading and digesting the recently released Federal Agencies Digitization Guidelines Initiative website and the sustainable formats page, so I can discuss it (if there is time) during my presentation at next week's Depository Library Conference.

A dozen federal agencies launched an initiative to establish a common set of guidelines for digitizing historical materials. Two working groups have been established: the Still Image Working Group (books, photographs, maps, etc.) and the Audio-Visual Working Group. They have two draft documents currently up for review and comment: TIFF Image Metadata and Digital Imaging Framework. Comments are due on November 15.

I'm also loving their glossary of terms, which "has been generated to serve the participating agencies as a standardized vocabulary for their deliberations and guidelines" and it is "a work in progress" so suggestions are welcome.

In Case You Didn't Already Know...

...the U.S. is not the leader in e-Government...at least according to a study released last week by the Brookings Institution. We do rank third, but we are "falling behind other countries in broadband access, public-sector innovation and implementation of the latest interactive tools to federal Web sites".

Two other articles I read this morning also got me thinking about where we stand as a nation with digital government information: "Old-school Recordkeeping Meets the Digital Age" and "Government Data and the Invisible Hand". The first article made me feel quite frustrated with our lack of digital preservation progress, especially after reading this quote:

"...lacking a statutory prescription for maintaining electronic records, most agencies print and file [records] as they would paper documents, according to a recent investigation by the Government Accountability Office...Under current regulations, NARA does not require agencies to maintain records in their native formats. So for now, many agencies still print e-mail messages and file the paper versions. Although the filing process is relatively easy, the practice has a major weakness: It eliminates the searchability of digital documents". (Gee, ya think?!)

Envisioning all those emails being printed by government agency employees makes me think of Google's April Fool's joke: the "Google Paper" service!

I hope the next President and his administration will take the issue of e-government and digital preservation/authentication very seriously. Obama and McCain have touched on the issue a bit, including Obama's vague vision of online government transparency:

"I want people to be able to know, today, this issue is going on...Today, President Obama talked about his proposal for $4,000 student college-tuition credits. It’s going to be going to this congressional committee, these are the key leaders in the House and Senate who are going to be deciding on the bill, here are the groups that support it, you should contact your congressman. The more that we can enlist the American people to stay involved, that’s the only way we can move an agenda forward."

The second article touches on this issue as well, and urges the next Presidential administration to "embrace the potential of Internet-enabled government transparency [by reducing] the federal role in presenting important government information to citizens". A profound statement, but read the rest of their argument as stated in the abstract:

"Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large".

This makes sense if you think of it from the context of all the mashups, RSS feeds, and other interactivity with web content that exists. The rest of the article makes some other interesting points and counterarguments, such as

"A government data provider can provide a digital signature alongside each data item. A third party site that presents the data can offer a copy of the signature along with the data, allowing the user to verify the authenticity of the data item, by verifying the digital signature, without needing to visit the government site directly".

Easier said than done? Is the "digital signature" they talk about the same as GPO Digital Authentication?
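To make the idea in that quote a little more concrete, here is a toy sketch of "sign at the source, verify at the mirror." It uses textbook RSA over a SHA-256 hash with two small, publicly known primes, so it is in no way secure, and nothing here is meant to describe how GPO authentication or any real agency system works. The record text, the key values, and the function names are all made up for illustration.

```python
import hashlib

# Toy key material (assumption for illustration only): two known Mersenne
# primes. Real signatures use far larger keys, padding, and a vetted library.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """What the (hypothetical) government publisher does with its private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """What any reader can do with only the public values N and E."""
    return pow(signature, E, N) == digest_as_int(data)


# The publisher signs a record once...
record = b"Example government record, 2008-09-01"   # hypothetical data item
signature = sign(record)

# ...and a third-party site republishes the record together with the signature.
mirrored_record, mirrored_signature = record, signature

print(verify(mirrored_record, mirrored_signature))          # untouched copy
print(verify(b"Example government record, ALTERED", mirrored_signature))
```

The point of the sketch is only that the check needs nothing from the government site except its public key: the untouched copy verifies and the altered one does not. Distributing and trusting that public key is the part that really is easier said than done.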

We are making some progress in e-Government and digital preservation of government information but we need to do better. Like Obama said, we can start by contacting our congressmen to voice our concerns and suggestions for improvement on e-Gov initiatives and digital preservation...because I don't know about you, but I sure don't want the government to use "Google Paper".

Web Security Words Help Digitize Old Books

For anyone who missed it, this is an interesting article on the use of new technologies related to digitization:

Web Security Words Help Digitize Old Books
From: All Things Considered, August 14, 2008

Some exciting things have been happening at GPO in the world of digitization

As you have likely heard by now, we have a goal of digitizing all retrospective federal publications back to the earliest days of the Federal Government. A Request for Proposal (RFP) for Mass Digitization Opportunities has now been released via Federal Business Opportunities. Here's a link to this proposal and additional information on GPO's digitization initiatives. Proposals are due by September 19, 2008.

We are in search of a cooperative, mutually beneficial relationship with private or public sector participants where the uncompressed, unaltered files created as a result of the conversion process are delivered to GPO at no cost to the Government. These files will serve as the digital master copies that will be preserved and used for the creation of access derivatives within GPO's Federal Digital System. In exchange, the contractor will be able to maintain a collection of files produced in the process for inclusion in their collections (e.g., search indices, book search sites). This content will be made available online, free of charge from GPO.

Also, if you haven't yet seen it, we have re-launched the Registry of U.S. Government Publication Digitization Projects, which contains records for projects that include digitized copies of publications originating from the U.S. Government.

The Registry...

  • serves as a locator tool for publicly accessible collections of digitized U.S. Government publications;
  • increases awareness of U.S. Government publication digitization projects that are planned, in progress, or completed;
  • fosters collaboration for digitization projects.

GPO is actively soliciting all interested parties who plan to digitize federal publications within the scope of the FDLP to contribute to the registry of digitization projects.

I am very interested in hearing what you think about GPO's direction regarding digitization and where you would like to see us go.
