digitization
Federal Agencies Digitization Guidelines Initiative
Submitted by blakeley on Sun, 2008-10-12 15:08.
I've been reading and digesting the recently released Federal Agencies Digitization Guidelines Initiative website and the sustainable formats page so that I can discuss them (if there is time) during my presentation at next week's Depository Library Conference.
A dozen federal agencies have launched an initiative to establish a common set of guidelines for digitizing historical materials. Two working groups have been established: the Still Image Working Group (books, photographs, maps, etc.) and the Audio-Visual Working Group. They have two draft documents currently up for review and comment: TIFF Image Metadata and Digital Imaging Framework. Comments are due on November 15.
I'm also loving their glossary of terms, which "has been generated to serve the participating agencies as a standardized vocabulary for their deliberations and guidelines" and it is "a work in progress" so suggestions are welcome.
In Case You Didn't Already Know...
Submitted by blakeley on Wed, 2008-08-20 09:40.
...the U.S. is not the leader in e-Government, at least according to a study released last week by the Brookings Institution. We do rank third, but we are "falling behind other countries in broadband access, public-sector innovation and implementation of the latest interactive tools to federal Web sites".
Two other articles I read this morning also got me thinking about where we stand as a nation with digital government information: "Old-school Recordkeeping Meets the Digital Age" and "Government Data and the Invisible Hand". The first article made me feel quite frustrated with our lack of digital preservation progress, especially after reading this quote:
"...lacking a statutory prescription for maintaining electronic records, most agencies print and file [records] as they would paper documents, according to a recent investigation by the Government Accountability Office...Under current regulations, NARA does not require agencies to maintain records in their native formats. So for now, many agencies still print e-mail messages and file the paper versions. Although the filing process is relatively easy, the practice has a major weakness: It eliminates the searchability of digital documents". (Gee, ya think?!)
Envisioning all those emails being printed by government agency employees makes me think of Google's April Fool's joke: the "Google Paper" service!
I hope the next President and his administration will take the issue of e-government and digital preservation/authentication very seriously. Obama and McCain have touched on the issue a bit, including Obama's vague vision of online government transparency:
"I want people to be able to know, today, this issue is going on...Today, President Obama talked about his proposal for $4,000 student college-tuition credits. It’s going to be going to this congressional committee, these are the key leaders in the House and Senate who are going to be deciding on the bill, here are the groups that support it, you should contact your congressman. The more that we can enlist the American people to stay involved, that’s the only way we can move an agenda forward."
The second article touches on this issue as well, and urges the next Presidential administration to "embrace the potential of Internet-enabled government transparency [by reducing] the federal role in presenting important government information to citizens". A profound statement, but read the rest of their argument as stated in the abstract:
"Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.
Rather than struggling, as it currently does, to design sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large".
This makes sense if you think of it in the context of all the mashups, RSS feeds, and other interactivity with web content that exists. The rest of the article makes some other interesting points and counterarguments, such as:
"A government data provider can provide a digital signature alongside each data item. A third party site that presents the data can offer a copy of the signature along with the data, allowing the user to verify the authenticity of the data item, by verifying the digital signature, without needing to visit the government site directly".
Easier said than done? Is the "digital signature" they talk about the same as GPO Digital Authentication?
We are making some progress in e-Government and digital preservation of government information, but we need to do better. Like Obama said, we can start by contacting our congressmen to voice our concerns and suggestions for improvement on e-Gov initiatives and digital preservation...because I don't know about you, but I sure don't want the government to use "Google Paper".
Web Security Words Help Digitize Old Books
Submitted by rdavis on Tue, 2008-08-19 18:23.
For anyone who missed it, this is an interesting article on the use of new technologies related to digitization:
Web Security Words Help Digitize Old Books
From: All Things Considered, August 14, 2008
Some exciting things have been happening at GPO in the world of digitization
Submitted by rdavis on Wed, 2008-08-13 13:55.
As you have likely heard by now, we have a goal of digitizing all retrospective federal publications back to the earliest days of the Federal Government. A Request for Proposal (RFP) for Mass Digitization Opportunities has now been released via Federal Business Opportunities. Here's a link to this proposal and additional information on GPO's digitization initiatives. Proposals are due by September 19, 2008.
We are in search of a cooperative, mutually beneficial relationship with private or public sector participants in which the uncompressed, unaltered files created as a result of the conversion process are delivered to GPO at no cost to the Government. These files will serve as the digital master copies that will be preserved and used for the creation of access derivatives within GPO's Federal Digital System. In exchange, the contractor will be able to maintain a collection of files produced in the process for inclusion in their collections (e.g., search indices, book search sites). This content will be made available online, free of charge, from GPO.
Also, if you haven't yet seen it, we have re-launched the Registry of U.S. Government Publication Digitization Projects, which contains records for projects that include digitized copies of publications originating from the U.S. Government.
The Registry...
- serves as a locator tool for publicly accessible collections of digitized U.S. Government publications;
- increases awareness of U.S. Government publication digitization projects that are planned, in progress, or completed; and
- fosters collaboration for digitization projects.
GPO is actively soliciting all interested parties who plan to digitize federal publications within the scope of the FDLP to contribute to the registry of digitization projects.
I am very interested in hearing what you think about GPO's direction regarding digitization and where you would like to see us go.
Partnering With GPO
Submitted by rdavis on Sun, 2008-08-03 09:01.
GPO recognizes that with the ever-increasing amount of electronic U.S. Government information, we need your help! Since 1997, depository libraries have worked with GPO to ensure permanent public access to electronic content and to provide services to assist other depositories and the public by becoming a GPO partner.
Our recent partnerships include:
- Government Information Online: Ask A Librarian
- Homeland Security Digital Library
- Historical Publications of the U.S. Commission on Civil Rights
Does your library have a project, resource, or service that would benefit the depository library community and the public? Consider a partnership with GPO and have a direct impact upon citizens' access to and use of government information. Learn more about GPO's partnership program.
The ever-increasing amount of electronic U.S. Government information requires a team effort.
Digitizing History: NARA's plans for the future
Submitted by StanfordLawLibr... on Thu, 2008-05-29 15:21.
(cross posted on legalresearchplus.com)
Earlier this month the National Archives and Records Administration released their Strategy for Digitizing Archival Materials for Public Access, 2007-2016. This is a follow-up to a draft policy released in September of last year.
A fair amount of the report discusses the use of partner organizations in the digitization effort. The draft released in September was open to public comment, and NARA has posted its responses to those comments.
(Thanks to the American Association of Law Libraries Washington Office and their monthly E-Bulletin)
The NARA/TGN contract as a bad precedent
Submitted by jajacobs on Sat, 2008-04-05 10:40.
A comment (Digitization Contract expands access to public records) posted here last week in response to a post (Yet another digitization contract limits free access to public records) about the NARA/TGN contract to digitize certain materials at NARA said that the contract "does not limit access to public records" and that "This is a definite win for the public."
I want to take the opportunity to address the arguments made in that comment and enumerate some of the problems that I see with the contract and ones like it. In brief: (as James pointed out) while contracts like this one are attractive in the short run to some people because they do provide some access that we do not now have, in the long run they are bad ideas because our short-term, limited gains result in long-term net losses to free public access to public information. Even people who relish the short-term gains should be concerned about the long-term net losses.
The good things about the Contract
Let me begin by noting that there are many things about this contract that are good and that reflect, I think, the fact that government officials have learned from past mistakes. Examples of the good things in the contract are: the inclusion of specific technical specifications, the right of NARA to interrupt processing when necessary to provide reference service and public access to the materials, the "non-exclusive" nature of the contract, the fact that TGN must provide free online access to the Digitized Materials in all NARA locations, the fact that NARA does not transfer permanent control or ownership of the materials to TGN, and the five-year limitation on TGN's sole use of (some of) the digital copies.
The bad things about the Contract
But there are, I believe, several things wrong with the contract -- things that result in a net loss to the public rather than a net gain.
- The "enhancements" provided by the contract are fee-based and therefore explicitly and implicitly limit use and impose two-tier access.
- The contract promotes access over control. For the public to have "access" to public information content without the ability to use and reuse it "enhances" with one hand while it diminishes with the other. Enhancing access at the expense of control is a net loss for the public.
- The so-called expansion of access obscures the limitations on free public access to public information that deals such as the NARA/TGN deal impose. For example,
- NARA gives TGN "the rights to and the exclusive and unlimited right to use the Digitized Materials and all metadata created for the electronic databases for five years."
- There is nothing in the contract that requires the information that TGN dispenses during the five years to be usable or reusable by the public, and we must assume from the language of the contract that it certainly does not intend to grant such rights for use of public information to citizens.
- The agreement gives TGN veto over disclosure of information about the agreement itself (section 4.4 of the Agreement).
- The agreement creates a category of "confidential information" that is exempt from disclosure (Section 4.2). This includes "designs or styles, trade secrets, inventions," and even "know-how." This is an example of the government not only condoning "closed access" principles over "open access" principles, it is contractually requiring NARA to do so.
- NARA is giving TGN the right to use NARA trademarks, which will obscure the difference between TGN and NARA itself, blurring for the public the line between free public access to government information and fee-based access through a private company. The contract even requires NARA to link from its own Catalog (ARC) to the TGN site, thus effectively turning NARA into an advertiser and promoter of TGN. It is not clear to me that this requirement of NARA to link to TGN will end after five years.
- It is not true, as the comment claims, that "The digitized copies of these records become freely accessible at all NARA reading rooms." Rather, the contract explicitly places limits on use of the digitized images for 5 years -- even in the reading rooms. These limitations include: "production for a fee of digital images" and, the permission to provide DVDs or CD-ROMs "for sale to the public." Even those distributions by NARA must include "license restrictions" that "will limit their use to prohibit resale, distribution or republication." (Section 1.4a [emphasis added])
- The contract does not, as the comment claims, make "the digitized copies of these records freely available to everyone after five years at no cost to the taxpayer." Indeed the wording of the contract explicitly gives NARA the right after five years "to sell" the digital content. In addition, the contract does not remove restrictions on materials digitized from microform after 5 years. (See Section 1.4b)
- The argument that any "enhancement" is good -- even if it imposes restrictions and two-tier access -- is often used by the private sector as a rationalization for privatization of government information. The battles over privatization of public information have a long history and, with the shift to digital information, we face new battles. I believe that the push for privatization -- particularly because of the costs involved in digitization -- means that we should be more cautious, not less cautious or cavalier, about promoting, facilitating, or encouraging contractual arrangements such as the NARA/TGN deal that grant special rights to the private sector or blur the difference between the private and public sectors.
- Contracts such as this one set a precedent for creating two-tier or fee-only access to public information. When we allow the government to make excuses for failing to provide free public access by claiming that we have no choice and that this is better than nothing, we lower the bar for the next contract -- and the next.
It is a bigger problem than this one contract
We at FGI have no argument against the private sector repackaging and adding value to public information -- as long as the information itself is freely available to everyone to use and re-use. When everyone has access to the raw content, then we will all be able to repackage and add value to public information, we will all have free access and the ability to "enhance access."
But when any contractual agreement or system (private-sector or governmental) locks the raw information away from citizens or charges a fee for that information, then such systems and contracts, by definition, wrest control of the information from the public and consolidate that control in a government agency or private sector company.
This problem of control exists not just with contracts such as the NARA/TGN contract. It also exists for information such as the Congressional Record and the Federal Register (which are "free" one-page-at-a-time, but cost thousands of dollars a year for a subscription; see http://bookstore.gpo.gov/collections/eproducts.jsp). It exists for Congressional Research Reports, which the government does not make available to the public except for those that leak out of government control or that private vendors provide for a fee (see http://opencrs.com/ and Inexplicable anomaly By Leslie Harris and Matt Stoller).
I am sure that some will argue that it is still possible (because of the non-exclusive nature of the contract) for the government or someone else to re-digitize these materials and make them freely available in the future. But that argument is the opposite of the argument for negotiating this contract in the first place. If we have to have a contract like this now, if this is the best we can do, if the government cannot afford to digitize these materials today, why should we assume that this will change in the future if those materials are already digitized? The practical result of contracts like this is that they will make it harder, not easier, for these materials to ever become freely available to the public.
In summary, this is a big problem, not just a problem of this one contract. We are grasping short-term, good-enough expediency at the expense of long-term free public access. As citizens and librarians, we should not lower our standards for free public access to public information by accepting less than full, free, public access.
Virtual Vietnam Veterans Memorial Wall
Submitted by blakeley on Thu, 2008-03-27 10:24.
If you wish to pay your respects but cannot travel to the Vietnam Veterans Memorial Wall in D.C., you can now do so from your computer. NARA and Footnote.com have released a searchable digital replica of the Memorial Wall.
The site also allows you to "leave a tribute, a story or photograph about any of the 58,256 veterans killed or missing in the Vietnam War".
A word of warning: the site says that, due to recent high traffic, pages or images may load slowly. They are working to improve this.
Yet another digitization contract limits free access to public records
Submitted by jajacobs on Tue, 2008-03-25 09:33.
The National Archives and Records Administration (NARA) has announced a draft "non-exclusive agreement with The Generations Network, Inc. (TGN) to digitize and further expand public access to archival holdings in NARA's custody." The contract restricts free public access for five years.
- DRAFT PARTNERSHIP AGREEMENT AVAILABLE FOR PUBLIC COMMENT
Posted for comment: March 10, 2008
Comments due: April 9, 2008
Send comments to: Vision@nara.gov or by fax to 301-837-0319
The contract specifies that NARA will receive digital copies of all holdings that are digitized as part of this agreement and "[a]s with all of NARA's digitization agreements, there will be no charge for researchers at any time to access the digital copies in any of NARA's research rooms" and that users will have "the opportunity to purchase copies of the documents in digital format" [emphasis added].
As the NARA announcement notes, projects like this "will enable the public to have electronic access to textual and microfilm records sooner than NARA itself can provide." Once again, lack of funding for public accessibility results in two-tier access: free if you get to a reading room, fee if you want to use the Web.
The proposed agreement puts restrictions on redistribution of the digitized records by NARA for five years.
For a period of five years following the donation [TGN "donated" copy of the Digitized Materials to NARA], NARA will not sell, make available for downloading, or otherwise provide in electronic form, the entire contents of the Digitized Materials or a major file segment thereof. During this five year period NARA's use of the Digitized Materials will be limited to (i) access by staff and researchers at NARA locations; (ii) production for a fee of digital images of a microform publication or a portion of a series of original records, with a minimum complement of metadata to enable the purchaser to describe, identify, locate, retrieve, and manage the images; (iii) display of sample images on NARA's website or elsewhere to promote awareness of NARA's services and activities or for noncommercial educational purposes, and (iv) to reproduce portions of the Digitized Images on offline storage devices that are not accessible via Internet such as DVDs or CD-ROMs, with metadata created by NARA only, for sale to the public at rates established by NARA. In the case of (ii) and (iv) above, license restrictions on the materials as issued by NARA will limit their use to prohibit resale, distribution or republication of the Digitized Material in any format or media by the original customer or successive owner of the media.
After five years from the date TGN donates Digitized Materials made from original records, NARA will have full and unrestricted rights to use them, including the right to sell, make available for downloading, or otherwise provide in electronic form, the entire contents of the Digitized Materials or segments of them. [emphasis added]
Creating Gov Doc "Libraries" in Google Books
Submitted by blakeley on Tue, 2008-03-04 23:49.
Digitized government documents in Google Books have been written about quite a lot here at FGI, and I'd like to revisit the topic with a different focus.
I was searching for Civil War era government documents for a History Professor, and I realized that we did not own one of the documents he sought. Before suggesting that he interlibrary loan a copy of this document, I decided to search online for a full-text digitized version. Alas, it did not exist in the digital realm, but I did find some other digitized gov docs pertaining to his research needs in Google Books. We were both elated, he because I had found what he needed, and I because so many documents I found digitized on Google Books were the same documents we had lost to mold and water damage from Hurricane Rita!
Out of curiosity, I did a Google Book search for other types of government publications and found these gems:
Trial of the Conspirators, for the Assassination of President Lincoln
Illustrations of the Gross Morbid Anatomy of the Brain in the Insane (isn't that a Cypress Hill song? Nevermind...) by the Government Hospital for the Insane.
How it Feels to be the Husband of a Suffragette (not published by the Government Printing Office, but it is a book housed in the National American Woman Suffrage Association Collection in the Library of Congress).
Official Records of the Union and Confederate Navies in the War of the Rebellion
Most of these documents were scanned at large research universities or depositories, but the quality is not always decent and can sometimes border on the illegible. I was quite amused when I discovered a staff person's hand digitized on this document's cover:

However, there are bigger snafus than a digitized librarian's hand. For example, despite government documents being in the public domain, Google Books treats most post-1922 (i.e. post-copyright law) government documents as copyrighted material by only allowing a limited view! For more details, please read James Jacobs' post on this issue.
Despite all these issues (which have yet to be resolved), I decided to take advantage of the access to full-text, pre-1922 government documents and create a McNeese Gov Docs "Library" account in Google Books for my depository. The account also allows you to subscribe to updates of its holdings via an RSS feed. I put a link to the library account and the RSS feed on my depository's homepage and our "Gov Guides" wiki. I'll add more of these interesting and old documents as I come across them, especially those pertaining to Louisiana or documents that were lost to Hurricane Rita.
Here are some tips for finding gov docs in Google Books: Use Advanced Search, and in the Publisher field, type in Govt OR GPO OR "Government Printing Office". You can also search by agency (e.g., "Department of the Interior") by typing the name of the agency in the Author field.
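If you would rather script these searches than fill in the Advanced Search form each time, here is a minimal sketch of one way it might be done. It rests on assumptions that are not part of the tips above: it targets the public Google Books API volumes endpoint and relies on its `inpublisher:` and `inauthor:` search keywords standing in for the Publisher and Author fields. The function names are hypothetical, and the sketch only builds the search URLs without fetching anything. Because I am not certain the API honors OR the way the form does, it builds one URL per publisher term.

```python
from urllib.parse import urlencode

# Assumption: the Google Books API "volumes" endpoint, not the Advanced Search form.
BASE_URL = "https://www.googleapis.com/books/v1/volumes"

# Publisher terms taken from the search tip above.
PUBLISHER_TERMS = ["Govt", "GPO", "Government Printing Office"]


def build_query(publisher=None, author=None):
    """Build a query string using the inpublisher:/inauthor: keywords (hypothetical helper)."""
    parts = []
    if publisher:
        parts.append(f'inpublisher:"{publisher}"')
    if author:
        parts.append(f'inauthor:"{author}"')
    return " ".join(parts)


def build_url(publisher=None, author=None):
    """Return a URL-encoded search URL for the given publisher and/or agency author."""
    return BASE_URL + "?" + urlencode({"q": build_query(publisher, author)})


if __name__ == "__main__":
    # One search per publisher term, rather than a single OR query.
    for term in PUBLISHER_TERMS:
        print(build_url(publisher=term))
    # Search by agency name in the author position.
    print(build_url(author="Department of the Interior"))
```

Pasting any of the printed URLs into a browser should return matching records, assuming the endpoint behaves as described; the results may differ from what the Advanced Search form shows.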
Have fun exploring and building your own digital collections, but please let me know if you find some really cool gov docs, ok?
Book scanning projects
Submitted by jajacobs on Thu, 2008-01-03 10:07.
Two recent items that look more closely at book scanning projects:
- The Race to the Shelf Continues: The Open Content Alliance and Amazon.com, by Beth Ashmore, Cataloging Librarian, Samford University & Jill E. Grogg, Electronic Resources Librarian, University of Alabama Libraries. Searcher, Vol. 16 No. 1 — January 2008.
"So, why would a librarian choose to go with the OCA over the other partners currently available? Two words: open access. If the goal is to support open access principles and to get scanned copies of out of copyright books indexed in as many search engines as possible, then OCA is the right choice. With minimum requirements such as attribution and maximum requirements of no re-hosting, OCA leaves the greatest number of opportunities for users to discover and re-use the text that they find, making it ideal for those interested in data mining."
- Google Book Search: Document Understanding on a Massive Scale, by Luc Vincent, Google, Inc.
Lorcan Dempsey notes of the Google paper, "The paper outlines presentation options based on copyright status and also discusses how Google supports the document understanding community through the release of software and data sets. I was interested that there was no discussion of social features."
Boston Public Library Digitization Project
Submitted by jajacobs on Thu, 2007-12-27 09:50.
Carl Malamud says his motivation is to make the workings of the government more accessible at no cost and that "This is society's operating system."
- Documents of Library in Boston to Go on Web. By John Markoff, New York Times (December 27, 2007).
A digital library partnership, including two nonprofit organizations and the Boston Public Library, is preparing to begin making digital copies of the library's paper-based government documents collection, which will then be made available on the Internet.
The project, which will take two years and require the hand scanning of millions of pages of government hearings and related publications, will cost an estimated $6 million, according to the project's sponsors.
Boston Public Library librarians said they planned to begin by digitizing the House Committee on Un-American Activities hearings from the 1950s, which is regularly sought after by its patrons.
The project is being undertaken by Public.Resource.Org, a nonprofit group seeking to open public access to government records, and the Internet Archive, a San Francisco-based digital library.
The project is the brainchild of founders of the two organizations, Carl Malamud and Brewster Kahle...
the really modern library
Submitted by jajacobs on Fri, 2007-10-19 09:22.
The folks at The Institute For The Future Of The Book and the Digital Library Federation are having a series of brainstorming meetings to discuss what they call "the really modern library." Read more at the Institute's blog:
- the really modern library, the institute for the future of the book, October 8, 2007.
The goal of this project is to shed light on the big questions about future accessibility and usability of analog culture in a digital, networked world.
CLIR Seeks Public Comment on White Paper
Submitted by jajacobs on Sun, 2007-09-16 08:10.
Preservation in the Age of Large-Scale Digitization, by Oya Rieger
CLIR Seeks Public Comment on White Paper: Preservation in the Age of Large-Scale Digitization
The Council on Library and Information Resources (CLIR) seeks public comment on a white paper examining preservation issues relevant to large-scale digitization projects such as those being done by Google, Microsoft, and the Open Content Alliance. The paper, Preservation in the Age of Large-Scale Digitization, was written by Oya Rieger, Interim Assistant University Librarian for Digital Library and Information Technologies at Cornell University Library. It is available at http://www.clir.org/activities/details/mdpres.html.
The paper identifies issues that will influence the availability and usability, over time, of the digital books being created by large-scale digitizing projects, and considers the relationship of these new resources to our print collections. It concludes with a set of recommendations for rethinking a preservation strategy.
In issuing this paper, CLIR aims to stimulate discussion among stakeholders and to generate productive thinking about collaborative approaches to enduring access. To this end, CLIR invites those who submit comments to indicate whether they would like their comments posted publicly on our Web site. CLIR will make public only those comments accompanied by permission to post (let us know if the comments are to be anonymous or signed), and all such comments will be moderated. Comments received without permission to post will be shared only with CLIR staff and the author.
Public comment is sought through Friday, October 5. Please address comments to Kathlin Smith (ksmith@clir.org). CLIR will issue a final print and electronic report later this fall.
NASA digitization of photos and videos
Submitted by jajacobs on Mon, 2007-08-27 18:00.
NASA photo, video collection to be digitized, Digital Silence (August 27, 2007)
NASA and Internet Archive of San Francisco are partnering to scan, archive and manage the agency's vast collection of photographs, historic film and video. The imagery will be available through the Internet and free to the public, historians, scholars, students, and researchers.
NASA selected Internet Archive, a nonprofit organization, as a partner for digitizing and distributing agency imagery through a competitive process. The two organizations are teaming through a non-exclusive Space Act agreement to help NASA consolidate and digitize its imagery archives using no NASA funds.
"We're dedicated to making all human knowledge available in the digital realm," said Brewster Kahle, digital librarian and founder of Internet Archive.
More interesting details at the above link.