I had a good time yesterday on a panel about Web archiving and digital preservation at the Society of California Archivists General Meeting 2013 (slides to be posted there soon). The panel was organized by Scott Reed at the Internet Archive, and included Scott, Claude Zachary (University of Southern California), me, and my Stanford colleague Henry Lowood.
One of the coolest things -- other than the fascinating keynote by Dr. Michael Cohen, who talked about "Culture Wars: Engaging Undergraduates in Documenting the Crisis in California Through the Historian's Eye Project" -- was learning about the site archiveready.com. This is a handy little tool to test your Website's archivability. Paste in your URL, and it checks things like standards compliance, accessibility, CSS, site maps, external media, proprietary objects like Flash or QuickTime, and lastly whether or not your site is already being collected by the Internet Archive's Wayback Machine. Freegovinfo did pretty well in the test with an overall rating of 78%. We lost points for having some external images and external scripts (Google Analytics and a Facebook badge), but I don't consider those things critical to the site for the long term. How does your site do? Are you ready to be archived?!
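For the curious, here is a rough idea of how a check for external images and scripts could work. This is a minimal sketch in Python, not archiveready.com's actual method: the class name, the function name, the sample page, and the example hosts are all made up for illustration. It simply lists `<img>` and `<script>` tags whose `src` points at a different host than the site itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect <script src> and <img src> URLs served from a different host.

    Hypothetical helper for illustration; not part of any archiving tool.
    """

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.external = []  # list of (tag, url) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).netloc
        # Relative URLs have an empty host and count as local.
        if host and host != self.site_host:
            self.external.append((tag, src))


def find_external_resources(html, site_host):
    """Return (tag, url) pairs for images and scripts hosted elsewhere."""
    finder = ExternalResourceFinder(site_host)
    finder.feed(html)
    return finder.external


# Made-up sample page: two local resources, two external ones.
sample_page = """
<html><body>
<img src="/images/logo.png">
<img src="https://cdn.example.net/badge.png">
<script src="https://analytics.example.com/tracker.js"></script>
<script src="/js/site.js"></script>
</body></html>
"""

external = find_external_resources(sample_page, "www.example.org")
for tag, url in external:
    print(tag, url)
```

A crawler that only follows links within one host may miss resources like these, which is presumably why a tool would count them against a site's score.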
"The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard." James A. Jacobs, University of California San Diego and James R. Jacobs, Stanford University. D-Lib Magazine, March/April 2013, Volume 19, Number 3/4. Also available in the Stanford Digital Repository and the University of California eScholarship Repository.
We propose the "Digital-Surrogate Seal of Approval" (DSSOA) as a simple way of describing digital objects created from printed books and other non-digital originals as surrogates for the analog original. The DSSOA denotes that a digitization accurately and completely replicates the content and presentation of the original. It can be used to express an intended goal during the planning stages of digitization and to guarantee the quality of existing digital surrogates. The DSSOA Criteria can be used to evaluate individual digital objects or entire completed collections. DSSOA is independent of production technologies and methodologies and focuses instead on the perspective of consumers — including libraries that rely on digital surrogates.
The Archive-It team announced today the publication of their white paper, the Web Archiving Life Cycle Model. The model offers a thorough description of the entire process of Web archiving. Whether you've been Web archiving for 7 years or are mulling over jumping into the fray, this model will put you in a good headspace to do this critical work. Thanks Molly Bragg, Kristine Hanna, Lori Donovan, Graham Hukill, and Anna Peterson!
The Archive-It team is excited to publish our first white paper: The Web Archiving Life Cycle Model. With this paper we hope to share web archiving best practices and processes with organizations interested in developing and/or expanding their web archiving initiatives.
This white paper is the product of a collaboration between members of the Archive-It team as well as the larger Archive-It partner community. Several partners took part in in-depth interviews regarding their experiences using Archive-It and web archiving in general, and others helped with the design iteration phase of the model and read preliminary drafts of the paper.
The Web Archiving Life Cycle Model encompasses the following web archiving processes:
• Vision and Objectives
• Resources and Workflow
• Risk Management
• Appraisal and Selection
• Data Capture
• Storage and Organization
• Quality Assurance and Analysis
A bill in the Minnesota legislature would allow government agencies to post official notices on their web sites instead of in newspapers and would require a "permanent record" of publications to be "maintained." Included would be publication of transportation projects, proceedings, official notices, and summaries of meetings. The bill apparently does not designate who will preserve the information nor does it specify how to preserve the information except for the caveat that the records must be in "a form accessible by the public."
- H.F. No. 1286, as introduced - 88th Legislative Session (2013-2014) Posted on Mar 05, 2013.
Subd. 4. Record retention. A political subdivision that publishes notice on its Web site under this section must ensure that a permanent record of publication is maintained in a form accessible by the public.
We would, of course, like to see a bit more detail of the implementation, perhaps even including requirements for deposit of records in a Trusted Repository, provisions for discovery, access, use, and bulk download, and, ideally, a state-law-compliant deposit into libraries.
One section of the bill does specify that print copies of "documents" published on the web must be made available at all public libraries within the jurisdiction. This is not a bad requirement, but it does seem to us to be short-sighted to require deposit of paper copies and not require deposit of digital copies. Libraries could provide enhanced access and service over what the government could provide and could provide redundant digital preservation.
Subd. 5. Print copies. When a political subdivision publishes exclusively on the Web site, it must also make print copies of all published documents available at the main office of the political subdivision, any other government offices designated by the political subdivision, all public libraries within the jurisdiction, and by mail upon request.
Chris Rusbridge, retired Director of the UK Digital Curation Centre (DCC), sent an open letter to Tony Hey of Microsoft asking that they publish the specifications for older file formats. He has received a reply:
- Response to the Open Letter on obsolete Microsoft file formats, Chris Rusbridge, Unsustainable Ideas, (Nov 26, 2012).
- We do not currently have specifications for these older file formats.
- It is likely that those employees who had significant knowledge of these formats are no longer with Microsoft.
But the good news is that Microsoft is willing to work on the problem! The response from Microsoft continues:
- We can look into creating new licensing options including virtual machine images of older operating systems and old Office software images licensed for the sole purpose of rendering and/or converting legacy files.
- One approach we could consider is for Microsoft to participate in a “crowd source” project working with archivists to create a public spec of these old file formats.
Of course, this is a closing-the-barn-door-after-the-horse-is-gone solution, but such kludgy solutions are necessary when born-digital information is produced in proprietary formats rather than open formats -- and when libraries accept these formats rather than insist on preservable digital objects.
Ever heard the term "bit rot" or wondered what actually happens when electronic files go bad? The Atlas of Digital Damages is a collection of files with corrupted bits, so you can visually see what happens. The Atlas is a Flickr album, or rather "a staging area for collecting visual examples of digital preservation challenges, failed renderings, encoding damage, corrupt data, and visual evidence documenting #FAILs of any stripe." So, in addition to viewing these examples, you too can contribute examples to help build the Atlas' collection. A blog post by Barbara Sierman, from the National Library of the Netherlands, first posed the question and, well, folks ran with the idea and created this "crowd sourced effort" to document digital degradation. See "Where is our atlas of digital damages?".
I discovered this nifty item while reading through the November Digital Preservation Newsletter from the Library of Congress (there are lots of great project updates and information in there, especially on the geospatial digital preservation front, so go check it out!). Also check out the LOCKSS project for digital preservation approaches and methods to prevent bit rot on a large scale.
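To make "bit rot" a little more concrete, here is a minimal sketch in Python of a basic fixity check: record a checksum when a file is deposited, then recompute it later and compare. The helper names and the sample text are assumptions for illustration, and this is not a description of how LOCKSS itself works; it only shows that flipping a single bit is enough to change the checksum.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted (simulated damage)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def is_intact(data: bytes, expected_digest: str) -> bool:
    """True if data still matches the digest recorded at deposit time."""
    return sha256_hex(data) == expected_digest


# Made-up file contents standing in for a deposited document.
original = b"Official notice: the meeting is on Thursday."
recorded_digest = sha256_hex(original)  # stored when the file is deposited

rotted = flip_bit(original, 100)  # one bit of "rot" somewhere in the file

print(is_intact(original, recorded_digest))  # True
print(is_intact(rotted, recorded_digest))    # False
```

A checksum only tells you that something changed, not how to fix it. Repair requires another good copy to restore from, which is one reason to keep multiple copies in multiple places.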
[This post was nicely sent to us by our pal Kris Kasianovitz, International, State and Local Government Information Librarian at Stanford. If others want to send us items of interest, please send them to freegovinfo AT gmail DOT com. Thanks Kris!!]
The Colorado State Publications Library Digital Repository collects and preserves born digital publications from Colorado state agencies. Its mission is to provide Colorado residents with permanent public access to information produced by state government.
In a post to the Bestpractices mailing list today, Debbi MacLeod, the Director of the Colorado State Publications Library, says that the library joined The Colorado Alliance of Research Libraries' Digital Repository (ADR) in 2008. There are now more than 9,500 documents in the ADR.
One of the benefits of joining was increased exposure of the collection, because ADR content is exposed to search engines. The collection went from an average of 1,000 hits per month to 10,000-15,000.
Another benefit is shared preservation responsibility. MacLeod says:
[W]e no longer have to worry about a local catastrophic server failure. ADR staff keep track of the latest developments in digital preservation. They keep on top of server maintenance and periodic testing to ensure that files deposited in the system have not been corrupted. Also, there is an ongoing pilot with DuraCloud to test the pros and cons of a distributed back-up system using cloud technology.
This seems to be a successful example of building and sharing infrastructure and responsibilities in a way that leverages the strengths of cooperating organizations to accomplish more than any one could on its own.
Even better, MacLeod notes that ADR is willing to work with other states!
While The Alliance is located in Colorado, they are interested in expanding their base and having other state collections of documents or special collections join their consortium. Much of the ground work for the particulars to state documents has now been done and can be applied to other states. Robin Dean is the Director of the Alliance Digital Repository. She can be reached at 303-759-3399 x110 or robin at coalliance.org to start a conversation.
Hot off the presses, the August 2012 Library of Congress Digital Preservation Newsletter is now available. In this issue:
- Summary of DigitalPreservation 2012
- Rescuing the Tangible from the Intangible
- From AIP to Zettabyte: Comparing Glossaries
- One Family's Digital Archiving Project
- Fighting the Battle for Fleeting Attention
- Profile of William Kilbride
- Training Digital Curators
- Upcoming Events (Designing Storage Architectures, NDIIPP at Book Festival and others)
- Meetings Roundup (Open Repositories, Preserving Online Science, Data Intensive Research)
- Resources (Digital Disaster Planning, Digital Preservation in a Box, and others)
Y'all should attend. The speaker is Abbie Grotke, who was one of our group of guest bloggers from the End of Term Archive last month. Register early. Or have a viewing party so you can share one Webinar connection with multiple people. But do it. You won't regret it!
FEDLINK invites you to the next Library of Congress Area Studies Webinar Series: "Web Archiving at the Library of Congress and Around the World," to be held on Thursday, August 30, 2012, 2:00-3:00pm ET.
Since 2000, the Library of Congress has been archiving born-digital web content documenting a variety of events and themes. The Webinar will provide an overview of the Library's web archiving program and a look at international work and collaborative efforts by libraries, archives, and other organizations.
The speaker is Abbie Grotke, the Web Archiving Team Lead in the Office of Strategic Initiatives at the Library of Congress. She came to the Library in 1997 to work on American Memory digitization projects. Since 2002 she has been involved in web archiving and the digital preservation program at the Library of Congress.
This program is FREE; however, there is a maximum capacity, so registration is required.
The webinar will be recorded and available for later viewing if you are unable to participate.
Please register by Tuesday, August 28, 2012 at https://www.surveymonkey.com/s/GNBSWYM.
If you experience problems with the registration link - please email your name/email/library name to firstname.lastname@example.org and we will get you manually registered for the webinar.
For more information or to request ADA accommodations, please contact Dr. Anchi Hoh, Program Management Specialist, at email@example.com.
The focus of the Digital Preservation Network, which is being created by research-intensive universities, is "the complete digital scholarly record" and its goal is to ensure the long-term preservation of that record. Already, twenty-nine organizations have agreed to participate in start-up planning and committed "seed capital" to fund initial planning efforts. Its Brief Overview describes some of the thinking that is going into this effort, which sounds remarkably like the FDLP and what we at FGI have been advocating for the digital FDLP:
- The Digital Preservation Network: Overview of the Initiative (Feb 21, 2012). [PDF 3 pages]
Over time, reliance on common approaches, technologies and organizations creates risk of common points of failure in securing long-term preservation of the record.
To avoid the catastrophic loss of scholarship, we must build and sustain a diverse ecosystem that can ensure the survival of scholarship in digital form for future generations. We envision a system that is scalable, sustainable, and complementary to existing collection and preservation efforts--the Digital Preservation Network (DPN or Deepen).
DPN will create a federated approach to preservation of academic content. It will build upon the higher education community's current investments to create sufficient diversity of preservation approaches to assure access to the digital scholarly record far into the future....
The objects and metadata of the scholarly record must be replicated across:
1. Diverse software architectures
2. Diverse organizational structures
3. Diverse geographic regions
4. And in the future, diverse legal/political environments (nations)
...This diversity requires a supporting ecosystem for preservation that enables higher education to own, maintain and control the scholarly record throughout time. While commercial entities may partner with us to contribute to this effort at different points in time depending on priorities and business models, final control must reside with the academy.
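The "common points of failure" idea in the DPN overview can be illustrated with a small check. The following is a minimal sketch in Python under our own assumptions, not anything DPN has published: the `Replica` record, the `diversity_report` function, the threshold of two, and the sample data are all hypothetical. It counts how many distinct software systems, organizations, and regions hold copies of an object and flags any dimension where every copy shares the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One stored copy of an object (hypothetical record for illustration)."""
    software: str
    organization: str
    region: str


def diversity_report(replicas, minimum=2):
    """Map each dimension to (distinct value count, meets the minimum?)."""
    report = {}
    for dimension in ("software", "organization", "region"):
        distinct = {getattr(replica, dimension) for replica in replicas}
        report[dimension] = (len(distinct), len(distinct) >= minimum)
    return report


# Made-up example: three organizations in three regions, one software stack.
replicas = [
    Replica("system-a", "University X", "west"),
    Replica("system-a", "University Y", "east"),
    Replica("system-a", "University Z", "midwest"),
]

report = diversity_report(replicas)
for dimension, (count, ok) in report.items():
    status = "ok" if ok else "common point of failure"
    print(f"{dimension}: {count} distinct, {status}")
```

In this made-up case the copies are spread across organizations and regions, but a single bug in the shared software could affect all three at once.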