We are posting to FGI to let you know about an ongoing project to deal with depository CD-ROMs (and other legacy formats). This post talks about recent pilot projects and a pending NSF proposal to test emulation and initiate migration activities.
As some of you may know, there was a presentation on CD-ROMs at the Fall depository library conference, chaired by Lisa Russell of GPO.
Julie Linden of Yale and Gretchen Gano of NYU talked about a migration pilot they have been working on.
Kay Collins was also on the panel; her presentation on weeding CDs and DVDs – which ends up where we all seem to end up, considering a cooperative effort to solve the problems of this collection! – is here:
The other panelist was a computer science professor from Indiana University, Geoffrey Brown, who is exploring “virtualization” (emulation) as a strategy for long-term access to government documents distributed on tangible electronic media. His presentation slides are here:
Shortly after the conference, Professor Brown invited Julie and Gretchen to collaborate on an NSF proposal to fund his emulation work. A significant aspect of the proposal is to involve “domain users” (NSF terminology) – in this case, depository librarians – in testing the emulation environment, contributing CDs that Indiana doesn’t have, contributing metadata, etc.
We will not find out whether it is funded until the middle of next year. In the meantime, we are continuing to think about models and funding for a large-scale collaborative effort to ensure long-term access to the CDs. We welcome suggestions, musings, cautions, leads – anything – from you all. We are planning to put up a web page with information about our project and will let you all know when it is live. What follows is a summary of the proposed grant activities:
Project Summary: III-CXT: Virtualization of Government Information in Legacy Formats: Enabling Long-Term Access to a Large Collection of Digital Documents
Over the past 20 years the United States Government Printing Office (GPO) has distributed important data and reports on electronic media such as floppy disk, CD-ROM, and DVD through the Federal Depository Library Program (FDLP). This digital document collection, comprising more than 3000 items with millions of individual files, includes fundamental data on the economy, the environment, the population, the life sciences, and the physical sciences. Accessing many of these items requires installing proprietary and increasingly obsolete software. The goal of this project is to develop technologies supported by a large user community to ensure continued long-term access to these documents (the “FDLP collection”) while lowering existing barriers to their use.
This project will generate tools to evaluate the software requirements of this large document collection, and to configure and automate the delivery of individual collections through emulation (virtualization). In addition, the project will extend schemas for descriptive and preservation metadata to document both the contents and technical requirements of the collection.
This collection represents a significant preservation challenge because of its size and heterogeneity, and because the collection exists only on physical media distributed across the many depository libraries. Fortunately, a large community of depository librarians with extensive knowledge of the issues supports the collection. Success in preserving these materials requires actively engaging these users. To encourage and enable their participation, the project will utilize off-the-shelf tools to provide web accessible renditions of the collection and its associated metadata. Participation of these users will range from simply accessing a “virtual” collection, to contributing to the creation of metadata, to configuring and testing document renditions delivered through emulation.
A key technology this project explores is the use of emulation to enable access to documents within their contemporary software infrastructures. While emulation has been frequently discussed as a preservation strategy and has been used successfully for small-scale projects, it has never been explored in the context of a large and diverse document collection. Applying emulation to such a large document collection requires developing technology to automate tasks such as the evaluation of software requirements and the dynamic configuration of specialized emulation environments.
The GPO documents on electronic media are important resources at risk of becoming inaccessible due to obsolescence. Furthermore, many of these documents already present a significant access barrier because they require specialized skills and equipment for installation and because most of the roughly 1260 depository libraries have only selected subsets of the overall collection. An expected outcome of this project is a technical strategy to enable all depository libraries access to the entire collection with significantly lower technical barriers to access. The primary outcomes of this project are technologies, strategies, and metadata to ensure the long-term access to this important document collection.
Let us know what you think…
Julie Linden – email@example.com, Gretchen Gano – firstname.lastname@example.org, and Geoffrey Brown – email@example.com