Note: While these remarks are based on the outline I used to prepare my digital preservation talk at the Nevada Library Association, they are not identical to my talk. I work from talking points instead of a script and as a result my remarks will vary each time I give a talk using the same outline.
In 1999, the National Endowment for the Humanities (NEH) established an ambitious web-based family history project called “My History is America’s History.” “My History” aimed to assemble a history of America through the eyes of its citizens from all walks of life. At its height, the site contained hundreds of contributed stories and an extensive guide to collecting and preserving family history.
Without warning or explanation, NEH shut down the site in 2002. To this day, no one knows why. To some people at the time, this wasn’t a problem. They knew about the Internet Archive, a private project that aimed to collect and preserve the entire Internet. Someone else had obviously taken care of the problem. If only. The well-intentioned Internet Archive spider has limitations. One of these is that it cannot retrieve content behind a search box. So, when you go to an Internet Archive page for “My History is America’s History”, you get the FOUR family stories that happened to be featured on the front page. Nearly everything else is lost. This is part of the price we pay when we expect “someone else” to solve a problem for us.
It’s not just family history that is fading away. According to the MIT Technology Review article “The Fading Memory of the State,” millions of government records and even bits of military history have either disappeared or are on the verge of disappearing. I can say with confidence that we know more about the inner workings of our government in 1945 than future researchers will ever know about the inner workings of government in 2005. And that’s assuming total access to intact materials.
In today’s fifteen-minute discussion of digital preservation, I want to address the unique challenges posed by digital materials, cover a few proposed solutions, and end with a few actions you can take today to keep the past from fading away.
I. The Unique Challenges of Digital Materials
While preservation is a concern regardless of format, digital materials present unique challenges that librarians and other stakeholders must be aware of.
Most importantly, no one knows what will work. We know how to preserve paper, and we have made it last for centuries. We’re fairly sure how to preserve microforms, and some have already survived for decades. Under the right conditions, we think we can make microforms last for centuries too. And as long as humans still have magnifying glasses and light sources, microforms can be read. For both formats, there are established best practices. Not so with digital materials. There are some promising approaches, but no one claims to know which will work. This is part of the reason that we at Free Government Information are so insistent on the library community retaining multiple copies of electronic documents. With many copies, many approaches can be tried, and it is likely that at least one of them will work. If we have only one or even ten copies, it seems likely that all will fail.
Another unique challenge is media fragility. Paper can last a century or so without special treatment if you’re not in the tropics. By contrast, most magnetic media (tapes, floppy disks, hard drives) begin to have major problems after ten years. Optical media (CDs, DVDs) may last 50 years or more IF kept under PERFECT conditions. Keeping optical media under less than ideal conditions may shorten their life to less than a decade.
A worse problem than fragile digital media is standards obsolescence. Who here remembers WordStar? Who still has a Wang word processor? Popular software such as Microsoft Office changes quickly, and older formats often get left behind unless there is a market for supporting them. Some formats endure longer than others. ASCII text has been around for decades, and HTML is starting its second decade. But these formats don’t convey rich information like graphics or fancy formatting.
Related to standards obsolescence is the problem of closed, proprietary standards. A proprietary standard is one that is owned by a private party and subject to licensing. A closed standard is one whose full file specification is not published. Microsoft Word is an example of a closed proprietary standard: Microsoft does not share the full specification with others. As a result, other word processors (like WordPerfect) may display a Word document differently from how it appeared in Word (for me, as pink strikethrough text). Closed and proprietary are NOT the same thing. Adobe PDF and MP3 are two formats that are open but proprietary. This means that the full file specification has been published, but the owning company decides who can write software that creates or modifies the files. What does this mean for digital preservation? It means that if you are writing a program to convert older files to a newer format, you must get a license from the owning company, and if the standard is closed, your program will fail to fully reproduce the look and feel of the original document. That’s assuming you can find the owning company. Closed proprietary standards of bankrupt companies may be permanently unreadable.
Another problem unique to digital materials is a technology called digital rights management, or DRM. DRM imposes technological restrictions on a file’s ability to be printed, copied, or even viewed. DRM software can be configured to automatically delete files after a period determined by the publisher. DRM is not copyright-aware, so it knows nothing of fair use. It has no timer that releases a work into the public domain when its copyright expires. DRM technology can even be used to “protect” public domain materials. A file that is locked down through DRM cannot be successfully migrated to a new format, because the DRM software will interpret the attempt as an effort to make an “illegal” copy, even though making that copy is a right under copyright law.
A final unique challenge presented by digital materials is that of access versus ownership. Everyone in the world with an Internet connection in 2000 had access to “My History is America’s History.” In a growing number of states, including Alaska, citizens have access to a large collection of electronic journals. This is great, and it’s a fact that people have more access now than at any other time in history. But what happens if the vendor goes under or a state can’t pay its database bill? People are left with nothing. What if the vendor is the government and takes down documents that are inconvenient to the current administration? Citizens lose access. You cannot preserve what you do not possess.
II. Possible Solutions
So what can we do with digital documents that we’ve acquired? If the file is sufficiently “document-like,” you can make a tangible backup (e.g., print it out or have it microfilmed). This is what we do in Alaska for “born-digital” publications: we print them out and put them into a non-circulating collection for preservation. The Census Bureau has taken the tangible backup approach as well. They are required to make individual household questionnaires available to the public after 72 years. So how do they intend to fulfill their responsibility to make the individual questionnaires from the 2000 Census available to the public in 2072? The Bureau is microfilming them. They’ve also created a digital file that they HOPE will be readable in 2072, but they KNOW the microfilm will be there for future genealogists. A number of large corporations write their important digital files to microfilm using Computer Output Microfilm (COM) technology.
The “tangible backup” approach has its limitations. It can only be applied to digital objects that have an analog equivalent. Thus it cannot be used for databases and multimedia like movies and audio recordings. Even for files with an analog equivalent, a tangible backup loses functionality such as in-document searching.
The most basic thing that can be done for digital files is refreshing, which means copying files onto a new medium. This guards against media failure but does nothing to protect you from standards obsolescence.
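For readers who want to see what refreshing looks like in practice, here is a minimal Python sketch. It assumes files live on an ordinary mounted filesystem. The helper names `refresh` and `sha256_of` are hypothetical and do not come from any particular preservation toolkit. The sketch copies a file to a new location and compares SHA-256 checksums before and after the copy. A copy damaged in transit is therefore caught instead of quietly becoming the new preservation copy. This protects only the bits. It does nothing to guarantee that software will still be able to read the format.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, dest_dir: Path) -> Path:
    """Hypothetical helper: copy a file to new storage and verify the copy.

    Raises ValueError if the checksums differ, so a bad copy is never
    silently accepted as the new preservation copy.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    before = sha256_of(source)
    shutil.copy2(source, target)  # copies file data plus timestamps and permission bits
    after = sha256_of(target)
    if before != after:
        raise ValueError(f"fixity check failed for {source}")
    return target
```

Recording the checksum alongside the file also lets a later refresh confirm that nothing changed while the file sat on the old medium.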
Migration is one of the promising approaches to digital preservation. Migrating a file means moving it from one format to another (e.g., WordPerfect to Word, or Word 95 to Word 2002). For maximum success, migration should be done as soon as a new format looks like it is taking market share away from existing formats. If a file format has an open specification and is non-proprietary, like XML or PDF/A, it is more likely to have a successful migration path than a closed proprietary standard. If a file is in a closed proprietary format, or has been copy-disabled through DRM, migration may be impossible.
Emulation, the practice of using current software and hardware to simulate a different (often older) environment, is a promising preservation practice and may offer the best chance of retaining the look and feel of older digital materials. Unlike tangible backups, emulation could be used to preserve databases and multimedia. There are working emulators today, most notably for old video games, and SoftWindows, which allows a Mac computer to act mostly like a Windows box.
Emulation has its problems too. Writing emulation software that is faithful to the original look and feel depends on open standards for operating systems and file formats. SoftWindows isn’t Windows, as both Mac and Windows users can tell you. Unless we can specify the original operating system, the original application that created a file, and the original file specification, emulation isn’t going to work. Even when we have all of these, emulation can be difficult.
What all of the above solutions have in common is that they cannot be implemented in the presence of strong Digital Rights Management (DRM). If DRM doesn’t allow printing, you cannot make a tangible backup. If it doesn’t allow copying, you cannot refresh the file to new media, and you likely cannot migrate it to a newer format. DRM software is invariably closed and proprietary, so you will not be able to recreate the original DRM scheme in an emulator. The “digital rights” in DRM belong to the publisher, not you. That is why we at Free Government Information have repeatedly called on the Government Printing Office to publicly commit to rejecting DRM, and we are disappointed that it has not done so.
III. What you can do today to save the past
But let’s not end on a downer. If we have the will and interest, there are things we can do to ensure the preservation of important “born-digital” documents:
- Start by taking the Cornell University Digital Preservation Tutorial. Whatever your level of digital expertise, or lack of it, the people at Cornell will teach you something with their free tutorial. They even have a brief placement test.
- Keep local copies of e-pubs that are important to you, even if you don’t know what to do with them. Maybe you can find a partner who does!
- Think about joining the LOCKSS (Lots of Copies Keep Stuff Safe) alliance, which has been preserving e-journals for several years and may expand to federal materials.
- Advocate for open data and operating system standards (e.g., use or suggest open source software whenever possible). This isn’t a pipe dream; it is already a reality in Massachusetts and several foreign countries.
- Finally, if you run a Federal Depository Library, make plans today to store and serve federal e-content on local servers, whether or not those federal publications are in the Federal Depository Library Program. Post those plans to govdoc-l.
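A much-simplified Python sketch can illustrate the idea behind LOCKSS. This is not the actual LOCKSS protocol, which runs polls among peer libraries and repairs damaged copies over the network. The `audit` and `digest` functions here are hypothetical. Each library's copy is fingerprinted with SHA-256, and any copy that disagrees with the majority is flagged as damaged. The sketch also hints at why one copy is not enough: a lone copy always agrees with itself, so damage to it cannot be detected this way.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Fingerprint a copy so copies can be compared without shipping the bytes."""
    return hashlib.sha256(data).hexdigest()


def audit(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Hypothetical helper: compare copies held by different libraries.

    Returns the majority fingerprint and the sorted names of libraries whose
    copy disagrees with it. Raises ValueError if no fingerprint has a strict
    majority, since then there is no trustworthy reference to repair from.
    """
    votes = Counter(digest(data) for data in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority among copies; cannot tell which is intact")
    damaged = sorted(name for name, data in copies.items() if digest(data) != winner)
    return winner, damaged
```

Suppose three libraries hold a report and one copy has suffered bit rot. In that case, `audit` lists that library among the damaged copies, and the two matching copies are candidates for repairing it.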
Although I could talk about digital preservation for hours, like my partners I was allotted 15 minutes. So I hope that I have given you a taste of what’s at stake, what the unique challenges of digital materials are, and what some of the solutions MIGHT be, and that I have encouraged and empowered some of you to start taking charge of our “born-digital” heritage. It will be difficult and costly, but as the only institutions with a specific charge and interest in preserving our historical, cultural, and scientific heritage, libraries must lead as best they can. We are a rare “yesterday and tomorrow” institution in an “only today matters” society. We know that today will be unworkable if people cannot find and use what was done yesterday to plan for tomorrow.
Thank you all for being here today, especially the two-thirds of you who are not documents librarians. It thrills me to see the wider community caring about the huge amount of federal information that is at stake. Thank you!