Digital government information, like its private sector counterpart, comes in many formats.
I recently came across an article:
Smith, MacKenzie. "Eternal Bits: How Can We Preserve Digital Files and Save Our Collective Memory?" IEEE Spectrum, July 2005. (9 pages)
that provides a nice overview of the many kinds of digital materials that need preserving and the relative challenges of preserving such material:
Like most difficult challenges, data preservation is really a mix of the simple and the complex challenges. At one end of the preservation continuum is a simple item, like an ASCII text document. Preserve the data by keeping the file on current media and provide some way to view it and you’re pretty much done. We’ll call this the “save the bits” approach.
At the other end lie the harder cases, like these:
A compiled software program written in a custom-built programming language for which neither the language documentation nor the compiler has survived.
A complex geospatial data set developed for the U.S. Geological Survey in a proprietary system made by a company that went out of business 20 years ago.
A Hollywood movie created with state-of-the-art encryption to prevent piracy, for which the decryption keys were lost.
For these three items, we don’t hold out much hope of being able to preserve the content forever. For the software program and the geospatial data set, the digital archeologists of the future probably won’t have enough information about how the software and data set were created or the language they were created in: no Rosetta Stone, as it were, to translate the bits from lost languages to modern ones.
As for that encrypted movie, our archeologists might have read old reviews that raved about the special effects in Sin City, but this cinematic achievement will remain locked away until someone pays a lot of money to a master of ancient cryptology to crack the key.
Fortunately, many content types fall between these difficult cases and ASCII text. Usually, saving the bits using standard, well-documented data, video, and image formats, such as XML, MPEG, and TIFF, gets you halfway to an enduring digital archive. Put another way, the goal is to avoid formats that require proprietary software, such as AutoCAD or QuarkXPress, to play or render the data.
The article’s main focus is the DSpace digital repository software, and it is well worth reading. The Kansas state depository system keeps its electronic documents in a DSpace implementation called KSPACe.