Tag Archives: digital libraries
Dodging the memory hole
Abbey Potter’s comments about preserving digital news are also very relevant to the preservation of government information.
Potter is the Program Officer with the National Digital Information Infrastructure and Preservation Program (NDIIPP). In her post on The Signal blog, she elaborates on her closing keynote address at the Dodging the Memory Hole II: An Action Assembly meeting in Charlotte, NC, last month.
- Dodge that Memory Hole: Saving Digital News by Abbey Potter, The Signal: Library of Congress Digital Preservation Blog (June 2, 2015).
- Take Action: What comes next? Closing keynote, Dodging the Memory Hole II: An Action Assembly, Charlotte, NC (May 12, 2015).
She quotes a presentation by Andy Jackson of the UK Web Archive in which he addresses the questions: “How much of the content of the UK Web Archive collection is still on the live web?” and “How bad is reference rot in the UK domain?”
- Ten years of the UK Web Archive: What have we saved? [ppt] Andy Jackson, International Internet Preservation Consortium, General Assembly 2015 (April 27, 2015).
By sampling URLs collected in the UK Web Archive, Jackson examined URLs that have moved, changed, or gone missing. He analyzed both link rot (a file gone missing) and content drift (a file that has changed since being archived). He shows that 50 percent of content had gone, moved, or changed beyond recognition within only one year. After three years the figure rose to 65 percent.
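The core of this kind of analysis can be sketched in a few lines. Below is a minimal, illustrative Python classifier (the function name, threshold, and similarity measure are my own assumptions, not Jackson's actual methodology): given the archived text of a page and what the live web returns today, it labels the URL as link rot, content drift, or intact.

```python
import difflib

def classify_url(archived_text, live_status, live_text, drift_threshold=0.5):
    """Classify one archived URL against its live counterpart.

    Returns "link rot" when the live URL no longer resolves,
    "content drift" when it resolves but the text has changed
    substantially since capture, and "intact" otherwise.
    """
    if live_status != 200 or live_text is None:
        return "link rot"
    # Crude textual similarity; a real study would use smarter measures.
    similarity = difflib.SequenceMatcher(None, archived_text, live_text).ratio()
    if similarity < drift_threshold:
        return "content drift"
    return "intact"
```

Running this over a random sample of archived URLs and tallying the three labels per capture year would reproduce the shape, if not the rigor, of Jackson's figures.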
Potter says that it is safe to assume that the results would be similar for newspaper content on the web. It would probably also be similar for U.S. government web sites.
What can we learn from this and what can we do? For newspapers, Potter says, libraries have acquisition and preservation methods that are too closely linked to physical objects and that too often exclude digital objects. This results in libraries having gaps in their collections – “especially the born-digital content.” She summarizes the problem:
Libraries haven’t broadly adopted collecting practices so that they are relevant to the current publishing environment which today is dominated by the web.
This sounds exactly like what is happening with government information.
First, because GPO has explicitly limited actual deposit of government information to so-called “tangible” products (Superintendent of Documents Policy Statement 301 [SOD 301]). This policy does exactly what Potter says is wrong: it establishes collecting practices that are not relevant to the current publishing environment. (See more on the effects of SOD 301 here.)
Second, because most of the conversation within the FDLP in the last few years has been about our historic paper collections rather than about the real digital preservation issue we should be facing: born-digital government information. (See Born-Digital U.S. Federal Government Information: Preservation and Access.)
As Potter says, “We have clear data that if content is not captured from the web soon after its creation, it is at risk.” And, “The absence of an acquisition stream for this [born-digital] content puts it at risk of being lost to future library and archives users.”
Potter outlines a plan of action for digital newspaper information that is surprisingly relevant for government information. She suggests that libraries establish relationships (and eventually agreements) with the organizations that create, distribute, and own news content. That is exactly what FDLP libraries have done with paper for 200+ years, and what they should, and could, be doing with digital government information today. There is no legal or regulatory barrier to GPO depositing FDLP digital files with FDLP libraries; indeed, GPO is already doing this de facto through its explicit actions allowing “USDocs” private LOCKSS network partners to download FDsys content.
Potter also recommends web archiving as another promising strategy. Since many agencies are reluctant to deposit digital content with FDsys, and because they are allowed by law to refrain from doing so, web archiving is a practical alternative, even if it is imperfect. Indeed, GPO does its own web harvesting program. Although some libraries also do web harvesting that includes U.S. Federal government web sites, more needs to be done in this area. (See: Webinar on fugitive documents: notes and links.)
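At its simplest, web harvesting means fetching a page and storing the bytes with enough metadata (URL and capture time) to find them again. Here is a minimal sketch in Python using only the standard library; the path scheme and function names are illustrative stand-ins for what production crawlers record in WARC files.

```python
import datetime
import hashlib
import os
import urllib.request

def snapshot_path(url, root="archive", when=None):
    """Build a stable on-disk path for one capture of a URL,
    keyed by a hash of the URL plus a UTC capture timestamp
    (the same idea, much simplified, behind WARC record naming)."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return os.path.join(root, digest, when.strftime("%Y%m%dT%H%M%SZ") + ".html")

def harvest(url, root="archive"):
    """Fetch one page and store its bytes at the snapshot path."""
    path = snapshot_path(url, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Repeated captures of the same URL accumulate under one hash directory, which is what makes later change-over-time analysis (like Jackson's) possible. Real harvesting programs, of course, use crawl scheduling, robots.txt handling, and the WARC format rather than bare HTML files.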
I find it ironic that libraries are not at least experimenting with preserving born-digital government information. It is difficult to find an article about digital library projects that does not cite scarce funding or high copyright barriers. So why not use born-digital government information as a test bed for preserving digital content? The FDLP agreements and commitments are already in place, most of the content is public domain, and communities of interest for the content already exist. FDLP libraries could start today by building digital library collections and test-bed technology for government information, then expand to other, more difficult collections from a base of experience and success. The fact that this would help our designated communities, preserve essential information, and further the goals of the FDLP would be welcome side-effects.
Role of libraries
On a quiet Sunday, here are two quotes that I find both memorable and inspiring when I think of the role and future of libraries. These days we see so much emphasis placed on fast access to current, popular, “must see” information. Although libraries have a role to play in that as well, few if any institutions have the long-term role that libraries have.
We mustn’t model the digital library on the day-to-day operation of a single human brain, which quite properly uses-or-loses, keeps uppermost in mind what it needs most often, and does not refresh, and eventually forgets, what it very infrequently considers — after all, the principal reason groups of rememberers invented writing and printing was to record accurately what they sensed was otherwise likely to be forgotten.
— Nicholson Baker. Double Fold. NY: Random House, 2001. p. 245.
Libraries exist to preserve the thoughts and deeds that no one else has time for anymore, to collect items that might not be used for another ten, fifty, one hundred years — if ever. It is this last uncertainty that makes libraries the most heroic of human creations.
— Paul Collins. Banvard’s Folly: Thirteen Tales of People Who Didn’t Change the World. New York: Picador, 2001. pp. 285-286.
ALA Digital Library of the Week: Homeland Security Digital Library
Digital Library of the Week: Homeland Security Digital Library, American Library Association (October 7th, 2010).
For the first time in its seven-year history, the Homeland Security Digital Library has opened a portion of its unique and unrivaled collection to the public. The HSDL is the nation’s premier collection of documents related to homeland security policy, strategy, and organizational management. Sponsored by the U.S. Department of Homeland Security’s National Preparedness Directorate (under FEMA) and the Naval Postgraduate School Center for Homeland Defense and Security, the HSDL is composed of homeland security–related documents collected from a wide variety of sources. These include federal, state, tribal, and local government agencies, professional organizations, think tanks, academic institutions, and international governing bodies. Although largely composed of reports, this specialized library also provides homeland security subject matter in other formats including videos, slide presentations, maps, databases, and statistics….
David Rosenthal: Stepping Twice Into The Same River
Last month, David Rosenthal, chief scientist on the LOCKSS Project, gave the keynote address entitled Stepping Twice Into The Same River to the ACM/IEEE Joint Conference on Digital Libraries (JCDL) and the annual International Conference on Asia-Pacific Digital Libraries (ICADL) (or just JCDL/ICADL!) in Queensland, Australia. It was wide-ranging, thoughtful, and provocative — in short, everything you’d want in a keynote to a major international digital library conference.
David touched on publishers and publishing-industry practices, scholarly communication, digital preservation, the intersection of technology and economics, and the current state and future of libraries. He makes a strong argument that the upheaval and disruption currently affecting the three parallel fields of publishing, libraries, and archives (what he terms “technological and economic discontinuity”) create the perfect opportunity for radical technological change toward a collaborative archival academic cloud — a chance to define the future of information access and preservation (at least for universities and scholarly communication) in beneficial and sustainable ways.
Here are some main points that I gleaned from David’s presentation:
- publishers are in a similar boat to news organizations and have sacrificed long-term viability for short-term economic gain — and that’s going to ultimately destroy them;
- libraries and archives need to focus their preservation goals on dynamic services rather than static content:
“…it’s less about what we are preserving and more about how preserved information is accessed. Less about HTML and other formats, and more about HTTP and other protocols. The reason is that static information is a degenerate case of dynamic information; a system designed for dynamic information can easily handle static information. The converse isn’t true.”
- distributed digital preservation and archives offer the more economically and technologically sound opportunities in the long run;
- data preservation will take steady long-term funding;
- since ingest is a major cost for any digital preservation system, universities need to start seeing their Web space/infrastructure in terms of academic clouds rather than leasing from commercial cloud companies like Amazon’s Elastic Compute Cloud (Amazon EC2):
“Unless something dramatic happens, scholars who want to publish services wrapped around their, or other people’s, data will take the path of least resistance and use Amazon’s services. Miss a credit card payment, your data and service are history. Worse, do we really want to end up with Amazon owning the world’s science and culture?”…
…What Universities get for the extra cost is the permanence they need. The permanence comes from the fact that the University already has its hands on the data and the services in which it is wrapped, instantiated in highly robust and preservable hardware. Thus, no ingest costs and very low preservation costs. With the model of Amazon and a separate archiving service, as well as paying Amazon, Universities have to pay the archiving service, and pay the ingest costs. When these extra costs are taken into account, because the ingest costs dominate, it is likely that Amazon would be more expensive.
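Rosenthal's ingest-dominates argument is easy to work through as a back-of-the-envelope cost model. The numbers below are entirely hypothetical, chosen only to illustrate the shape of the comparison (they are not from the keynote): even if campus storage costs more per terabyte-year than commercial cloud storage, avoiding the one-time ingest charge can make the campus option cheaper over a preservation lifetime.

```python
def total_cost(ingest_per_tb, storage_per_tb_yr, tb, years):
    """Simple lifetime-cost model: one-time ingest plus recurring storage."""
    return ingest_per_tb * tb + storage_per_tb_yr * tb * years

# Hypothetical figures, purely for illustration:
tb, years = 100, 10
# Commercial cloud + separate archiving service: ingest charged up front.
cloud = total_cost(ingest_per_tb=2000, storage_per_tb_yr=300, tb=tb, years=years)
# Campus "academic cloud": data already in hand, so no ingest cost,
# even assuming pricier per-TB storage.
campus = total_cost(ingest_per_tb=0, storage_per_tb_yr=400, tb=tb, years=years)
```

With these made-up figures the cloud option costs $500,000 against $400,000 for campus hosting; the crossover point depends entirely on how large the ingest cost is relative to storage, which is exactly Rosenthal's point.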
I highly recommend that folks read David’s keynote at least twice. There are a lot of pearls of wisdom in there. I think he makes a compelling case for a viable digital future for scholarly communication, one in which libraries and archives can play a vital role.
Smithsonian digitization strategic plan
The Smithsonian has just released their digitization strategic plan for fiscal years 2010 – 2015 called “Creating a Digital Smithsonian” — executive summary and full report.
I’m of two minds about this, as about similar digitization plans. On the one hand, digitizing Smithsonian collections — books, research reports, data, music, film, and other sounds (like frog vocalizations!) — could be a boon for online access to some really amazing materials.
On the other hand, this quote from the executive summary worries me:
To preserve our collections, the Smithsonian constantly battles the destructive forces of time and environment. Despite our best efforts, plastics discolor, wax cylinder recordings distort, and botanical specimens become brittle. Digitization offers a way to make objects — and the valuable information they contain — available without jeopardizing their integrity by handling or by exposure to the elements.
While they mention a “life cycle-management approach to digitization,” there doesn’t seem to be much serious thought given to the fact that digital objects degrade faster than physical objects, and that digital preservation is an ongoing and potentially more expensive effort. I worry that SI.edu will broker the same kind of disastrous deal that GAO did with Thomson-West, whereby a whole swath of public domain information was privatized.
I would call on SI.edu and ALL .gov agencies to insert a clause into ANY digitization contract requiring that ALL digital files and metadata be accessible via free and open sites. That means, where applicable, copies of all digital content would be ingested into GPO’s FDsys, the Library of Congress, NARA, and/or publicly accessible non-profit sites (e.g., the UNT digital library or the Internet Archive). Please help us get this message across to your friends in the .gov sector. Public information should remain public!