For libraries that rely on pointing to URLs rather than preserving information in their own digital libraries, the new report from the Chesapeake Project provides sobering, factual data on the reliability of that strategy.
In an examination of “link rot,” the project found that 30.4% of the URLs examined no longer provide access to their original information.
This study is particularly relevant to government information specialists because more than 90% of the sample URLs came from state government (state.[state code].us), organization (.org), and government (.gov) domains.
The Chesapeake Project Legal Information Archive, which harvests and preserves relevant digital legal information from the web, has been producing reports on “link rot” for several years. They define link rot as “a URL that no longer provides direct access to files matching the content originally harvested from the URL and currently preserved in the Chesapeake Group’s digital archive.”
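To make the definition concrete, here is a minimal sketch of how a single archived URL might be checked for link rot. It is not the Chesapeake Group's method; we don't know how the project actually tests its URLs. It assumes that the archive stores a SHA-256 hash of each harvested file and treats a URL as rotted if it can't be retrieved or if the bytes it now returns don't match that hash. An exact byte match is stricter than “files matching the content,” since a trivial page change would count as rot here. The function names `fetch_bytes` and `is_link_rot` are hypothetical.

```python
import hashlib
import http.client
import urllib.request
from typing import Callable, Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest used as a fingerprint of a harvested file (assumed archive format)."""
    return hashlib.sha256(data).hexdigest()


def fetch_bytes(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Return the body served at `url`, or None if it cannot be retrieved.

    urlopen follows redirects and raises HTTPError (an OSError subclass)
    for 4xx/5xx responses such as 404 Not Found.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, HTTP errors,
        # malformed URLs, and truncated responses.
        return None


def is_link_rot(url: str, archived_sha256: str,
                fetch: Callable[[str], Optional[bytes]] = fetch_bytes) -> bool:
    """Hypothetical check: True if the URL no longer serves the archived file."""
    body = fetch(url)
    if body is None:
        return True  # the URL no longer provides access at all
    # Assumption: any byte-level difference from the archived copy counts as rot.
    return sha256_hex(body) != archived_sha256
```

The `fetch` parameter lets the check run against a stub instead of the live web, for example `is_link_rot(url, stored_hash, fetch=lambda u: b"...")`.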
Their new report is now available:
- “Link Rot” and Legal Resources on the Web: A 2011 Analysis, by the Chesapeake Digital Preservation Group, [April 2011].
In one interesting finding, the report says that the rate of loss of information slowed in the last year: “Whereas the prevalence of link rot among URLs in the sample nearly doubled every year during the first three years of the study, it slowed significantly in the fourth year.” The report makes clear that although 30.4 percent, or nearly one-third, of the archived titles have disappeared from their original URLs since the beginning of the program in 2007, only 2.5 percent of URLs were lost to link rot within the past year.
Their data show that the cumulative link rot frequency for .gov files was 10% in 2008, 13% in 2009, 25% in 2010, and 31% in 2011. The cumulative link rot frequency for state-level URLs was 10.8% in 2008, 15.8% in 2009, 32.1% in 2010, and 30.4% in 2011, nearly as high as for the federal URLs. In an interesting development, some state-level URLs that were inaccessible in 2010 were accessible again when rechecked for the 2011 analysis, which is why the state figure declined slightly. Even with that slight improvement at the state level, the overall cumulative link rot percentage rose from 27.9% in 2010 to 30.4% in 2011. Put another way, of the documents the Chesapeake Project has preserved, only 69.6% were still available at their original URLs as of the 2011 study.
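Because these figures are cumulative, the amount of new link rot in each year is the difference between consecutive years. The short sketch below does that arithmetic using only the percentages quoted in this post; the overall series has just 2010 and 2011 because earlier overall figures aren't given here. The names `GOV`, `STATE`, `OVERALL`, and `yearly_change` are illustrative.

```python
from typing import Dict

# Cumulative link-rot percentages as quoted in this post.
GOV = {2008: 10.0, 2009: 13.0, 2010: 25.0, 2011: 31.0}
STATE = {2008: 10.8, 2009: 15.8, 2010: 32.1, 2011: 30.4}
OVERALL = {2010: 27.9, 2011: 30.4}


def yearly_change(cumulative: Dict[int, float]) -> Dict[int, float]:
    """Percentage-point change in cumulative link rot between consecutive years."""
    years = sorted(cumulative)
    return {
        year: round(cumulative[year] - cumulative[prev], 1)
        for prev, year in zip(years, years[1:])
    }


if __name__ == "__main__":
    print("gov:", yearly_change(GOV))          # {2009: 3.0, 2010: 12.0, 2011: 6.0}
    print("state:", yearly_change(STATE))      # {2009: 5.0, 2010: 16.3, 2011: -1.7}
    print("overall:", yearly_change(OVERALL))  # {2011: 2.5}
    print("still available:", round(100 - OVERALL[2011], 1))  # 69.6
```

The 2.5-point overall change matches the report's figure for URLs lost within the past year, and the negative state value for 2011 reflects the state-level URLs that became accessible again.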
In an earlier study, the authors qualified their findings, noting that they are “not meant to be broadly applicable or to provide a representation of link rot throughout the universe of web resources” but report only on the items in the Chesapeake Project archive. The studies do, however, provide “insight into the vulnerability of law- and policy-related web resources selected by experienced law librarians from seemingly stable open-access web sites hosted by reputable organizations and state and federal governments.”
Significantly, “All of the Web resources described in this report that have disappeared from their original locations on the Web remain accessible via permanent archive URLs here at legalinfoarchive.org, thanks to the Chesapeake Group’s efforts.”