[Update Feb 8, 2014: I just noticed that the Georgetown Library press release below returned a 404 broken link (cue irony meter!) so here’s a link to the archived press release from the Internet Archive.]
For the past five years, the Georgetown Law Library and the Chesapeake Digital Preservation Group have been conducting studies on “link rot.” This year, they discovered that “link rot has increased to 37.7 percent within five years.”
- “Link Rot” and Legal Resources on the Web: A 2012 Analysis by the Chesapeake Digital Preservation Group.
- Georgetown Law Library Finds 38 Percent of Online Documents Disappear from Web Pages Within Five Years, press release, Georgetown University (April 25, 2012).
The Chesapeake group gathers information from the web and preserves it for its users. Each year, the group studies how many of the URLs from which it originally gathered information “no longer provide access to the content that was originally selected, captured, and archived by the Chesapeake Group.”
This study is particularly relevant to government information specialists because more than 90% of the sample URLs came from three top-level domains: state government (state.[state code].us), organization (.org), and government (.gov).
For “dot-gov” domains (URLs ending in “.gov”) the studies have shown cumulative link rot of:
10% in 2008
13% in 2009
25% in 2010
31% in 2011
36% in 2012
Cumulative link rot of state government URLs (.state.__.us) was almost as bad: 10.8% in 2008, 15.8% in 2009, 32.1% in 2010, 30.4% in 2011, and 33.8% in 2012.
The total cumulative link rot for all URLs was 37.7% in 2012. Another way of looking at this is that, of the documents the Chesapeake Project has preserved, only 62.3% were still available at their original URL as of the 2012 study.
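For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted in this post. The function names `still_available` and `yearly_increase` are my own, made up for illustration; they are not anything from the Chesapeake report. The sketch shows how the 62.3% figure follows from 37.7%, and how many percentage points the .gov rate grew from one study to the next.

```python
# Cumulative link rot (percent) for .gov URLs, as quoted in this post.
gov_rot = {2008: 10, 2009: 13, 2010: 25, 2011: 31, 2012: 36}


def still_available(rot_percent):
    """Share of URLs (percent) still working, given cumulative link rot."""
    return round(100 - rot_percent, 1)


def yearly_increase(rot_by_year):
    """Percentage-point change in cumulative rot from each year to the next."""
    years = sorted(rot_by_year)
    return {
        later: round(rot_by_year[later] - rot_by_year[earlier], 1)
        for earlier, later in zip(years, years[1:])
    }


print(still_available(37.7))     # 62.3
print(yearly_increase(gov_rot))  # {2009: 3, 2010: 12, 2011: 6, 2012: 5}
```

On these numbers, the biggest single jump for .gov URLs (12 percentage points) came between the 2009 and 2010 studies.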
This year’s report includes two samples of URLs. The first sample includes 579 URLs that Chesapeake captured during 2007 and 2008. They use this sample to examine how link rot changes over time.
The second sample is new and represents the full content of the Chesapeake archive at the time the study was conducted. Using this second, broader sample, the study reports a link rot rate of 25.9%.
For libraries that rely on pointing to URLs rather than preserving information in their own digital libraries, the new report from the Chesapeake Project provides sobering, factual data on the reliability of that strategy.