New Link Rot report from Chesapeake.
For the past six years, the Georgetown Law Library and the Chesapeake Digital Preservation Group have been conducting studies on “Link Rot and Legal Resources on the Web.” The newest report, for 2013, says that 51% of .gov URLs selected in 2007-2008 are broken. For a larger sample of documents selected 2007-2013 (and including all domains, not just .gov) “link rot has increased to 44.2 percent within six years.” This is a 6.5 percent increase over 2012.
The Chesapeake group gathers information from the web and preserves it for its users, and each year it investigates “whether or not the documents in the archive can still be found at the original web addresses from which they were captured.”
The study uses two samples: one sample of 579 original URLs for content captured in 2007‐2008, and a second sample consisting of the full content of the archive at the time the study is conducted. In 2013, the full sample included 842 original URLs for materials captured from 2007‐2013. The study is particularly relevant to government information specialists because more than 90% of the URLs in the original sample and almost 85% of the URLs in the full sample come from the state government (state.[state code].us), organization (.org), and government (.gov) top-level domains.
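For readers who want to run a similar check against their own lists of links, here is a minimal sketch of the idea: test whether each original address still resolves, then report the share that do not. It assumes a link counts as "rotted" when an HTTP HEAD request fails or ends in a 4xx/5xx status; the Chesapeake report's actual criteria may differ. The function names `url_resolves` and `link_rot_rate` are hypothetical, written only for this illustration.

```python
import http.client
import urllib.request
from typing import Callable, Iterable


def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check (an assumption, not the report's method):
    True if a HEAD request to `url` ends in a 2xx/3xx status.

    urlopen follows redirects itself and raises HTTPError (an OSError
    subclass) for 4xx/5xx responses, so any exception counts as broken.
    """
    try:
        # Request() raises ValueError for malformed URLs, so build it inside the try.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, timeouts, and connection resets.
        return False


def link_rot_rate(urls: Iterable[str],
                  check: Callable[[str], bool] = url_resolves) -> float:
    """Percentage of `urls` whose original address `check` reports as broken."""
    urls = list(urls)
    if not urls:
        raise ValueError("link_rot_rate needs at least one URL")
    broken = sum(1 for u in urls if not check(u))
    return 100.0 * broken / len(urls)
```

For example, with a stand-in checker that treats only "a" and "d" as still resolving, `link_rot_rate(["a", "b", "c", "d"], check=lambda u: u in {"a", "d"})` returns `50.0`. The checker is a parameter so the counting logic can be tested without network access. Note that a check like this misses "soft 404s" (pages that return 200 but no longer hold the original document), and some servers reject HEAD requests outright, so real studies typically need a more careful comparison against the archived copy.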
Among the new report’s findings:
This year saw a substantial increase in the number of government URLs (.gov) that no longer worked.
In 2013, the content at .gov domains showed the highest increase in link rot. More than 50 percent of the materials posted to government domains disappeared from the original documented web addresses.
Overall, the results of the six years of systematically checking links have demonstrated that documents posted on web sites will disappear at an increasing rate over time.
For “dot-gov” domains (URLs ending in “.gov”) the studies have shown cumulative link rot of:
The Chesapeake Digital Preservation Group is able to create these reports because it has been actively preserving information from the web for its users for several years. The report is a useful by-product of a preservation effort rooted in giving its user community long-term access to the information they need. This is not an academic exercise: the Group also collects data on the use of its harvested content. The report summarizes the conclusion it draws from this experience as follows:
The value of harvesting these materials before they are no longer available at their original URLs is demonstrated by the high use of these materials. During March 2013, the time the 2013 sample set was taken, over 84,000 items were retrieved. In 2012, 1.5 million items were viewed. It is likely that the value of this project and similar ones will become even more significant in future years.
For libraries that rely on pointing to URLs rather than preserving information in their own digital libraries, the new report from the Chesapeake Project provides sobering, factual data on the reliability of that strategy.