A recent report evaluating a two-year web harvesting project found that 14.3 percent of the original URLs of all titles harvested from the Web and archived during the first year of the project had become inactive within at least one year of harvesting. (p.33)
- Two-Year Pilot Project Evaluation. Legal Information Archive, The Chesapeake Project. (June 2009).
The report is an evaluation of the Legal Information Archive of The Chesapeake Project, which was designed to preserve born-digital legal information published directly to the Web. The project was implemented in early 2007 by the Georgetown Law Library and the State Law Libraries of Maryland and Virginia.
The report notes that more than 95 percent of the titles in the sample were PDF files. Of these titles, 8.2 percent were found to have inactive original URLs in 2008 and 14.1 percent in 2009. (p.35)
Ten percent of government (.gov) URLs became inactive in the first year and an additional three percent became inactive in the second year. (p.34)
The report concludes:
More than 4,300 digital items, representing nearly 1,900 titles, have been harvested from the Web and archived, and roughly 14 percent of these titles have already been removed from their original locations on the Web, demonstrating the importance and effectiveness of the project’s efforts. Moreover, the project’s access figures demonstrate both the broad, international reach of the project’s efforts, as well as the successful selection of high-interest and high-use materials by project participants.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.