John Gilmore is an open software proponent, a co-founder of the Electronic Frontier Foundation and, perhaps most importantly, an Archive-It partner (as an independent researcher). His Archive-It collections focus on open access to government information and policy, as well as on free and open source software.
John has been archiving sites related to wiretapping and the National Security Agency since 2007. Describing the reasons for creating this collection, John says:
“I’m trying to record and make searchable some documents related to the controversy over NSA wiretapping domestically without warrants, or with general warrants, which the Fourth Amendment outlaws.”
This collection demonstrates how the recent change in administration has opened up further crawler access to the National Security Agency (NSA) website. Previously, most NSA web content was blocked to the Archive-It crawler (as well as to other crawlers) through the robots.txt exclusion protocol. Their old exclusion list (for example, this one from 2008) shows just how much of the website was off limits: none of the directories listed could be crawled.
Since January 17, 2009, however, crawlers have had access to much more content.
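For readers curious how this kind of blocking works in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. The robots.txt contents, the example.com URL and the crawler name `ExampleArchiveBot` are all hypothetical stand-ins, not the NSA's real rules or the Archive-It crawler's real user agent. The idea it shows: a crawler that honors robots.txt skips any URL whose path begins with a `Disallow` prefix, while an empty `Disallow` line excludes nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "before" policy (NOT the NSA's actual exclusion list):
# every crawler is barred from two directories.
BEFORE = """\
User-agent: *
Disallow: /hypothetical-archive/
Disallow: /hypothetical-press/
"""

# Hypothetical "after" policy: an empty Disallow value excludes nothing.
AFTER = """\
User-agent: *
Disallow:
"""


def allowed(robots_txt: str, url: str, agent: str = "ExampleArchiveBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `ExampleArchiveBot` is a hypothetical crawler name; the rules above
    apply to every user agent because they use `User-agent: *`.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, url)


url = "https://www.example.com/hypothetical-archive/report.html"
print(allowed(BEFORE, url))  # False: the path starts with a Disallow prefix
print(allowed(AFTER, url))   # True: the empty Disallow line blocks nothing
```

Note that robots.txt is a voluntary convention: it only stops crawlers, like Archive-It's, that choose to respect it.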
At the Internet Archive, we have noticed similar changes on other .gov websites, including www.whitehouse.gov (compare this version from 2006 to the current exclusion list).
It's exciting to know that, moving forward, John and other Archive-It partners will be able to collect more complete snapshots of government websites.
-Molly and Lori
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.