Archive-it
Archive-It Wiretapping and the National Security Agency Collection
Submitted by archive on Fri, 2009-06-19 12:41.
John Gilmore is an open software proponent, co-founder of the Electronic Frontier Foundation and, perhaps most importantly, an Archive-It partner (as an independent researcher). His Archive-It collections focus on open access to government information and policy as well as free and open source software.
John has been archiving sites related to wiretapping and the National Security Agency since 2007. Describing the reasons for creating this collection, John says:
"I'm trying to record and make searchable some documents related to the controversy over NSA wiretapping domestically without warrants, or with general warrants, which the Fourth Amendment outlaws."
This collection demonstrates how the recent change in administration has opened up further crawler access to the National Security Agency (NSA) website. Previously, most NSA web content was blocked from the Archive-It crawler (as well as other crawlers) using the robots.txt exclusion protocol. Looking at their old exclusion list, for example this one from 2008, you can see just how much of their website was blocked from crawler access (none of the directories listed could be accessed).
Since January 17, 2009, however, crawlers have had access to much more content.
At the Internet Archive, we have noticed similar changes in other .gov websites including www.whitehouse.gov (compare this version from 2006 to the current exclusion list).
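For readers curious how a crawler decides what it may collect, here is a minimal sketch of comparing two exclusion lists with Python's standard `urllib.robotparser` module. The rules, directory names, and crawler name below are hypothetical and chosen only for illustration; they are not the actual contents of any NSA or White House robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion lists for illustration only; these are NOT the
# real robots.txt files served by nsa.gov or whitehouse.gov.
OLD_RULES = """\
User-agent: *
Disallow: /about/
Disallow: /public_info/
Disallow: /research/
"""

NEW_RULES = """\
User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def newly_allowed(old_txt, new_txt, paths, agent="example-crawler"):
    """Return the paths blocked under the old rules but allowed under the new ones."""
    old = build_parser(old_txt)
    new = build_parser(new_txt)
    return [
        path
        for path in paths
        if not old.can_fetch(agent, path) and new.can_fetch(agent, path)
    ]


# Hypothetical paths a crawler might want to visit.
PATHS = [
    "/about/index.html",
    "/research/papers.html",
    "/cgi-bin/search",
    "/index.html",
]

opened = newly_allowed(OLD_RULES, NEW_RULES, PATHS)
print(opened)
```

A well-behaved crawler runs a check like `can_fetch` before requesting each page, so shortening the list of `Disallow` lines is all it takes for a site to open more of its content to archiving.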
It's exciting to know that, moving forward, John and other Archive-It partners will be able to collect more complete snapshots of government websites.
-Molly and Lori
What do you want to know about Archive-it?
Submitted by jrjacobs on Mon, 2008-10-13 09:57.
I'd like to survey you, our loyal FGI readers. I'm co-presenting with Molly Bragg at next week's Depository Library Council conference about digital collections using Archive-It (see title and abstract below). I've got an outline, but I'd really like to know what questions YOU have about Archive-It and digital collections. What do YOU want to know about Archive-It? So please, please, please leave a comment here so that my presentation will be even more amazing :-)
Title of Presentation:
Gone Today, Here Tomorrow: Archiving and Preserving Born Digital Government Documents
Abstract:
Stanford University Library has been a federal depository library since 1895. In 2007, the library began collecting born digital documents using Archive-It, the web archiving service from Internet Archive (www.archive-it.org). In this presentation James Jacobs will discuss his group's objectives and procedures for selecting and archiving digital content and share examples of the unique content preserved. Molly Bragg will present an overview of web archiving projects and tools used and developed by Internet Archive. These tools are used by libraries around the world to preserve government documents and other born digital content.

