
Tag Archives: Archive-it

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Archive-It publishes Web Archiving Life Cycle Model

The Archive-It team announced today the publication of their white paper, The Web Archiving Life Cycle Model. The model offers a thorough description of the entire Web archiving process. Whether you've been Web archiving for 7 years or are mulling over jumping into the fray, this model will put you in a good headspace to do this critical work. Thanks to Molly Bragg, Kristine Hanna, Lori Donovan, Graham Hukill, and Anna Peterson!

The Archive-It team is excited to publish our first white paper: The Web Archiving Life Cycle Model. With this paper we hope to share web archiving best practices and processes with organizations interested in developing and/or expanding their web archiving initiatives.

This white paper is the product of a collaboration between members of the Archive-It team as well as the larger Archive-It partner community. Several partners took part in in-depth interviews regarding their experiences using Archive-It and web archiving in general, and others helped with the design iteration phase of the model and read preliminary drafts of the paper.

The Web Archiving Life Cycle Model encompasses the following web archiving processes:

• Vision and Objectives
• Resources and Workflow
• Access/Use/Reuse
• Preservation
• Risk Management
• Appraisal and Selection
• Scoping
• Data Capture
• Storage and Organization
• Quality Assurance and Analysis

Status of the Wayback Machine

Roy updates us on the status of the Wayback Machine with an example from the White House:

  • Back to the Wayback Machine, Roy Tennant, Library Journal (May 18th, 2011).

    But that means that any claims to be “archiving the web” should be taken with a grain of salt. Maybe say “archiving the parts of the web that matter” or “ignoring what doesn’t matter so much”.

And, don’t forget Archive-It, the web archiving service from Internet Archive.

Through a user-friendly web interface, Archive-It partners can catalog, manage, and browse their archived collections using web archiving tools developed at the Internet Archive. Collections are hosted at the Internet Archive data center and are accessible to the public, including full-text search.

Archive-It: Wiretapping and the National Security Agency Collection

John Gilmore is an open software proponent, a co-founder of the Electronic Frontier Foundation, and, perhaps most importantly, an Archive-It partner (as an independent researcher). His Archive-It collections focus on open access to government information and policy, as well as on free and open source software.

John has been archiving sites related to wiretapping and the National Security Agency since 2007. Describing the reasons for creating this collection, John says:

“I’m trying to record and make searchable some documents related to the controversy over NSA wiretapping domestically without warrants, or with general warrants, which the Fourth Amendment outlaws.”

This collection demonstrates how the recent change in administration has opened up further crawler access to the National Security Agency (NSA) website. Previously, most NSA web content was blocked to the Archive-It crawler (as well as to other crawlers) using the robots.txt exclusion protocol. Looking at their old exclusion list (for example, this one from 2008), you can see just how much of their website was blocked from crawler access: none of the directories listed could be crawled.
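A crawler that honors the robots.txt exclusion protocol, as the Archive-It crawler did here, checks each URL against the site's Disallow rules, and a Disallow line excludes every path that starts with the listed prefix. The minimal Python sketch below uses the standard library's urllib.robotparser to show that effect. The robots.txt contents, the agency.example domain, and the "archive-it-crawler" user-agent string are all hypothetical illustrations, not the NSA's actual exclusion list or Archive-It's real crawler name.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="archive-it-crawler"):
    """Return True if a crawler honoring robots.txt may fetch url.

    The default agent name is a hypothetical placeholder.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse an in-memory robots.txt instead of fetching one
    return parser.can_fetch(agent, url)


# Hypothetical exclusion lists for an imaginary agency site (not real NSA data).
OLD_ROBOTS = [
    "User-agent: *",
    "Disallow: /programs/",
    "Disallow: /public_info/",
]
NEW_ROBOTS = [
    "User-agent: *",
    "Disallow: /internal/",
]

url = "https://www.agency.example/public_info/report.html"
print(allowed(OLD_ROBOTS, url))  # False: /public_info/ is excluded
print(allowed(NEW_ROBOTS, url))  # True: only /internal/ is excluded
```

The same page is off limits under the restrictive list and crawlable under the relaxed one, which is the kind of change the NSA collection reflects.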

Since January 17, 2009, however, crawlers have had access to much more content.

At the Internet Archive, we have noticed similar changes on other .gov websites, including www.whitehouse.gov (compare this version from 2006 to the current exclusion list).
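Comparing an old exclusion list with a current one largely comes down to comparing their Disallow paths. The sketch below is a simplified illustration under stated assumptions: it treats robots.txt bodies as plain text, ignores User-agent grouping and Allow lines, and uses made-up example files rather than any real site's exclusion list. It also builds a Wayback Machine replay URL of the form https://web.archive.org/web/&lt;timestamp&gt;/&lt;url&gt;, which is one way to locate an archived robots.txt for comparison; fetching that URL is left out.

```python
def disallowed_paths(robots_text):
    """Collect non-empty Disallow paths from a robots.txt body.

    Simplification: rules for all user-agents are merged into one set.
    """
    paths = set()
    for line in robots_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # An empty "Disallow:" means "allow everything", so skip it.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.add(value.strip())
    return paths


def wayback_url(timestamp, url):
    """Build a Wayback Machine replay URL; timestamp is YYYYMMDDhhmmss."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Hypothetical old and new exclusion lists (illustrative only).
old = """User-agent: *
Disallow: /cgi-bin/
Disallow: /reports/
Disallow: /history/
"""
new = """User-agent: *
Disallow: /cgi-bin/
"""

# Paths excluded before but not now: newly reachable by crawlers.
opened = disallowed_paths(old) - disallowed_paths(new)
print(sorted(opened))  # ['/history/', '/reports/']
print(wayback_url("20060101000000", "http://www.whitehouse.gov/robots.txt"))
```

A set difference keeps the comparison order-independent, since the order of Disallow lines does not change which paths are excluded.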

It's exciting to know that, moving forward, John and other Archive-It partners will be able to collect more complete snapshots of government websites.

-Molly and Lori

What do you want to know about Archive-It?

I’d like to survey you, our loyal FGI readers. I’m co-presenting with Molly Bragg at next week’s Depository Library Council conference about digital collections using Archive-It (see title and abstract below). I’ve got an outline, but I’d really like to know what questions YOU have about Archive-It and digital collections. What do YOU want to know about Archive-It? So, please please please leave a comment here so that my presentation will be even more amazing 🙂

Title of Presentation:

Gone Today, Here Tomorrow: Archiving and Preserving Born Digital Government Documents


Stanford University Library has been a federal depository library since 1895. In 2007, the library began collecting born digital documents using Archive-It, the web archiving service from Internet Archive (www.archive-it.org). In this presentation, James Jacobs will discuss his group’s objectives and procedures for selecting and archiving digital content and share examples of the unique content preserved. Molly Bragg will present an overview of web archiving projects and tools used and developed by Internet Archive. These tools are used by libraries around the world to preserve government documents and other born digital content.