Last year we posted a note on ResourceShelf about the “2004 Presidential Term Web Harvest” containing more than 75 million .gov and .mil web pages, equal to about 6.5 terabytes of data. It’s a project of NARA and The Internet Archive. The archived sites can be browsed or keyword searched.
Now available is the 109th Web Harvest.
What does it contain?
+ More than four million pages (42 GB) crawled and archived between 11/11/06 and 12/11/06
+ Browse by Member's Name
+ Browse by Committee Name
+ Browse by Leadership
+ Browse by House or Senate Organizations
The harvest produced a public reference copy of the web sites for the purpose of continual availability to the public, and also produced a record copy to be retained in the holdings of NARA… Web sites included in the harvest were identified from information provided by the Web Systems Branch of the House Information Resources staff and by Senate webmasters in the Offices of the Secretary of the Senate and the Sergeant at Arms.
There is a bit more on ResourceShelf, including a comment by Librarian of Congress James Billington about the average lifespan of a web site.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.