govt web sites
We Want YOU... To Help With the Dot Gov Harvest!
Submitted by starr on Tue, 2008-09-09 15:10.
Hi to all you FGI readers! I'm thrilled to be this month's guest blogger.
As we all watch this historic presidential election unfold, there's another question in the back of our minds: how much online government information is going to change with the new administration, regardless of who's sworn in next January? As someone who works specifically with digital government collections, and whose primary job is capturing defunct government websites, I find this question of particular interest.
Most of you already know about the "Dot Gov Crawl" project that's been organized to address this issue. The project partners include the Library of Congress, the Internet Archive, the California Digital Library (CDL), the University of North Texas (UNT), and the U.S. Government Printing Office (GPO). We're working collaboratively to harvest and preserve government websites (primarily .gov and .mil domains), to form a snapshot of digital government information at the end of the current presidential administration.
The Internet Archive will be performing the comprehensive crawl, and the Library of Congress is focusing on congressional materials. CDL and UNT will be performing in-depth harvests of specific government websites, gathering documents linked deep within the websites that may not be gathered in the Internet Archive crawl.
I encourage you to participate in the project. Communicate with the partner institution closest to you, and let them know if there are specific websites (or portions of websites) that are of particular interest to you.
At UNT, we're trying to focus on documents that support our regional interests, material that might be overlooked amid the sweeping national topics that the Internet Archive will handle. We're asking librarians in the central United States to send us the things you want captured: websites you use often, publications deep within websites that might not be captured in large crawls, and topics of regional interest. Your requests will help us identify and prioritize the information that is preserved for future generations.
Please submit any suggestions you have in the comments section below. I'll be monitoring them and adding them to our list. Thanks for your input!
Google and state government information
Submitted by newkirk on Mon, 2007-04-30 15:26.
Millions and Millions of Government and Military Web Pages Archived by NARA and The IA
Submitted by garyprice on Wed, 2007-04-11 18:54.
Last year we posted a note on ResourceShelf about the "2004 Presidential Term Web Harvest," containing more than 75 million .gov and .mil web pages, equal to about 6.5 terabytes of data. It's a project of NARA and the Internet Archive. The archived sites can be browsed or keyword searched.
Now available is the 109th Web Harvest.
What does it contain?
+ More than four million pages (42 GB) crawled and archived between 11/11/06 and 12/11/06
+ Browse by Member's Name
+ Browse by Committee Name
+ Browse by Leadership
+ Browse by House or Senate Organizations
Go to: http://www.webharvest.gov/collections/
The harvest produced a public reference copy of the web sites for the purpose of continual availability to the public, and also produced a record copy to be retained in the holdings of NARA ... Web sites included in the harvest were identified from information provided by the Web Systems Branch of the House Information Resources staff and by Senate webmasters in the Offices of the Secretary of the Senate and the Sergeant at Arms.
There is a bit more on ResourceShelf, including a comment by Librarian of Congress James Billington about the average lifespan of a web site.


