The End of Term 2016 collection is still going strong, and we continue to receive email from interested folks about how they can help. Much of the content for the EOT crawl has already been collected, and some of it is publicly accessible through our partners. Last month we posted about ways to help the collection process. At this point, volunteers are encouraged to check the archive to see whether content has been captured, that is, to do quality assurance (QA) for the crawls.
Here’s how you can help us ensure that we’ve collected and archived as thoroughly and completely as possible:
Step 1: Check the Wayback Machine
Search the Internet Archive to see if the URL has already been captured. Please note this is not a specific End of Term collection search and does not include ALL content archived by the End of Term partners, but will be helpful in identifying whether something has been preserved already.
You may type in specific URLs or domains or subdomains, or try a simple keyword search (in Beta!).
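For anyone who would rather check a batch of URLs with a script than by hand, here is a minimal sketch in Python. It assumes the Internet Archive's public Wayback Availability endpoint (https://archive.org/wayback/available) and its documented JSON response shape. The function names are our own illustrations and not part of any End of Term tooling. Like the search above, this lookup is not specific to the End of Term collection.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Internet Archive's public Wayback Availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the availability query for a URL (hypothetical helper).

    timestamp is optional, in YYYYMMDD or YYYYMMDDhhmmss form; when given,
    the service returns the capture closest to that moment.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed response, or None.

    When nothing has been captured, "archived_snapshots" is an empty object.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def check_capture(url, timestamp=None):
    """Ask the Wayback Machine whether a URL has a capture (needs network)."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Example call (requires network access):
#   check_capture("www.stlouisfed.org", "20161215")
```

A result here only tells you that some capture exists. It says nothing about whether the page was captured completely, so the click-through check described in 1a is still needed.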
1a: Help Perform Quality Assurance
If you do find a site or URL you were looking for, please check whether it was captured completely. A simple way to do this is to click around the archived page: try the navigation, links on the page, images, etc. We need help identifying parts of sites that the crawlers might have missed, for instance specific documents or pages that you are looking for but that we haven’t archived. Please note that crawlers are not perfect and cannot archive some content. IA has a good FAQ about the challenges crawlers face.
If you do discover something is missing, you can still nominate pages or documents for archiving using the link in step 3 below.
Step 2: Check the Nomination Tool
Check the Nomination Tool to see if the URL or site has been nominated already. There are a few ways to do this:
- View all reports here
- Check the list of everything nominated here, or search here.
- You can also check our bulk lists here
Step 3: Nominate It!
If you don’t see the URL you were looking for in any of those searches, please nominate it here.
Questions? Please contact the End of Term project at eot-info AT archive DOT org.
[Editor’s note: Updated 12/15/16 to include updated email address for End-of-Term project queries (eot-info AT archive DOT org), and information about robots.txt (#1 below) and databases and their underlying data (#5 below). Also updated 12/22/16 with note about duplication of efforts and how to dive deeply into an agency’s domain at the bottom of #1 section. jrj]
Here at FGI, we’ve been tracking the disappearance of government information for quite some time (and librarians have been doing it for longer than we have; see ALA’s long-running series “Less Access to Less Information By and About the U.S. Government,” published from 1981 until 1998). We’ve recently written about the targeting of NASA’s climate research site and the Department of Energy’s carbon dioxide analysis center for closure.
But ever since the NY Times wrote a story last week, “Harvesting Government History, One Web Page at a Time”, there has been renewed worry and interest from the library and scientific communities, as well as the public, in archiving government information. There’s also been increased interest in the End of Term (EOT) crawl project. Though there’s increased worry about the loss of government information with the incoming Trump administration, it’s important to note that the End of Term crawl has been going on since 2008, with both Republican and Democratic administrations, and will go on past 2016. EOT is working to capture as much of the .gov/.mil domains as we can, and we’re also casting our net to harvest social media content and government information hosted on non-.gov domains (e.g., the St. Louis Federal Reserve Bank at www.stlouisfed.org). We’re running several big crawls right now (you can see all of the seeds we have here as well as all of the seeds that have been nominated so far) and will continue to run crawls up to and after the Inauguration as well. We strongly encourage the public to nominate seeds of government sites so that we can be as thorough in our crawling as possible.
It’s hard to believe we’re rapidly approaching FGI’s 9-year anniversary(!). We’d like to ring in our 10th year with an invitation to the community to become citizen documents bloggers. We don’t want news and information critical to the government information community to fall through the cracks (fugitive news?!), so we need your help. Are you a news hound? Maybe you’d like to cover the “doc in the news” beat like the one we just posted. Passionate about fugitive documents? Freshen up the blog with periodic posts about interesting fugitives, perhaps ones you’ve found on the lostdocs blog. Policy wonk? You could set up Govtrack.us alerts and write about legislation of interest to libraries and the docs community.
The possibilities are limitless, but we need your help to make them a reality. Contact us at freegovinfo AT gmail DOT com if you’re intrigued.
Those who would like to volunteer for Martin Luther King, Jr. Day will find opportunities via MLKDay.gov. You can locate service opportunities within your state by entering keywords and selecting a state from the drop-down box. Options are also available to register individual projects and organizations, and marketing resources can be accessed through the website. A press release from MLKDay.gov states that Colin Powell inaugurated a new website, USAService.org, which has been “created by the Presidential Inaugural Committee where Americans can find volunteer opportunities for the January 19 King Holiday or sign up to host a local event.”