The End of Term 2016 collection is still going strong, and we continue to receive email from interested folks about how they can help. Much of the content for the EOT crawl has already been collected and some of it is publicly accessible already through our partners. Last month we posted about ways to help the collection process. At this point volunteers are encouraged to help check the archive to see if content has been archived (i.e., do quality assurance (QA) for the crawls).
Here’s how you can help us ensure that we’ve collected and archived as thoroughly and completely as possible:
Step 1: Check the Wayback Machine
Search the Internet Archive to see if the URL has already been captured. Please note this is not a specific End of Term collection search and does not include ALL content archived by the End of Term partners, but will be helpful in identifying whether something has been preserved already.
You may type in specific URLs or domains or subdomains, or try a simple keyword search (in Beta!).
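For volunteers with many URLs to check, the same lookup can be scripted. The sketch below is our own assumption rather than an official End of Term tool: it relies on the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`), and the helper names `availability_url`, `closest_snapshot`, and `check` are hypothetical. Like the search above, it only reports what the Wayback Machine holds, not everything archived by the End of Term partners.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint (not EOT-specific).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target):
    """Build the lookup URL for one page, domain, or subdomain."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": target})


def closest_snapshot(payload):
    """Return the closest capture from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check(target, timeout=30):
    """Query the endpoint over the network and return the closest capture."""
    with urllib.request.urlopen(availability_url(target), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


# Offline illustration with a made-up response in the endpoint's JSON shape.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20161201000000/http://www.example.gov/",
            "timestamp": "20161201000000",
            "status": "200",
        }
    }
}
snapshot = closest_snapshot(sample)
print(snapshot["timestamp"] if snapshot else "not captured; consider nominating it")
```

Calling `check("www.example.gov")` performs the real lookup; a result of `None` suggests the URL is worth carrying through steps 2 and 3.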
1a: Help Perform Quality Assurance
If you do find a site or URL you were looking for, please check whether it was captured completely. A simple way to do this is to click around the archived page: follow the navigation, links on the page, images, etc. We need help identifying parts of sites that the crawlers might have missed, for instance specific documents or pages that you are looking for but that we haven’t archived. Please note that crawlers are not perfect and cannot archive some content. IA has a good FAQ about the challenges crawlers face.
If you do discover something is missing, you can still nominate pages or documents for archiving using the link in step 3 below.
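To make the click-through check in 1a more systematic, a volunteer could first list everything an archived page links to and then work through that list. The sketch below is only an illustration under our own assumptions: `LinkCollector` and `list_links` are hypothetical names, it uses Python's standard `html.parser`, and it sees only links and images present in the HTML itself, so content loaded by scripts will still need checking by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of <a href> and <img src> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def list_links(html, base_url):
    """Return de-duplicated link and image URLs in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return list(dict.fromkeys(collector.links))


# Made-up page content for illustration.
page = (
    '<a href="/reports/2016.pdf">Report</a> '
    '<img src="logo.png"> '
    '<a href="/reports/2016.pdf">Again</a>'
)
for link in list_links(page, "http://www.example.gov/about/"):
    print(link)
```

Each URL in the resulting list can then be opened in the archive; anything that fails to load is a candidate for nomination.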
Step 2: Check the Nomination Tool
Check the Nomination Tool to see if the URL or site has been nominated already. There are a few ways to do this:
- View all reports here
- Check the list here of everything nominated, or search here.
- You can also check our bulk lists here
Step 3: Nominate It!
If you don’t see the URL you were looking for in any of those searches, please nominate it here.
Questions? Please contact the End of Term project at eot-info AT archive DOT org.
[Editor’s note: Updated 12/15/16 to include updated email address for End-of-Term project queries (eot-info AT archive DOT org), and information about robots.txt (#1 below) and databases and their underlying data (#5 below). Also updated 12/22/16 with note about duplication of efforts and how to dive deeply into an agency’s domain at the bottom of #1 section. jrj]
Here at FGI, we’ve been tracking the disappearance of government information for quite some time (and librarians have been doing it for longer than we have; see ALA’s long-running series, published from 1981 until 1998, called “Less Access to Less Information By and About the U.S. Government”). We’ve recently written about the targeting of NASA’s climate research site and the Department of Energy’s carbon dioxide analysis center for closure.
But ever since the NY Times last week wrote a story “Harvesting Government History, One Web Page at a Time”, there has been renewed worry and interest from the library and scientific communities as well as the public in archiving government information. And there’s been increased interest in the End of Term (EOT) crawl project. Though there’s increased worry about the loss of government information with the incoming Trump administration, it’s important to note that the End of Term crawl has been going on since 2008, under both Republican and Democratic administrations, and will go on past 2016. EOT is working to capture as much of the .gov/.mil domains as we can, and we’re also casting our net to harvest social media content and government information hosted on non-.gov domains (e.g., the St Louis Federal Reserve Bank at www.stlouisfed.org). We’re running several big crawls right now (you can see all of the seeds we have here as well as all of the seeds that have been nominated so far) and will continue to run crawls up to and after the Inauguration as well. We strongly encourage the public to nominate seeds of government sites so that we can be as thorough in our crawling as possible.
Government information specialists know the value of the information that government agencies gather, create, assemble, and distribute, but wouldn’t it be nice to have a book that documents that value and provides examples of how that information is used? Wouldn’t it be nice to have a book that doesn’t just list useful databases, but describes the missions and histories of the agencies that produce the information?
Back in 2013, Dr. Miriam Drake, longtime director and dean of libraries at Georgia Institute of Technology, wanted to create such a book: A book about the value of public information and how the communities that libraries serve actually use that information. The result is this new book that we think deserves the attention of practicing government information professionals and teachers:
- Public Knowledge: Access and Benefits, Edited by Miriam A. Drake and Donald T. Hawkins, Foreword by Judith Coffey Russell. Medford NJ: Information Today, Inc. (2016).
Government documents librarians know and use FDsys (and now govinfo), USA.gov, the Catalog of Government Publications, specialty web sites like the Census Bureau’s American FactFinder, the Bureau of Economic Analysis, The National Archives, and Congress, GPO’s federated search engine MetaLib, and probably at least a few more. But after the basics, it is hard to keep track of the wealth of information available and how to find it. You might know, for example, that there are 123 U.S. federal government agencies that collect and distribute important statistical data, but how do you find it and which agency is best for which statistic? Have you ever used the Library of Congress’s Performing Arts Encyclopedia, or thought about the non-government, public knowledge in the LoC, such as historic newspapers online? How many of the Databases, Resources & APIs at the National Library of Medicine have you explored? You’ve used USA.gov, but have you tried Science.gov or WorldWideScience.org? Are you helping your community find datasets, but you haven’t used OSTI data explorer?
And, if you have used some of those, but haven’t had time to understand the subtle differences between databases or agencies (e.g., do you know when to use NASA Technical Reports Server and when to use The National Technical Information Service?), you will find this book useful. This book will be useful for those who answer reference questions and work with communities who need information in almost any discipline. It gives the historical context of the development of the vast government information infrastructure and describes how agencies are changing rapidly and planning for the future. If you are a new or “accidental” government information librarian, or if you teach government documents, this book is for you.
And, yes, we wrote a chapter of this book, but we’d be praising its utility even if we were not part of it. The publisher has kindly allowed us to offer you a PDF copy of the chapter we wrote for this book.
- Beyond LMGTFY*: Access to Government Information in a Networked World, by James A. Jacobs and James R. Jacobs. (*LMGTFY = “Let me google that for you”)
Every chapter is different and every chapter is worthwhile. Here is a complete list of the chapters and authors.
Table of Contents
- The Relationship Between Citizen Information Literacy and Public Information Use. Forest “Woody” Horton Jr.
- Beyond LMGTFY: Access to Government Information in a Networked World. James A. Jacobs, University of California-San Diego Library, and James R. Jacobs, Stanford University Libraries.
- Government Resources in the Classroom. Susanne Caro, Maureen and Mike Mansfield Library, University of Montana.
- The U.S. Government Publishing Office. Miriam A. Drake and Donald T. Hawkins.
- The Library of Congress. Miriam A. Drake.
- The National Library of Medicine. Katherine B. Majewski, MEDLARS Management Section, and Wanda Whitney, Reference and Web Services Section, National Library of Medicine.
- The Department of Energy Office of Scientific and Technical Information, Part 1: Extending the Reach and Impact of DOE Research Results. Brian A. Hitson and Peter M. Lincoln, Department of Energy Office of Scientific and Technical Information.
- The Department of Energy Office of Scientific and Technical Information, Part 2: Bringing the World’s Research to DOE. Brian A. Hitson and Peter M. Lincoln, Department of Energy Office of Scientific and Technical Information.
- NASA’s Scientific and Technical Information for a Changing World. Lynn Heimerl, NASA STI Program.
- The National Technical Information Service: Public Access as a Driver of Change. Gail Hodge, IIa (Information International Associates).
- Federal Statistics Past and Present. Mark Anderson, Michener Library, University of Northern Colorado.
- Agricultural Information and the National Agricultural Library. Marianne Stowell Bracke, Purdue University Libraries.
- Hidden Government Information. Miriam A. Drake.
- The Future Is Open. Barbie E. Keiser, Barbie E. Keiser, Inc.
James A. Jacobs
James R. Jacobs
We have long advocated for public access to reports from the Congressional Research Service (CRS), Congress’ think tank. But CRS reports are little known and difficult to find because they are not distributed to FDLP libraries or made public. I harvest them from sites around the net that post them when they can, but it’s pretty random.
But now, thanks to the tireless efforts of our friend and colleague Daniel Schuman and others at the Congressional Data Coalition, public access to CRS reports seems to be gathering steam. The NY Times published an editorial yesterday entitled “Congressional Research Belongs to the Public”. There are two legislative efforts underway in the House and Senate to make these valuable but difficult-to-find-or-even-know-about reports publicly available. Librarians have been fighting for this forever. Now it finally looks like it might just happen!
Over the years our coalition has submitted testimony in favor of public access to these reports, most recently in March. In summary, the reports explain current legislative issues in language that everyone can understand, are written by a federal agency that receives more than $100 million annually, and there is strong public demand for access. A detailed description of the issues at play is available here.
This Congress, two legislative efforts are underway to make CRS reports public. First, the bipartisan H. Res. 34, introduced by Reps. Leonard Lance (R-NJ) and Mike Quigley (D-IL), would make all reports widely distributed in Congress available to the public, except confidential memoranda and advice provided by CRS at the request of a member. Second, Rep. Quigley offered an amendment to an appropriations bill that would have required CRS to make available an index of all of its reports. Similar legislation has been introduced in the Senate in prior years.
Informative, but non-essential websites are likely candidates for being shut down.
White House says critical websites won’t be affected by shutdown, by William Matthews, Nextgov (04/07/2011).
In the event of a government shutdown, federal websites “would remain operational” if they are deemed “necessary to avoid significant damage to the execution of authorized or accepted activities,” a White House official told Nextgov in an email message late Wednesday.
…For the duration of a shutdown, agency-operated websites that are not judged to be critical “would not remain active,” the official said. That doesn’t mean they will necessarily vanish from the Internet. Rather, if they remain available, the information on them might not be up to date, and transactions submitted to agencies through the sites might not be processed until the shutdown ends…
Sites to be closed include the International Trade Administration, the Bureau of Economic Analysis, the Economics and Statistics Administration, and the National Institute of Standards and Technology.
…Informative, but non-essential websites, such as USASpending.gov, ITDashboard.gov and Data.gov are likely candidates for being shut down…
…Agencies are expected to post notices on their Web home pages about which online features will work and which won’t during the shutdown.