Tag Archives: End of term archive
End of Term crawl 2024 is now underway!
Well it’s that time again. The 2024 End of Term web crawl of the federal .gov/.mil web space (and other domains 🙂 ) has begun. We have just posted our first public announcement on the Internet Archive blog.
As we have done since 2008 (NARA did the first comprehensive crawl in 2004), a group of volunteers from the Internet Archive, GPO, Library of Congress, NARA, University of North Texas, and Stanford will be doing a “comprehensive” web harvest of the Federal government’s web space. For more information and background on the project, see our home page at https://eotarchive.org/. These archives can be searched full-text via the Internet Archive’s collections search (https://web.archive.org/) and also downloaded as bulk data for machine-assisted analysis from the project site.
But MOST IMPORTANTLY, we need YOUR help! We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive. Submit a url nomination by going to our nomination tool (hosted by University of North Texas!) and clicking the big yellow “add a url” button in the top right:
https://digital2.library.unt.edu/nomination/eth2024/
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. We’re also interested in any and all urls of federal sites that are NOT hosted on .gov/.mil (there are lots of federal government sites hosted on .edu, .org, and even .com! That includes social media but also research labs and other private/public partnerships). We already have a solid list of top-level domains (e.g., epa.gov, congress.gov, defense.mil, etc.). Nominating urls deep within .gov/.mil websites helps to make our web crawls as thorough and complete as possible. Prizes will be awarded for most url nominations by individuals and institutions!
So get to it! Help us do the most complete crawl we can and also assure that the sites/publications/videos/data etc that are most important to YOU make it into the archive!!
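For anyone assembling a batch of nominations, the split described above between sites on .gov/.mil and federal sites hosted elsewhere can be sketched in a few lines of Python. This is only an illustration and not part of the nomination tool: the function names and the example.edu URL are hypothetical, and the check looks only at the hostname, so it cannot tell whether a .edu, .org, or .com site is actually federal.

```python
from urllib.parse import urlsplit


def is_gov_or_mil(url: str) -> bool:
    """Return True if the URL's hostname ends in .gov or .mil.

    Assumes the URL includes a scheme (https://...); without one,
    urlsplit finds no hostname and this returns False.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith((".gov", ".mil"))


def split_candidates(urls):
    """Sort candidate URLs into (on .gov/.mil, hosted elsewhere)."""
    on_gov_mil, elsewhere = [], []
    for url in urls:
        if is_gov_or_mil(url):
            on_gov_mil.append(url)
        else:
            elsewhere.append(url)
    return on_gov_mil, elsewhere


# Hypothetical candidate list; only the hostnames matter here.
candidates = [
    "https://www.epa.gov/climate-change",
    "https://www.defense.mil/News/",
    "https://www.example.edu/federal-lab/",
]
on_gov_mil, elsewhere = split_candidates(candidates)
```

The URLs that land in the second list are the ones the crawl is least likely to find on its own, so they are worth a closer look before nominating.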
EDGI Releases Dataset of Federal Environmental Website Changes Under Trump
Thanks to the Environmental Data and Governance Initiative (EDGI) for releasing the Federal Environmental Web Tracker. This tool is a public dataset of searchable records of approximately 1,500 significant changes to federal agency environmental webpages under the Trump administration. According to EDGI, these changes were almost always precursors or responses to policy changes. The changes came from a “list of 25,000 federal Web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA.” Here’s the Tracker’s explanatory page for more context and background.
EDGI continues to do important work in tracking the federal .gov Web domain. EDGI’s work goes hand in hand with the work of the End of Term Web Archive which has harvested the .gov/.mil Web space every 4 years since 2008 and is now deep into its 2020 harvest. And we’re still accepting nominations, so go to the End of Term Nomination Tool hosted by the University of North Texas (UNT) library. Help us collect a snapshot of the federal Web domain!
Today, the Environmental Data & Governance Initiative (EDGI) publishes searchable records of approximately 1,500 changes to federal agency environmental webpages under the Trump administration. For four years, EDGI’s website monitoring team has identified and catalogued significant changes to federal websites using their open source monitoring software. EDGI’s Federal Environmental Web Tracker makes records of significant changes publicly available.
The information that’s available on federal websites can have important policy implications. As EDGI has often reported over the past four years, changes to the information that’s available on federal websites are almost always precursors or responses to policy changes. Federal websites provide information that the public is likely to access before commenting on a proposed rule to learn about current regulatory efforts, the science underlying a new policy decision, or likely impacts of a proposed rule. The information found (or not found) on a federal website can impact public participation in regulatory processes.
In the weeks after Trump’s election in November 2016, newly-formed EDGI compiled a list of 25,000 federal web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA. First using proprietary software and then building and using novel open source software, EDGI has compared versions of these web pages weekly since January 2017. This new dataset represents the documented changes that EDGI’s website monitoring team flagged as significant in some way over the past four years.
EDGI’s Federal Environmental Web Tracker gives journalists, academic researchers, and the public data that can be used to provide insight, documentation, and analysis of the information policies and priorities of the Trump administration.
The Federal Environmental Web Tracker will be updated quarterly as EDGI continues to monitor federal environmental websites.
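To give a rough sense of what comparing two versions of a web page involves, here is a minimal Python sketch using the standard library's difflib. It is not EDGI's monitoring software, which is not shown in the release above. The function name and the sample page text are assumptions made up for the example, and a real monitor would also need to fetch pages, strip markup, and decide which changes count as significant.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines removed ('-') or added ('+') between two snapshots."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
            n=0,  # no surrounding context lines, only the changes
        )
    )
    # When the snapshots differ, the first two entries are the
    # '--- previous' and '+++ current' file headers; skip them,
    # then keep only added/removed lines (dropping '@@' hunk markers).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical snapshots of the same page taken a week apart.
last_week = "Climate change resources\nContact us"
this_week = "Resources\nContact us"

changes = changed_lines(last_week, this_week)
```

Here `changes` holds one removed line and one added line, and a reviewer would still have to judge whether that edit is significant.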
Nominations sought for the U.S. Federal Government Domain End of Term 2020 Web Archive
It’s that time again folks. The End of Term Archive is once again gearing up to harvest the .gov/.mil Web domain. For the End of Term 2020, the Library of Congress, University of North Texas Libraries, Internet Archive, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) are joining efforts again, this time with new partners Environmental Data & Governance Initiative (EDGI) and the General Services Administration (GSA), to preserve public United States Government websites at the end of the current presidential administration ending January 20, 2021. This web harvest – like its predecessors in 2008, 2012, and 2016 – is intended to document the federal government’s presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com, and social media content) and FTP’d datasets.
Here’s the official announcement asking for YOUR help. Please forward widely!
WE NEED YOUR HELP TO PRESERVE THE .GOV WEB DOMAIN!
How would YOU like to help preserve the United States federal government .gov/.mil Web domain for future generations? But, that’s too huge of a swath of Internet real estate for any one person or organization to preserve, right?!
Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. BUT WE NEED YOUR HELP!
And that’s where YOU come in. You can help the project immensely by nominating your favorite .gov website/document/dataset, other federal government websites, or governmental social media account with the End of Term Nomination Tool. You can nominate as many sites as you want. Nominate early and often! Win a prize for the most seed nominations!! Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access, and long-term preservation. Only YOU can help prevent … link rot!
- End of Term 2020 Nomination Tool: Submit URLs here.
- About the End of Term 2020 Project
- End of Term Web Archive (2008, 2012, and 2016)
- Follow us on Twitter @eotarchive
- For more information, contact us at eot-info AT archive DOT org
The EPA’s Website after a year of climate change censorship
Here’s a good article from Time Magazine — “Here’s What the EPA’s Website Looks Like After a Year of Climate Change Censorship” — which accurately reports how the Trump Administration and EPA Administrator Scott Pruitt have changed, skewed or deleted government information from the EPA Website for crass political purposes. For more in-depth analysis of the issue of information scrubbing from federal websites, one should look to the work of the Environmental Data and Governance Initiative (EDGI) and especially their reports: “Changing the Digital Climate” and “The EPA Under Siege”.
According to former government officials and EPA staffers, the level of scrutiny is without precedent. In the hands of an administration that has eschewed facts for their alternative cousins, the agency’s site is increasingly unmoored from its scientific core.
“In my experience, new administrations might come in and change the appearance of an agency website or the way they present information, but this is an unprecedented attempt to delete or bury credible scientific information they find politically inconvenient,” Heather Zichal, a senior fellow at the Atlantic Council’s Global Energy Center, and previously President Barack Obama’s top White House adviser on energy and climate change, tells TIME.
The EPA’s site is now riddled with missing links, redirecting pages and buried information. Over the past year, terms like “fossil fuels”, “greenhouse gases” and “global warming” have been excised. Even the term “science” is no longer safe.
Christine Todd Whitman, the EPA Administrator under George W. Bush, says the overhaul is “to such an extreme degree that [it] undermines the credibility of the site”…
Of the more than 25,000 web pages tracked by the Environmental Data and Governance Initiative (EDGI) since Trump’s election, they say the EPA’s have been hit hardest. One section, which provided local communities with resources for combating climate change, disappeared for months only to resurface heavily redacted, including just 175 of its 380 pages.
via The EPA’s Website After a Year of Climate Change Censorship | Time.
2016 End of Term Web Archive is now available
The 2016 End of Term .gov/.mil web crawl is now available! We collected approximately 300TB of government websites, which includes over “70 million html pages, over 40 million PDFs and, towards the other end of the spectrum and for semantic web aficionados, 8 files of the text/turtle mime type” as well as approximately 100TB of public data via .gov FTP file servers! Thanks to everyone who participated on the project and the thousands(!) of seed nominators, both individuals and those that came in via DataRefuge and EDGI tools and public events.
The End of Term Web Archive contains federal government websites (.gov, .mil, etc.) in the Legislative, Executive, or Judicial branches of the government. Websites that were at risk of changing (e.g., whitehouse.gov) or disappearing altogether during government transitions were captured. Local government websites and any other sites not part of the federal government domain were out of scope.