Tag Archives: Web archiving

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

EDGI Releases Dataset of Federal Environmental Website Changes Under Trump

Thanks to the Environmental Data and Governance Initiative (EDGI) for releasing the Federal Environmental Web Tracker. This tool is a public dataset of searchable records of approximately 1,500 significant changes to federal agency environmental webpages under the Trump administration. EDGI reports that such changes are almost always precursors or responses to policy changes. The changes were identified on pages drawn from a “list of 25,000 federal Web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA.” Here’s the Tracker’s explanatory page for more context and background.

EDGI continues to do important work in tracking the federal .gov Web domain. EDGI’s work goes hand in hand with the work of the End of Term Web Archive, which has harvested the .gov/.mil Web space every four years since 2008 and is now deep into its 2020 harvest. We’re still accepting nominations, so go to the End of Term Nomination Tool hosted by the University of North Texas (UNT) library. Help us collect a snapshot of the federal Web domain!

Today, the Environmental Data & Governance Initiative (EDGI) publishes searchable records of approximately 1,500 changes to federal agency environmental webpages under the Trump administration. For four years, EDGI’s website monitoring team has identified and catalogued significant changes to federal websites using their open source monitoring software. EDGI’s Federal Environmental Web Tracker makes records of significant changes publicly available.

The information that’s available on federal websites can have important policy implications. As EDGI has often reported over the past four years, changes to the information that’s available on federal websites are almost always precursors or responses to policy changes. Federal websites provide information that the public is likely to access before commenting on a proposed rule to learn about current regulatory efforts, the science underlying a new policy decision, or likely impacts of a proposed rule. The information found (or not found) on a federal website can impact public participation in regulatory processes.

In the weeks after Trump’s election in November 2016, newly-formed EDGI compiled a list of 25,000 federal web pages related to climate, energy, and the environment, including pages for 20 federal agencies such as EPA, NOAA, and NASA. First using proprietary software and then building and using novel open source software, EDGI has compared versions of these web pages weekly since January 2017. This new dataset represents the documented changes that EDGI’s website monitoring team flagged as significant in some way over the past four years.

EDGI’s Federal Environmental Web Tracker gives journalists, academic researchers, and the public data that can be used to provide insight, documentation, and analysis of the information policies and priorities of the Trump administration.

The Federal Environmental Web Tracker will be updated quarterly as EDGI continues to monitor federal environmental websites.

HT to InfoDocket!

HHS Plans to Delete 20 Years of Critical Medical Guidelines on Monday July 16!

According to the Daily Beast, the National Guideline Clearinghouse (NGC), a critical database of medical guidelines, is set to go dark on Monday, July 16, because “federal funding through Agency for Healthcare Research and Quality (AHRQ) will no longer be available to support the NGC as of that date.” For any questions, please contact Mary.Nix@ahrq.hhs.gov.

Of course, the Internet Archive has archived this site many times since 1998(!), but sadly, the Web archive hasn’t collected the most critical information because it’s hidden behind a database query.

This is completely unacceptable!

The Trump Administration is planning to eliminate a vast trove of medical guidelines that for nearly 20 years has been a critical resource for doctors, researchers and others in the medical community. Maintained by the Agency for Healthcare Research and Quality [AHRQ], part of the Department of Health and Human Services, the database is known as the National Guideline Clearinghouse [NGC], and it’s scheduled to “go dark,” in the words of an official there, on July 16.

via HHS Plans to Delete 20 Years of Critical Medical Guidelines Next Week.

The EPA’s Website after a year of climate change censorship

Here’s a good article from Time Magazine, “Here’s What the EPA’s Website Looks Like After a Year of Climate Change Censorship,” which accurately reports how the Trump Administration and EPA Administrator Scott Pruitt have changed, skewed or deleted government information from the EPA Website for crass political purposes. For more in-depth analysis of the issue of information scrubbing from federal websites, one should look to the work of the Environmental Data and Governance Initiative (EDGI) and especially their reports: “Changing the Digital Climate” and “The EPA Under Siege”.

According to former government officials and EPA staffers, the level of scrutiny is without precedent. In the hands of an administration that has eschewed facts for their alternative cousins, the agency’s site is increasingly unmoored from its scientific core.

“In my experience, new administrations might come in and change the appearance of an agency website or the way they present information, but this is an unprecedented attempt to delete or bury credible scientific information they find politically inconvenient,” Heather Zichal, a senior fellow at the Atlantic Council’s Global Energy Center, and previously President Barack Obama’s top White House adviser on energy and climate change, tells TIME.

The EPA’s site is now riddled with missing links, redirecting pages and buried information. Over the past year, terms like “fossil fuels”, “greenhouse gases” and “global warming” have been excised. Even the term “science” is no longer safe.

Christine Todd Whitman, the EPA Administrator under George W. Bush, says the overhaul is “to such an extreme degree that [it] undermines the credibility of the site”…

Of the more than 25,000 web pages tracked by the Environmental Data and Governance Initiative (EDGI) since Trump’s election, they say the EPA’s have been hit hardest. One section, which provided local communities with resources for combating climate change, disappeared for months only to resurface heavily redacted, including just 175 of its 380 pages.

via The EPA’s Website After a Year of Climate Change Censorship | Time.

These Advocates Want to Make Sure Our Data Doesn’t Disappear

Here’s another story about data rescue and the preservation of government information, this time from PC Magazine UK. Though the last Data Refuge event was in Denton, TX in May and the 2016 End of Term crawl has finished its collection work and will soon have its 200TB of data publicly accessible, there still remains much interest, and not a little worry, about the collection and preservation of government information and data. And with stories continuing to come out about US government agencies scrubbing or significantly altering their Websites (e.g., this one from the Guardian entitled “Another US agency deletes references to climate change on government website”), this issue will not be going away any time soon.

“Somewhere around 20 percent of government info is web-accessible,” said Jim (sic.) Jacobs, the Federal Government Information Librarian at Stanford University Library. “That’s a fairly large chunk of stuff that’s not available. Though agencies have their own wikis and content management systems, the only time you find out about some of it is if someone FOIAs it.”

To be sure, a great deal of information was indeed captured and now resides on non-government servers. Between Data Refuge events and projects such as the 2016 End-of-Term Crawl, over 200TB of government websites and data were archived. But rescue organizers began to realize that piecemeal efforts to make complete copies of terabytes of government agency science data could not realistically be sustained over the long term—it would be like bailing out the Titanic with a thimble.

So although Data Rescue Denton ended up being one of the final organized events of its kind, the collective effort has spurred a wider community to work in concert toward making more government data discoverable, understandable, and usable, Jacobs wrote in a blog post.

via Feature: These Advocates Want to Make Sure Our Data Doesn’t Disappear.

Attend the FGI virtual EOT seed nomination sprint. Help make and preserve .gov history!

If you’ve been waiting for your chance to make history: now’s the time!

Please join us for the FGI virtual End of Term Project Web archiving nomination sprint on Wednesday 11 January 2017 from 9AM – 11AM Pacific / 12 noon – 2PM EST. During that time, we’ll set up a virtual conference room and give a brief presentation on the End of Term crawl and the ins and outs of nominating seeds. Volunteers will then be on hand to answer your questions, suggest agencies for deep exploration, and take information about databases and other resources that are tricky to capture with traditional web archiving. RSVP TODAY!

If you’re new to the End of Term Project, it’s a collaborative project to collect and preserve public United States government web sites prior to the end of the current presidential administration on January 20, 2017. Working together, the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) are conducting a thorough Web harvest of the .gov/.mil domain based on prioritized lists of URLs, including social media. As it did in 2008 and 2012 (previous harvests are accessible here), the project’s goal is to document federal agencies’ presence on the World Wide Web during the transition of Presidential administrations, to enhance the existing archival Internet collections, and to give the public access to archived digital government information.

This sprint to nominate seeds is a big part of making it happen! Hundreds of volunteers and institutions are already involved in the effort. We hope you’ll join the conversation and the fun. There may even be a few (completely non-monetary) prizes for top contributors.

You can pre-register here. We’ll contact you as the date gets closer with access information for the virtual conference.

The final deadline to nominate URLs prior to Inauguration Day is Friday, January 13th, so even if you can’t sprint with us, keep the nominations coming! Questions? Email us at admin AT freegovinfo DOT com.
