Home » Posts tagged 'Web archiving'
Tag Archives: Web archiving
According to the DailyBeast, the National Guideline Clearinghouse (NGC), a critical database of medical guidelines, is set to go dark on monday July 16 because “federal funding through Agency for Healthcare Research and Quality (AHRQ) will no longer be available to support the NGC as of that date.” For any questions, please contact Mary.Nix@ahrq.hhs.gov.
Of course, the Internet Archive has archived this site many times since 1998(!), but sadly, the Web archive hasn’t collected the most critical information because it’s hidden behind a database query.
This is completely unacceptable!
The Trump Administration is planning to eliminate a vast trove of medical guidelines that for nearly 20 years has been a critical resource for doctors, researchers and others in the medical community. Maintained by the Agency for Healthcare Research and Quality [AHRQ], part of the Department of Health and Human Services, the database is known as the National Guideline Clearinghouse [NGC], and it’s scheduled to “go dark,” in the words of an official there, on July 16.
Here’s a good article from Time Magazine — “Here’s What the EPA’s Website Looks Like After a Year of Climate Change Censorship” — which accurately reports how the Trump Administration and EPA Administrator Scott Pruitt have changed, skewed or deleted government information from the EPA Website for crass political purposes. For more in-depth analysis of the issue of information scrubbing from federal websites, one should look to the work of the Environmental Data and Governance Initiative (EDGI) and especially their reports: “Changing the Digital Climate” and “The EPA Under Siege”.
According to former government officials and EPA staffers, the level of scrutiny is without precedent. In the hands of an administration that has eschewed facts for their alternative cousins, the agency’s site is increasingly unmoored from its scientific core.
“In my experience, new administrations might come in and change the appearance of an agency website or the way they present information, but this is an unprecedented attempt to delete or bury credible scientific information they find politically inconvenient,” Heather Zichal, a senior fellow at the Atlantic Council’s Global Energy Center, and previously President Barack Obama’s top White House adviser on energy and climate change, tells TIME.
The EPA’s site is now riddled with missing links, redirecting pages and buried information. Over the past year, terms like “fossil fuels”, “greenhouse gases” and “global warming” have been excised. Even the term “science” is no longer safe.
Christine Todd Whitman, the EPA Administrator under George W. Bush, says the overhaul is “to such an extreme degree that [it] undermines the credibility of the site”…
Of the more than 25,000 web pages tracked by the Environmental Data and Governance Initiative (EDGI) since Trump’s election, they say the EPA’s have been hit hardest. One section, which provided local communities with resources for combating climate change, disappeared for months only to resurface heavily redacted, including just 175 of its 380 pages.
Here’s another story about data rescue and the preservation of government information, this time from PC Magazine UK. Though the last data refuge event was in Denton, TX in May and the 2016 End of Term crawl has finished its collection work and will soon have its 200TB of data publicly accessible, there still remains much interest — and not a little bit of worry — about the collection and preservation of govt information and data. And with stories continuing to come out — eg this one from the Guardian entitled “Another US agency deletes references to climate change on government website” — about the US government agencies scrubbing or significantly altering their Websites, this issue will not be going away any time soon.
“Somewhere around 20 percent of government info is web-accessible,” said Jim (sic.) Jacobs, the Federal Government Information Librarian at Stanford University Library. “That’s a fairly large chunk of stuff that’s not available. Though agencies have their own wikis and content management systems, the only time you find out about some of it is if someone FOIAs it.”
To be sure, a great deal of information was indeed captured and now resides on non-government servers. Between Data Refuge events and projects such as the 2016 End-of-Term Crawl, over 200TB of government websites and data were archived. But rescue organizers began to realize that piecemeal efforts to make complete copies of terabytes of government agency science data could not realistically be sustained over the long term—it would be like bailing out the Titanic with a thimble.
So although Data Rescue Denton ended up being one of the final organized events of its kind, the collective effort has spurred a wider community to work in concert toward making more government data discoverable, understandable, and usable, Jacobs wrote in a blog post.
If you’ve been waiting for your chance to make history: now’s the time!
Please join us for the FGI virtual End of Term Project Web archiving nomination sprint on Wednesday 11 January 2017 from 9AM – 11AM Pacific / 12 noon – 2PM EST. During that time, We’ll set up a virtual conference room, give a brief presentation of the End of Term crawl and the ins and outs of nominating seeds and then volunteers will be on hand to answer your questions, suggest agencies for deep exploration, and take information about databases and other resources that are tricky to capture with traditional web archiving. RSVP TODAY!
If you’re new to the End of Term Project, it’s a collaborative project to collect and preserve public United States government web sites prior to the end of the current presidential administration on January 20, 2017. Working together, the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) are conducting a thorough Web harvest of the .gov/.mil domain based on prioritized lists of URLs, including social media. As it did in 2008 and 2012 (previous harvests are accessible here), the project’s goal is to document federal agencies’ presence on the World Wide Web during the transition of Presidential administrations, to enhance the existing archival Internet collections, and to give the public access to archived digital government information. This broad comprehensive crawl of the .gov/.mil domain is based on a prioritized list of URLs, including social media.
This sprint to nominate seeds is a big part of making it happen! Hundreds of volunteers and institutions are already involved in the effort. We hope you’ll join the conversation and the fun. There may even be a few (completely non-monetary) prizes for top contributors.
You can pre-register here. We’ll contact you as the date gets closer with access information for the virtual conference.
The final deadline to nominate URLs prior to Inauguration Day is Friday, January 13th, so even if you can’t sprint with us, keep the nominations coming! Questions? Email us at admin AT freegovinfo DOT com.
Some of our readers may have seen this announcement already. But in case not, we need your help to preserve the .gov domain. See the announcement below to find out how.
How would YOU like to help preserve the United States federal government .web domain for future generations? But, that’s too huge of a swath of Internet real estate for any one person to preserve, right?!
Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. But we need your help.
The Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) have joined together for a collaborative project to preserve public United States Government websites at the end of the current presidential administration ending January 20, 2017. This web harvest — like its predecessors in 2008 and 2012 — is intended to document the federal government’s presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com, and social media content).
And that’s where YOU come in. You can help the project immensely by nominating your favorite .gov website, other federal government websites, or governmental social media account with the End of Term Nomination Tool (link below). You can nominate as many sites as you want. Nominate early and often! Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access, and long-term preservation. Only YOU can help prevent … link rot!
- End of Term 2016 Nomination Tool
- About the End of Term 2016 Project
- End of Term Web Archive (2008 and 2012)
- Follow us on Twitter @eotarchive
For more information, contact email@example.com