Tag Archives: Government websites
This is bad news for government transparency advocates. The Sunlight Foundation shut down its Web Integrity Project (WIP) yesterday. WIP was created almost two years ago and was tracking “tens of thousands of federal government webpages each week, has reported on and sourced hundreds of stories about federal websites, has provided materials for congressional oversight, and has engaged with the executive branch and watchdog community on web integrity issues.” Unfortunately, they were unable to secure funding to keep the project going. At least their site and all their work have been archived and will remain available at the Internet Archive.
Many thanks to Toly Rinberg, Andrew Bergman, Rachel Bergman, Jon Campbell, Sarah John, and Aaron Lemelin for your fine work!
Fast forward to today, and WIP monitors tens of thousands of federal government webpages each week, has reported on and sourced hundreds of stories about federal websites, has provided materials for congressional oversight, and has engaged with the executive branch and watchdog community on web integrity issues. As a result of our efforts, federal agencies may be paying better attention to their websites and are more aware that people are watching.
In today’s America, where government websites are “the primary means by which the public receives information from and interacts with the Federal Government,” web monitoring is profoundly important at the beginning of each new administration. In the intra-agency tumult that prevails just after the inauguration of a new president, there is much potential for useful and relied-upon content to be lost and for political machinations to dominate. Many of the most significant website changes that we’ve seen at the Web Integrity Project happened within the first few months of the new Trump administration, like the removal of climate change resources from the EPA’s website, ACA-related content and pages from the HHS website, LGBT resources from the Department of Labor’s website, or the staff directory from the Office of Refugee Resettlement’s website.
With that track record of accomplishments in mind, we announce that the Sunlight Foundation will be shutting down WIP as of today. While we were unable to secure continued financial support for the program, we want to ensure that our work lives on online, true to the values of the program, so we are ensuring that our reports, the Gov404 tracker, and our blog posts are properly archived using the Internet Archive’s Wayback Machine.
I happened to surf over to the FDLP site today around 4:30pm and found that FDLP.gov had been hacked and taken over by SoWa BeZ OkA, which translates from Polish as “Owl without an eye.” The group seems to be a band of some kind, but I can’t tell. It wasn’t just a cute one-eyed ASCII cat picture (or is it a raccoon?!); there was also a somewhat catchy (if a little NSFW) tune to rave by 🙂
— That is all.
Update #2, 10pm PST 10/2/13: Our friends over at the Sunlight Foundation have an interesting post, “What Happens to .gov in a Shutdown?” They explained the .gov shutdown matrix:
…drawn on an agency-by-agency basis, and the specific determination is based on the importance of the function and how illegal ceasing to do it might be. But aside from some obvious ones–national parks would be closed; the CO2 scrubber on the International Space Station would stay plugged in–it’ll be agency leadership that makes the determinations.
(and love the unix joke!)
UPDATE #1, 3pm PST 10/2/13: Ars Technica checked 56 .gov sites and found 10 that went dark. See “Shutdown of US government websites appears bafflingly arbitrary.”
“A bunch of federal websites will shut down with the government,” by Andrea Peterson, Washington Post, published September 30 at 5:28 pm.
Also: The Government Printing Office (GPO) reports: "GPO will not be updating gpo.gov, FDLP.gov, the Catalog of Government Publications, Ben’s Guide, or be responding to askGPO questions until funding is restored. The Laurel warehouse will be closed so there will be no shipments to depository libraries.
Congressional materials will continue to be processed and posted to FDsys. Federal Register services on FDsys will be limited to documents that protect life and property. The remaining collections on FDsys will not be updated and will resume after funding is restored.”
In October, the healthcare.gov website will be the site millions of Americans use to choose their health insurance. The new site has been built in public for months, iteratively created on GitHub using cutting-edge open-source technologies. Healthcare.gov is the rarest of birds: a next-generation website that also happens to be a .gov. It will use Jekyll, a static site generator that pre-builds pages from templates and content rather than assembling each page dynamically on the server. This will make the website faster and more efficient. A fascinating story!
- Healthcare.gov: Code Developed by the People and for the People, Released Back to the People, by Alex Howard, The Atlantic (Jun 28 2013).
First, Bryan [Sivak] pledged, “everything we do will be published on GitHub,” meaning the entire code-base will be available for reuse. This is incredibly valuable because some states will set up their own state-based health insurance marketplaces. They can easily check out and build upon the work being done at the federal level….
Moreover, all content will be available through a JSON API, for even simpler reusability. Other government or private sector websites will be able to use the API to embed content from healthcare.gov. As official content gets updated on healthcare.gov, the updates will reflect through the API on all other websites.
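The content-reuse pattern described above can be sketched in a few lines. This is a hedged illustration only: the payload shape, field names, and rendering function below are invented for the example and are not healthcare.gov’s actual API schema.

```python
import json

# Hypothetical JSON payload for one content item. The real healthcare.gov
# API's endpoints and fields are not documented here; this is illustrative.
sample_payload = json.dumps({
    "title": "How do I apply for coverage?",
    "content": "<p>Open enrollment begins October 1.</p>",
    "url": "https://www.healthcare.gov/how-do-i-apply/",
})

def embed_article(payload: str) -> str:
    """Render a JSON content item as an HTML fragment for reuse on another site."""
    item = json.loads(payload)
    return (
        f'<article><h2><a href="{item["url"]}">{item["title"]}</a></h2>'
        f'{item["content"]}</article>'
    )

html = embed_article(sample_payload)
print(html)
```

Because each consuming site renders from the same API response, a change to the official content propagates to every embed the next time the payload is fetched, which is the reusability benefit the article describes.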
The State of the Federal Web Report issued in late 2011 noted that Federal agencies planned to eliminate or merge several hundred domains, as part of the President’s Campaign to Cut Waste. The goal was to reduce outdated, redundant, and inactive domains. As part of this work, the .gov Task Force overseeing the process asked members of the National Digital Stewardship Alliance (NDSA) to archive and preserve all .gov Executive branch domains slated to be decommissioned or merged. NDSA members immediately agreed that an important step in this process was to preserve the content of these sites as part of our national digital heritage – instead of simply eliminating them.
Rather than start a separate, standalone project, we chose to launch a collaborative crawl under the auspices of the End of Term Web Archive project (EOT). Although the EOT project has primarily focused on transitions occurring at the end of administrative terms, part of the goal of the project is to document changes in all online presences of the US Federal government during key periods of transition, regardless of when or under what circumstances they occur. So, a comprehensive harvest, using a targeted list of domains supplied by the .gov Task Force and a general list of all Executive branch domains downloaded from data.gov, began on Saturday, October 8, 2011. The crawl concluded on November 5, 2011 and encompassed 46,278,384 captures and ~13 TB of compressed data.
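Assembling a seed list from two sources, as described above, amounts to merging and deduplicating domain lists. This is a hedged sketch under invented inputs; the placeholder domains below are not the actual 2011 lists from the .gov Task Force or data.gov.

```python
# Placeholder inputs, standing in for the two real lists described above:
# the Task Force's targeted list of outgoing domains, and the general
# data.gov list of all Executive branch domains.
targeted_outgoing = ["outgoing.example.gov", "merged.example.gov"]
all_executive = ["outgoing.example.gov", "agency.example.gov", "another.example.gov"]

# Normalize and deduplicate so each domain is queued for crawling only once.
seeds = sorted({d.strip().lower() for d in targeted_outgoing + all_executive})
print(len(seeds))  # 4 unique domains
```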
Here’s a general outline of the sequence of events of the Fall 2011 crawl:
- Agencies identified recommended actions for domains in their Interim Progress Reports and Web Inventory
- The .gov Task Force collected a list of outgoing .gov domains and shared those with the NDSA
- Internet Archive crawled outgoing sites and the full suite of Executive branch domains (note: for some resources it took several weeks to crawl sites in their entirety)
- GSA eliminated domains after they were archived
The End of Term Web Archive project, including the archival capture of Executive Branch domains last fall, is not meant in any way to satisfy agency records management obligations. The domains are archived solely for the purpose of preservation and posterity. Agencies discuss records management obligations separately and handle those processes independently. However, we do make every effort to replicate resources in their entirety, at least to the extent supported by available tools, techniques, and best practices. Some portion of every website lives server-side, and that subset of content and user experience cannot be archived and replicated with traditional web crawler/capture software, which depends on files being downloaded to the client.
The biggest challenge of this project, however, was not Web 2.0/Web 3.0 server-side rendering or content serving. The biggest limiting factor was time. When we archive resources, there is a big difference between visiting and sampling a web resource using a set of scoping rules and guidelines versus attempting to “drain” a site, i.e., replicate it soup to nuts as fast as the server can respond to your requests. Some of these resources house thousands to tens of thousands of PDF files, videos, and other network-intensive resources. And most servers are programmed to meter how fast they respond to requests from the same IP address or IP address range, so we have to wait appropriate intervals between requests to avoid being ignored or blacklisted by an automated process. There are ways to parallelize capture, but without dedicated funding, few institutions can marshal those kinds of resources on a volunteer basis.
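The request pacing described above can be sketched as a loop that sleeps between fetches. This is a minimal illustration, not the End of Term project’s actual crawler: the delay value is an assumption, and the `fetch` function is a placeholder for a real HTTP GET.

```python
import time

# Assumed polite interval between requests to one host; real crawlers
# tune this per server and often honor robots.txt Crawl-delay hints.
CRAWL_DELAY_SECONDS = 2.0

def fetch(url: str) -> str:
    """Placeholder for a real HTTP GET (e.g. via urllib.request.urlopen)."""
    return f"<html>archived copy of {url}</html>"

def drain_site(urls: list[str], delay: float = CRAWL_DELAY_SECONDS) -> list[str]:
    """Fetch every URL in sequence, sleeping between requests so the
    server does not throttle or blacklist the crawler's IP address."""
    pages = []
    for i, url in enumerate(urls):
        pages.append(fetch(url))
        if i < len(urls) - 1:  # no need to sleep after the final request
            time.sleep(delay)
    return pages

pages = drain_site(["https://example.gov/a", "https://example.gov/b"], delay=0.01)
print(len(pages))  # 2
```

With tens of thousands of files per site and a multi-second wait between each request, a single polite crawler can take weeks to drain one large domain, which is why time, not technology, was the limiting factor.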
The End of Term project is built on the collaborative best efforts of a network of partners who share a passion for preservation of online government.
For more information about the streamlining of agency website management, please visit www.usa.gov/WebReform.shtml. This effort is now part of the larger Digital Government Strategy.
For more information on the End of Term Web Archive project, please visit http://eotarchive.cdlib.org, and follow us @eotarchive.
Kris Carpenter Negulescu
Director Web Group