Tag Archives: EOT
Here’s a good article from Time Magazine — “Here’s What the EPA’s Website Looks Like After a Year of Climate Change Censorship” — which accurately reports how the Trump Administration and EPA Administrator Scott Pruitt have changed, skewed or deleted government information from the EPA Website for crass political purposes. For more in-depth analysis of the issue of information scrubbing from federal websites, one should look to the work of the Environmental Data and Governance Initiative (EDGI) and especially their reports: “Changing the Digital Climate” and “The EPA Under Siege”.
According to former government officials and EPA staffers, the level of scrutiny is without precedent. In the hands of an administration that has eschewed facts for their alternative cousins, the agency’s site is increasingly unmoored from its scientific core.
“In my experience, new administrations might come in and change the appearance of an agency website or the way they present information, but this is an unprecedented attempt to delete or bury credible scientific information they find politically inconvenient,” Heather Zichal, a senior fellow at the Atlantic Council’s Global Energy Center, and previously President Barack Obama’s top White House adviser on energy and climate change, tells TIME.
The EPA’s site is now riddled with missing links, redirecting pages and buried information. Over the past year, terms like “fossil fuels”, “greenhouse gases” and “global warming” have been excised. Even the term “science” is no longer safe.
Christine Todd Whitman, the EPA Administrator under George W. Bush, says the overhaul is “to such an extreme degree that [it] undermines the credibility of the site”…
Of the more than 25,000 web pages tracked by the Environmental Data and Governance Initiative (EDGI) since Trump’s election, they say the EPA’s have been hit hardest. One section, which provided local communities with resources for combating climate change, disappeared for months only to resurface heavily redacted, including just 175 of its 380 pages.
The 2016 end of term .gov/.mil web crawl is now available! We collected approximately 300TB of government websites, which includes over “70 million html pages, over 40 million PDFs and, towards the other end of the spectrum and for semantic web aficionados, 8 files of the text/turtle mime type,” as well as roughly 100TB of public data via .gov FTP file servers! Thanks to everyone who participated in the project and to the thousands(!) of seed nominators, both individuals and those who came in via DataRefuge and EDGI tools and public events.
The End of Term Web Archive contains federal government websites (.gov, .mil, etc.) in the Legislative, Executive, and Judicial branches of the government. Websites that were at risk of changing (e.g., whitehouse.gov) or disappearing altogether during government transitions were captured. Local government websites, and any other sites not part of the federal government domain, were out of scope.
Drop everything and watch this presentation from the 2017 Code4Lib conference, which took place in Los Angeles March 6-9, 2017. Heck, watch the entire proceedings, because there is a bunch of interesting and thoughtful stuff going on in the world of libraries and technology! But in particular, check out Matt Zumwalt’s presentation “How the distributed web could bring a new Golden Age for Libraries.” After submitting his talk, he changed the title to “Storing data together: the movement to decentralize data and how libraries can lead it” because of the DataRefuge movement.
Zumwalt (aka @FlyingZumwalt on Twitter) works at Protocol Labs, one of the primary developers of the InterPlanetary File System (IPFS). Grok their tagline: “HTTP is obsolete. It’s time for the distributed, permanent web!” He has spent much of his spare time over the last 9 months working with groups like EDGI, DataRefuge, and the Internet Archive to help preserve government datasets.
Here’s what Matt said in a nutshell: The Web is precarious. But by using a peer-to-peer distributed network architecture, we can “store data together”; we can collaboratively preserve and serve out government data. This resonates with me as an FDLP librarian. What if a network of FDLP libraries actually took this on? This isn’t some far-fetched, sci-fi idea. The technologies and infrastructures are already there. Over the last 9 months, researchers, faculty and public citizens around the country have already gotten on board with this idea. Libraries just have to get together and agree that it’s a good thing to collect/download, store, describe and serve out government information. Together we can do this!
Matt’s talk starts at 3:07:41 of the YouTube video below. Please watch it, let his ideas sink in, share it, start talking about it with your colleagues and administrators in your library, and get moving. Government information could be the great test case for the distributed web and a new Golden Age for Libraries!
This presentation will show how the worldwide surge of work on distributed technologies like the InterPlanetary File System (IPFS) opens the door to a flourishing of community-oriented librarianship in the digital age. The centralized internet, and the rise of cloud services, has forced libraries to act as information silos that compete with other silos to be the place where content and metadata get stored. We will look at how decentralized technologies allow libraries to break this pattern and resume their missions of providing discovery, access and preservation services on top of content that exists in multiple places.
Please tune in next Wednesday, March 29, 2017 from 9am – 10am Pacific / 12:00 – 1:00pm Eastern for the next Help! I’m an Accidental Government Information Librarian Webinar “Saving government data: A conversation with the future.” You’ll need to RSVP for the session in order to get the link to the WebEx live session. “See” you there!
Help! I’m an Accidental Government Information Librarian presents … Saving government data: A conversation with the future, on Wednesday, March 29, 2017 from 12:00 – 1:00 p.m. (Eastern).
In recent months, the DataRefuge project has collaborated with hundreds of volunteers around the United States to collect, describe, and store federal data that support climate and environmental research and advocacy. This project, and others like it, works in conjunction with the End of Term Web Archive to capture and make available federal web content during administrative transitions.
Our discussion will explore the fragility of digital information, and expand on ideas about what data is. We’ll talk about current projects and efforts, and explore the future of this work. Finally, we’ll address the concept of sustainability, and propose a paradigm of empowered experimentation that aligns with our values and roles within libraries.
We will meet together for Session #69, online on Wednesday, March 29, 2017 from 12:00 – 1:00 p.m. (Eastern). Please RSVP for the session using this link: http://bit.ly/GRS-Session69
We will use WebEx for the live session. Information on testing and accessing the session will be made available when you register.
The session will be recorded and available after the live session, linked from the NCLA GRS web page (http://www.nclaonline.org/government-resources).
Laurie Allen is the Assistant Director for Digital Scholarship in the Penn Libraries, where she leads a group working to expand the capacity of researchers at Penn to create and share scholarship in new forms. The group engages in digital project development, data management and curation, mapping, experimentations with emerging research methods, and open access publishing. In late 2016, Allen was part of the group that started Data Refuge, and has been involved in bringing together a group of collaborators to form a network of libraries, open data activists and open government efforts.
James A. Jacobs is Data Services Librarian Emeritus, University of California San Diego. He has more than 25 years of experience working with digital information, digital services, and digital library collections. He is a technical consultant and advisor to the Center for Research Libraries in the auditing and certification of digital repositories using the Trusted Repository Audit Checklist (TRAC) and related CRL criteria. He served as Data Services Librarian at the University of California San Diego and co-taught the ICPSR summer workshop, “Providing Social Science Data Services: Strategies for Design and Operation.” He is a co-founder of Free Government Information.
James R. Jacobs is the US Government Information Librarian at Stanford University Libraries, where he works on both collection development and digital projects like LOCKSS-USDOCS. He is a member of ALA’s Government Documents Round Table (GODORT) and served a 3-year term on the Depository Library Council to the Public Printer, including serving as DLC Chair. He is a co-founder of Free Government Information (freegovinfo.info) and Radical Reference (radicalreference.info) and is on the board of Question Copyright, a 501(c)(3) non-profit organization that promotes a better public understanding of the effects of copyright and encourages the development of alternatives to information monopolies.
Shari Laster is the Government Information Librarian and Data Services Librarian at the University of California, Santa Barbara. She currently serves as Assistant Chair/Chair-Elect for the Government Documents Round Table of the American Library Association, and is a past chair of the Depository Library Council, the advisory body for the Federal Depository Library Program.
(Editor’s note: this post is the second of two guest editorials on the Libraries Network, a nascent collaborative effort of the Association of Research Libraries (ARL) spurred by the work of the DataRefuge project, the End of Term crawl, and other volunteer efforts to preserve data and content from the .gov/.mil domain. The first post was aimed at libraries; this second one is aimed at government agencies. Please leave a comment and tell us what you think! JRJ)
This moment in history provides us with a rare opportunity to go beyond short-term data rescue and set the much needed foundation for the long-term future of preservation of government information.
Awareness of risk. At the moment, more people than ever are aware of the risk of relying solely on the government to preserve its own information. This was not true even six months ago. This awareness goes far beyond government information librarians and archivists. It includes the communities that use government information (our Designated Communities!) and the government employees who devote their careers to creating this information. It includes our colleagues, our professional organizations, and library managers.
This awareness is documented in the many stories in the popular press this year about massive “data rescue” projects drawing literally hundreds of volunteers. It is also demonstrated by the number of people nominating seeds (URLs) and the number of seeds nominated for the current End of Term harvest. Both have increased by roughly an order of magnitude over 2012.
Awareness of need for planning. But beyond the numbers, more people are learning first-hand that rescuing information at the end of its life-cycle can be difficult, incomplete, and subject to error and even loss. It is clear that last-minute rescue is essential in early 2017. But it is also clear that, in the future, efficient and effective preservation requires planning. This means that government agencies need to plan for the preservation of the information they create at the beginning of the life-cycle of that information, even before it is actually created.
Opportunity to create demonstrable value. This awareness provides libraries with the opportunity to lead a movement to change government information policies that affect long-term preservation of and access to government information. By promoting this change, libraries will be laying the groundwork for future long-term preservation of information that their communities value highly. This provides an exceptional opportunity to work with motivated and inspired user communities toward a common goal. This is good news at a time when librarians are eager to demonstrate the value of libraries.
A model exists. And there is more good news. The model for a long-term government information policy not only exists, but libraries are already very familiar with it. Beginning around 2010, federal granting agencies like the National Science Foundation (NSF), the National Institutes of Health (NIH) and the Department of Energy (DOE) started requiring researchers who receive federal grants to develop Data Management Plans (DMPs) for the data collected and analyzed during the research process. Thus, researchers who gather data at government expense must have a plan to archive that data and make it available to other researchers. The requirement for DMPs has driven a small revolution of data management in libraries.
Ironically, there is no similar requirement for government agencies to develop a plan for the long-term management of information they gather and produce. There are, of course, a variety of requirements for managing government “Records” but there are several problems with the existing regulations.
Gaps in existing regulations. The Federal Records Act and related laws and regulations cover only a portion of the huge amount of information gathered and created by the government. In the past, it was relatively easy to distinguish between “publications” and “Records,” but in the age of digital information, databases, and transactional e-government it is much more difficult to do so. Official executive agency “Records Schedules,” which are approved by the National Archives and Records Administration (NARA), define only a subset of the information gathered and created by an agency as Records suitable for deposit with NARA. Further, the implementation of those Records Schedules is subject to interpretation by executive agency political appointees who may not always have preservation as their highest priority. This can make huge swaths of valuable information ineligible for deposit with NARA as Records.
Government data, documents, and publications that are not deemed official Records have no long-term preservation plan at all. In the paper-and-ink world, many agency publications that did not qualify as Records were printed by or sent to the Government Publishing Office (GPO) and deposited in Federal Depository Library Program (FDLP) libraries around the country (currently 1,147 libraries). Unfortunately, a perfect storm of policies and procedures has blocked FDLP libraries from preserving this huge class of government information. A 1983 court decision (INS v. Chadha, 462 U.S. 919, 952) makes it impossible to require agencies to deposit documents with GPO or the FDLP. The 1980 Paperwork Reduction Act (44 U.S.C. §§ 3501–3521) and the Office of Management and Budget’s (OMB) Circular A-130 have made it more difficult to distribute government information to FDLP libraries. The shift to born-digital information has decentralized publishing and distribution, and has virtually eliminated best practices of metadata creation and standardization. GPO’s Dissemination and Distribution Policy has severely limited the information it will distribute to FDLP libraries. Together, this “perfect storm” has reduced the deposit of this class of at-risk government information into FDLP libraries by ninety percent over the last twenty years.
The Solution: Information Management Plans. To plug the gaps in existing regulations, government agencies should be required to treat their own information with as much care as data gathered by researchers with government funding. What is needed is a new regulation that requires agencies to have Information Management Plans (IMPs) for all the information they collect, aggregate, and create.
We have proposed to OMB a modification to its policy, OMB Circular A-130: Managing Information as a Strategic Resource, that would require every government agency to have an Information Management Plan. Our proposed wording:
Every government agency must have an “Information Management Plan” for the information it creates, collects, processes, or disseminates. The Information Management Plan must specify how the agency’s public information will be preserved for the long-term including its final deposit in a reputable, trusted, government (e.g., NARA, GPO, etc.) and/or non-government digital repository to guarantee free public access to it.
Many Benefits! We believe that such a requirement would provide many benefits for agencies, libraries, archives, and the general public. We think it would do more to enhance long-term public access to government information than changes to Title 44 of the US Code (which codified the “free use of government publications”) could do.
- It would make it possible to preserve information continuously without the need for hasty last-minute rescue efforts.
- It would make it easier to identify and select information and preserve it outside of government control.
- It would result in digital objects that are easier to preserve accurately and securely.
- It would make it easy for government agencies to collaborate with digital repositories and designated communities outside the government for the long-term preservation of their information.
- The scale of the resulting digital preservation infrastructure would provide an easy path for shared Succession Plans for Trusted Digital Repositories (TDRs), as defined in ISO 16363, Audit and Certification of Trustworthy Digital Repositories.
IMPs would provide these benefits through the practical response of the vendors that provide software to government agencies. Those vendors would have an enormous market for flexible software for creating digital government information and records: software that fits the different needs of different agencies for database management, document creation, content management systems, email, and so forth, while also making it easy for agencies to output preservable digital objects, along with an accurate inventory of them, ready for deposit as Submission Information Packages (SIPs) into TDRs.
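To make the idea of an “accurate inventory” a little more concrete, here is a minimal sketch in Python. It is not anything agencies, vendors, or TDRs actually use; it simply assumes an agency’s output sits in a local directory and lists every file with its SHA-256 checksum, one “checksum  path” line per file, similar in spirit to the checksum manifests used in BagIt packages. The function names (`sha256_of`, `build_manifest`, `write_manifest`) and the directory layout are hypothetical, and a real SIP would also need descriptive and administrative metadata.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[str]:
    """List every file under root as '<sha256>  <relative path>', sorted by path.

    Hypothetical inventory format for illustration only.
    """
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()  # portable forward-slash paths
        lines.append(f"{sha256_of(path)}  {rel}")
    return lines


def write_manifest(root: Path, manifest_path: Path) -> None:
    """Write the manifest for root to manifest_path as UTF-8 text."""
    manifest_path.write_text("\n".join(build_manifest(root)) + "\n", encoding="utf-8")
```

Calling `write_manifest(Path("agency_output"), Path("manifest-sha256.txt"))` would inventory a (hypothetical) `agency_output` directory. Writing the manifest outside the directory being inventoried keeps the manifest from listing itself, and re-running the function later lets a repository check that nothing has changed or gone missing.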
We believe this is a reasonable suggestion with a good precedent (the DMPs), but we would appreciate hearing your opinions. Is A‑130 the best target for such a regulation? What is the best way to propose, promote, and obtain such a new policy? What is the best wording for such a proposed policy?
We believe the current level of awareness and support gives us a singular opportunity for the preservation of government information. We believe that this is an opportunity not just to preserve government information, but also to demonstrate the leadership of librarians and archivists and the value of libraries and archives.
James A. Jacobs, Librarian Emeritus, University of California San Diego
James R. Jacobs, Federal Government Information Librarian, Stanford University