Panel on End-of-term crawl and the collection of vulnerable government information

January 23, 2017 · 4 Comments
Filed under: Commentary, Library, post 

I was honored last week to be part of a panel hosted by OpenTheGovernment and the Bauman Foundation to talk about the End of Term project. Other presenters included Jess Kutch at Coworker.org and Micah Altman, Director of Research at MIT Libraries. I talked about what EOT is doing, as well as some of the other great projects, including Climate Mirror, Data Refuge and the Azimuth backup project, working in concert/parallel to preserve federal climate and environmental data.

I thought the Q&A segment was especially interesting because it raised and answered some of the common questions and concerns that EOT receives on a regular basis. I also learned about a cool project called Violation Tracker, a search engine on corporate misconduct. I was also able to talk a bit about the needs going forward, including the idea of “Information Management Plans” for agencies, similar to the idea of “Data Management Plans” for all federally funded research. I was heartened to know that there is interest in that as a wider policy advocacy effort!

The full recorded meeting can be viewed here from Bauman’s Adobe Connect account.

Here’s more information on the EOT crawl and how you can help.

Coalitions of government, university, and public interest organizations have been working to ensure as much information as possible is preserved and accessible, amid growing concern that important and sensitive government data on climate, labor, and other issues may disappear from the web once the Trump Administration takes office.

Last Thursday, OTG and the Bauman Foundation hosted a meeting of advocates interested in preserving access to government data, and individuals involved in web harvesting efforts. James Jacobs, a government information librarian at Stanford University Library who is working on the End of Term (EOT) web harvest – a joint project between the Internet Archive, the Library of Congress, the Government Publishing Office, and several universities – spoke about the EOT crawl, and explained the various targets of the harvest, including all .gov and .mil web sites, government social media accounts, and more.

Jess Kutch discussed efforts by Coworker.org with Cornell University to preserve information related to workers’ rights and labor protections, and other meeting attendees presented some of their own projects as well. Philip Mattera explained how Good Jobs First is using its Violation Tracker database to scrape and preserve government source material related to corporate misconduct.  

Micah Altman, Director of Research at MIT Libraries, presented on the need for libraries and archives to build better infrastructure for the EOT harvest and other projects – including data portals, cloud infrastructure, and technologies that enhance discoverability – so that data and other government information can be made more easily accessible to the public.

via Volunteers work to preserve access to vulnerable government information, and you can help | OpenTheGovernment.org.

Do libraries need more shelving? Isn’t everything digital?

February 14, 2016 · 1 Comment
Filed under: post 

We here at FGI have been making the argument against the destruction of physical collections in connection with digitization efforts for a long time (see, e.g., Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a and What You Need to Know about the New Discard Policy). So it’s nice to hear the same argument from Jeff MacKie-Mason, recently hired University Librarian and Chief Digital Scholarship Officer at UC Berkeley, on his blog madLibbing: Muddling Along in the Information Age. MacKie-Mason clearly and succinctly points out the reasons that libraries still need physical collections: many digitized works are still in copyright and their digital surrogates are therefore not shareable online, print copies are easier to read with higher comprehension rates, there is “little or no confidence that we can guarantee long-term digital preservation” (emphasis his!), and current digital surrogates from large digitization projects are less than complete (we’ve pointed this out repeatedly, e.g., in “‘An alarmingly casual indifference to accuracy and authenticity.’ What we know about digital surrogates.”). So we hope the next time your library weeds a government document under the assumption that it’s online, you’ll check the digital surrogate for completeness and at least start the discussion with your administrators about the need for a local digital archive to assure the preservation of the digital surrogate of the document that you’re about to weed. It could mean the difference between access and frustration for your user community.

One huge misconception we face is that digitizing our collections means we don’t need the print anymore. For example, we are participants in the Google Books / HathiTrust project, and most of our 11 million regular volumes have been digitized.  Why not burn our print copies?

  1. For starters, about half of the collection is still in copyright. The HathiTrust collection can be searched, full-text, to find the existence of books, but we are not allowed to let people use the digital copy (with limited exceptions, e.g., for the blind, who can listen to a text-to-voice conversion). Decades before this need for our print copies goes away.
  2. Second, we are here not to build collections for their own sake, but to serve our faculty and students. And many of them vastly prefer doing their work from print copies. Those who read long monographs find it easier and their comprehension higher. Those who need to study large images or maps, in high resolution, or who want to see side-by-side page comparisons, need the print. And for many rare and historical documents, the materiality of the original document itself is of enormous importance for scholarship, from the marginal annotations to the construction of the volume.
  3. Next, we can have little or no confidence that we can guarantee long-term digital preservation. Digital storage has been around a relatively short time. In that time, formats change frequently. Hardware and software to render digital formats changes. Bits on storage media rot. Keeping bits and being able to find and access them in the future requires large annual expenditures, and those expenditures are getting larger as the amount of content we want to preserve grows enormously fast. Further, much of scholarly content currently is held on servers of for-profit companies, and we have no guarantee those companies will survive, or that they will take care to ensure that their archives of scholarly publications survive.
  4. The Google project has been very good, but it is not complete.  It does not scan fold-out pages, for example, which are in many scholarly books (maps, charts, tables).  We have discovered that sometimes they miss pages, or the quality is not readable.

So, for now, there is pretty much consensus among research scholars and librarians that we must keep print copies for preservation in all cases, and for continuing use in many cases.

via Do libraries need more shelving? Isn’t everything digital? – madLibbing.

Roundup of Government Info News and New Resources

November 29, 2011
Filed under: post 

Time once again for a selection of news and new resources that we hope will be of interest to the FGI community. The posts are from INFOdocket.com (@infodocket), where we compile and post new items daily from a variety of sources.

1. “Obama Wants Better Digital Archive of Federal Records” + Full Text of Presidential Memorandum

2. Now Available: EPA Releases Formerly Confidential Chemical Information

3. San Antonio, TX: New Online Database: Historical Election Results are Digitized

4. Statistics Canada to Make All Online Data Free

5. UK Parliament: MPs to Investigate Library Closures

6. TR Center Officially Launches the Theodore Roosevelt Digital Collection

7. Idaho: Libraries to Adjust to New Internet Filtering Law

8. U.S.: National Archives Trust Fund To Sell Copies of the 1940 Census (Digital & Microfilm Versions) Available

9. All Things Preservation: New NARA (National Archives) Twitter Stream & Tumblr Page

10. U.S. History: Senator George Mitchell Oral History Project Debuts Online

11. New Social Media Resource: “PolitickerUSA is the Best Way to Track Politicians’ Tweets”

12. Video Now Online of NARA’s “What’s Next in Social Media” Forum

13. Child Welfare Information Gateway — State Guides & Manuals Search

14. GPO Releases Its First App

15. New UN Database Available: Expert Panel Launches Tool to Fight Arbitrary Deprivation of Freedom

16. State of Minnesota Posts Franchise Disclosure Documents (FDD)

17. U.S. Government: USAID Launches New GeoCenter

18. Public Access to Indiana’s Historic Sanborn Maps Provides Treasure Trove of Information

19. California: More than 13,000 Online Maps Provide Historic View of State

20. UNESCO’s Global Open Access Portal Now Online

AP reports on Cyber Cemetery

September 14, 2009
Filed under: post 

The Associated Press (AP) has a story out today covering the Cyber Cemetery project at the University of North Texas Libraries. I came across a version at the Federal News Radio website, but I imagine it has been picked up elsewhere:

Government Web sites kept alive at Cyber Cemetery, 14 September 2009.

  • Our mission

    Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.