
Tag Archives: data preservation

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Lunchtime listen: “Storing Data Together” by Matt Zumwalt at Code4Lib2017

Drop everything and watch this presentation from the 2017 Code4Lib conference that took place in Los Angeles March 6-9, 2017. Heck, watch the entire proceedings because there is a bunch of interesting and thoughtful stuff going on in the world of libraries and technology! But in particular, check out Matt Zumwalt’s presentation “How the distributed web could bring a new Golden Age for Libraries.” After submitting his talk, he changed the title to “Storing data together: the movement to decentralize data and how libraries can lead it” because of the DataRefuge movement.

Zumwalt (aka @FLyingZumwalt on Twitter) works at Protocol Labs, one of the primary developers of the InterPlanetary File System (IPFS). Grok their tagline: “HTTP is obsolete. It’s time for the distributed, permanent web!” He has spent much of his spare time over the last 9 months working with groups like EDGI, DataRefuge, and the Internet Archive to help preserve government datasets.

Here’s what Matt said in a nutshell: The Web is precarious. But using peer-to-peer distributed network architecture, we can “store data together”: we can collaboratively preserve and serve out government data. This resonates with me as an FDLP librarian. What if a network of FDLP libraries actually took this on? This isn’t some far-fetched, sci-fi idea. The technologies and infrastructures are already there. Over the last 9 months, researchers, faculty and public citizens around the country have already gotten on board with this idea. Libraries just have to get together and agree that it’s a good thing to collect/download, store, describe and serve out government information. Together we can do this!

Matt’s talk starts at 3:07:41 of the YouTube video below. Please watch it, let his ideas sink in, share it, start talking about it with your colleagues and administrators in your library, and get moving. Government information could be the great test case for the distributed web and a new Golden Age for Libraries!

This presentation will show how the worldwide surge of work on distributed technologies like the InterPlanetary File System (IPFS) opens the door to a flourishing of community-oriented librarianship in the digital age. The centralized internet, and the rise of cloud services, has forced libraries to act as information silos that compete with other silos to be the place where content and metadata get stored. We will look at how decentralized technologies allow libraries to break this pattern and resume their missions of providing discovery, access and preservation services on top of content that exists in multiple places.


Check out Endangered Data Week – April 17-21, 2017

This week is Endangered Data Week, a new effort to raise awareness about publicly available data and the threats to its creation, sharing and retention. Follow along with the conversation at the Twitter hashtag #EndangeredData, check out the Endangered Data events near you, tune in on Friday for the webinar hosted by the Digital Library Federation (DLF), “Endangered Accountability: A DLF-Sponsored Webinar on FOIA, Government Data, and Transparency,” and definitely sign up for the new DLF Interest Group on Records Transparency/Accountability.

There’s never been such an open window of opportunity for govt information librarians to prove their mettle and work together to assure the preservation of born-digital govt information in all its guises. So jump in and get involved today!

Political events in the United States have shed new light on the fragility of publicly administered data. In just the first few weeks of the Trump administration and 115th Congress, the Environmental Protection Agency was allegedly ordered to remove climate change information from its website, the USDA removed animal welfare data from its website, and the House passed H.Res.5, specifically excluding changes to the Affordable Care Act from mandatory long-term cost data analysis. The Senate and House of Representatives have both received proposed bills (S.103 and H.R.482) prohibiting funding from being used “to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” While researchers, archivists, librarians, and watchdog groups work hard to create and preserve open data, there’s little guarantee that information under federal control will always survive changes to federal agencies.

via Endangered Data Week – April 17-21, 2017.

FGI accidental docs librarian webinar: “Saving govt data: a conversation with the future”

Please tune in next Wednesday, March 29, 2017 from 9am – 10am Pacific / 12:00 – 1:00pm Eastern for the next Help! I’m an Accidental Government Information Librarian Webinar “Saving government data: A conversation with the future.” You’ll need to RSVP for the session in order to get the link to the WebEx live session. “See” you there!

Help! I’m an Accidental Government Information Librarian presents … Saving government data: A conversation with the future, on Wednesday, March 29, 2017 from 12:00 – 1:00 p.m. (Eastern).

In recent months, the DataRefuge project has collaborated with hundreds of volunteers around the United States to collect, describe, and store federal data that support climate and environmental research and advocacy. This project, and others like it, works in conjunction with the End of Term Web Archive to capture and make available federal web content during administrative transitions.

Our discussion will explore the fragility of digital information, and expand on ideas about what data is. We’ll talk about current projects and efforts, and explore the future of this work. Finally, we’ll address the concept of sustainability, and propose a paradigm of empowered experimentation that aligns with our values and roles within libraries.

We will meet together for Session #69, online on Wednesday, March 29, 2017 from 12:00 – 1:00 p.m. (Eastern). Please RSVP for the session using this link:  http://bit.ly/GRS-Session69

We will use WebEx for the live session. Information on testing and accessing the session will be made available when you register.

The session will be recorded and available after the live session, linked from the NCLA GRS web page (http://www.nclaonline.org/government-resources).

Presenters:
Laurie Allen is the Assistant Director for Digital Scholarship in the Penn Libraries, where she leads a group working to expand the capacity of researchers at Penn to create and share scholarship in new forms. The group engages in digital project development, data management and curation, mapping, experimentations with emerging research methods, and open access publishing. In late 2016, Allen was part of the group that started Data Refuge, and has been involved in bringing together a group of collaborators to form a network of libraries, open data activists and open government efforts.

James A. Jacobs is Data Services Librarian Emeritus, University of California San Diego. He has more than 25 years experience working with digital information, digital services, and digital library collections. He is a technical consultant and advisor to the Center for Research Libraries in the auditing and certification of digital repositories using the Trusted Repository Audit Checklist (TRAC) and related CRL criteria. He served as Data Services Librarian at the University of California San Diego and co-taught the ICPSR summer workshop, “Providing Social Science Data Services: Strategies for Design and Operation”. He is a co-founder of Free Government Information.

James R. Jacobs is the US Government Information Librarian at Stanford University Libraries where he works on both collection development as well as digital projects like LOCKSS-USDOCS. He is a member of ALA’s Government Documents Roundtable (GODORT) and served a 3-year term on Depository Library Council to the Public Printer, including serving as DLC Chair. He is a co-founder of Free Government Information (freegovinfo.info) and Radical Reference (radicalreference.info) and is on the board of Question Copyright, a 501(c)(3) non-profit organization that promotes a better public understanding of the  effects of copyright, and encourages the development of alternatives to information monopolies.

Shari Laster is the Government Information Librarian and Data Services Librarian at the University of California, Santa Barbara. She currently serves as Assistant Chair/Chair-Elect for the Government Documents Round Table of the American Library Association, and is a past chair of the Depository Library Council, the advisory body for the Federal Depository Library Program.

via Help! I’m an Accidental Government Information Librarian Webinars | North Carolina Library Association.

DOE’s Carbon Dioxide Information Analysis Center (CDIAC) shut down without comment. Data preservation in danger

This is terrible. The US Department of Energy (DOE) has summarily shut down the Carbon Dioxide Information Analysis Center (CDIAC), located at the Oak Ridge National Laboratory (ORNL) as of 10/1/2016. CDIAC is the primary climate change data and information analysis center for DOE. CDIAC is supported by DOE’s Climate and Environmental Sciences Division within the Office of Biological and Environmental Research (BER).

A friend reports that CDIAC has limited funding and is trying to save its data in the NASA Distributed Active Archive Center (DAAC). There has been no outside comment, and neither DOE nor ORNL has yet issued a press release.

I just checked, and the CDIAC site is in the Internet Archive Wayback Machine. However, the CDIAC’s entire data catalog is served out over FTP, which is not captured by IA’s Heritrix Web crawler.

NOTICE: CDIAC as currently configured and hosted by ORNL will cease operations on September 30, 2017. Data will continue to be available through this portal until that time. Data transition plans are being developed with DOE to ensure preservation and availability beyond 2017.

via Carbon Dioxide Information Analysis Center (CDIAC).

Research data lost to the sands of time

Here’s an interesting article, not on link rot (a topic FGI has been tracking for some time), but on *data rot*. In a recent article in Current Biology, researchers examined the availability of data from 516 studies between 2 and 22 years old. They found the following:

  • The odds of a data set being reported as extant fell by 17% per year.
  • Broken e-mails and obsolete storage devices were the main obstacles to data sharing.
  • Policies mandating data archiving at publication are clearly needed.
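
A 17% yearly fall in the odds compounds, which is easy to underestimate. As a rough back-of-the-envelope sketch (not the authors' model), the Python below assumes the decline is constant from year to year. The function names and the `baseline_odds` parameter are hypothetical: the summary above gives no starting odds, so any value passed in is an illustration only.

```python
def odds_multiplier(years, annual_decline=0.17):
    """Factor by which the odds shrink after `years`,
    assuming the same fractional decline every year."""
    return (1.0 - annual_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


def probability_extant(years, baseline_odds, annual_decline=0.17):
    """Probability that a data set is still extant after `years`.

    `baseline_odds` is a hypothetical starting value (the odds at
    publication); it is an assumption, not a figure from the study.
    """
    return odds_to_probability(baseline_odds * odds_multiplier(years, annual_decline))


if __name__ == "__main__":
    # Print how far the odds have shrunk at a few article ages.
    for years in (2, 10, 22):
        print(years, round(odds_multiplier(years), 3))
```

Under that assumption the odds after ten years are roughly 0.16 times what they were at publication. Note that this is a statement about odds, not probability, which is why the conversion helper is kept separate.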

Librarians have known of this issue for years (the Inter-university Consortium for Political and Social Research (ICPSR) was set up in 1962 to tackle it), but the article does put the issue in focus. And finally the federal government, via efforts like the NSF’s data management plan and OSTP’s new directive to improve the management of and access to scientific collections, is beginning to get behind the effort to combat data rot. Many libraries, not to mention scientists and researchers, are beginning to grapple with the issue of data preservation. Obviously, the issue is too big for government information librarians to handle alone. But this is fertile space in which govt information librarians, data librarians, research communities, and federal agencies can come together. The federal policy stating the importance of data preservation is there; it’ll just take effort by multiple stakeholders to make sure it actually happens. It’s a positive that the writers of Dragonfly, the blog of the National Network of Libraries of Medicine Pacific Northwest Region (where I came across the article), point out that academic institutions can and should play a leading role in data preservation. I wholeheartedly agree!

Vines, Timothy H., Arianne YK Albert, Rose L. Andrew, Florence Débarre, Dan G. Bock, Michelle T. Franklin, Kimberly J. Gilbert, Jean-Sébastien Moore, Sébastien Renaut, and Diana J. Rennison. “The availability of research data declines rapidly with article age.” Current Biology 24, no. 1 (2014): 94-97.
http://dx.doi.org/10.1016/j.cub.2013.11.014

The researchers found that for every year that had passed since the paper’s publication date, the odds of finding an email address that led to contact with a study author decreased by 7% and that the odds of turning up the data reduced by 17% per year.  The authors report that while some of the data sets were truly lost others fell more into the category of “unavailable,” since they existed, but solely on inaccessible media (think Jaz disk).  These findings will not come as a shock to those who have worked in a research lab.  This publication does put some tangible numbers behind the underlying message of NYU Health Sciences Library’s excellent dramatic portrayal of an instance of inaccessible data.  The authors conclude by suggesting that a solution to this problem moving forward can be found in more journals requiring the deposit of data into a public archive upon publication.  I would also suggest that academic institutions can take a role by establishing policies supporting research data preservation alongside providing a data repository.
