
Tag Archives: Web archiving

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Nominations sought for the U.S. Federal Government Domain End of Term Web Archive

Some of our readers may have seen this announcement already, but in case you haven’t, we need your help to preserve the .gov domain. See the announcement below to find out how.

How would YOU like to help preserve the United States federal government .gov domain for future generations? But that’s too huge a swath of Internet real estate for any one person to preserve, right?!

Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. But we need your help.

The Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) have joined together for a collaborative project to preserve public United States Government websites at the end of the current presidential administration, which ends January 20, 2017. This web harvest, like its predecessors in 2008 and 2012, is intended to document the federal government’s presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad, comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com, and social media content).

And that’s where YOU come in. You can help the project immensely by nominating your favorite .gov websites, other federal government websites, or governmental social media accounts with the End of Term Nomination Tool. You can nominate as many sites as you want. Nominate early and often! Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access, and long-term preservation. Only YOU can help prevent … link rot!

For more information, contact uc3@ucop.edu

Internet preservation: what do we do now?

I’m at the International Internet Preservation Consortium General Assembly and Conference this week in beautiful Reykjavik, Iceland. Follow the flow of the conversation via the #IIPCga16 and #IIPCwac16 Twitter hashtags.

Here are a few pieces of the week that have especially hit me as important:

  • The technologies and tools to collect, preserve and use Web archives are technically challenging, but they are getting better every day. The Twitter stream includes MANY links to great tools and use cases for Web archives. I really appreciated Andy Jackson’s (British Library) presentation about his work, “Building tools to archive the modern Web”; Brewster Kahle’s “Distributed Web” proposal; and Harvard Library’s “Web archiving environmental scan”. And there’s a lot more amazing work going on in this space!
  • Thirty national libraries are crawling and preserving their own domains. Government information is of great interest, and many countries have legal deposit laws that put them on sound legal footing to collect and preserve their countries’ Web content.
  • The US .gov/.mil End of Term crawl for 2016 is coming up quickly, and we’re making plans. There will again be a link recommendation tool, and perhaps other non-technical ways for the community to help.
  • Despite the great tools and very smart technologists, this group could really use input from subject/domain specialists. We’re the ones who have the specialized knowledge to know what to collect. It’s clear in my mind that the world needs MORE government information librarians, not fewer! I highly recommend attending IIPC next year in Lisbon, Portugal (I hope to be there!!), or in a future year if the conference is in a city near you.

Politwoops archive of politicians’ deleted tweets uploaded to Internet Archive

In August 2015, Twitter shut down the Politwoops service and its sister site Diplotwoops, which tracked politicians’ deleted tweets. Twitter revoked access to its API for all Politwoops sites in 30 countries around the world, citing terms of service violations.

In a move to preserve all of those deleted tweets, Open State has uploaded its archive of tweets to the Internet Archive.

In a move to preserve the public record for everyone, Open State has uploaded its complete Politwoops archive of deleted tweets by politicians to the Internet Archive. The archive consists of 1,106,187 deleted tweets by 10,404 politicians collected in 35 countries and parliaments over a period of five years.

Explore the oldest U.S. Website via the Stanford Wayback Machine!

Who knew that the oldest US Website was a page from the Stanford Linear Accelerator Center (SLAC)? Now you can explore the evolution of that oldest Website via the Stanford Libraries Wayback Machine. We’re now running a local instance of the Internet Archive’s Wayback Machine. Soon, all of our Web harvesting collections will also be available via the Stanford Wayback search interface. This includes some rich collections of government publications, including Freedom of Information (FOIA), Congressional Research Service (CRS) Reports, Fugitive US Agencies, Bay Area Governments, and more!

At a microscopic level, web archives document the evolution of individual websites. At a macroscopic level, they document the evolution of the Web itself. In the case of web archives for the period when the entire Web consisted of only a handful of individual websites, changes to even a single website reflect changes to the Web itself. We are pleased to announce the availability of such an archive, notably featuring the oldest U.S. website, dating to December 21, 1991.

via Explore the oldest U.S. website | Stanford University Libraries.

NLM to change the way it archives its web pages

The National Library of Medicine (NLM) reports that it will begin using the Internet Archive’s Archive-It Web archiving service to capture and preserve periodic snapshots of NLM Web resources.

This is a change in policy. NLM says that it “will cease the application and management of its previous permanence policy to individual Web pages and documents,” which included recording permanence metadata within HTML headers, performing pre-publication appraisal and review by NLM archivists, and maintaining version control for deleted and updated content. It says that the change will broaden and simplify how it archives its own Web sites, and that it is confident its new approach “will provide a richer and deeper historical record of NLM programs, activities, and services.”