Tag Archives: Internet archive
It’s that time again, folks. The End of Term Archive is once again gearing up to harvest the .gov/.mil Web domain. For the End of Term 2020, the Library of Congress, University of North Texas Libraries, Internet Archive, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) are joining efforts again, this time with new partners Environmental Data & Governance Initiative (EDGI) and the General Services Administration (GSA), to preserve public United States Government websites at the end of the current presidential administration on January 20, 2021. This web harvest – like its predecessors in 2008, 2012, and 2016 – is intended to document the federal government’s presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad, comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com, and social media content) and FTP’d datasets.
Here’s the official announcement asking for YOUR help. Please forward widely!
WE NEED YOUR HELP TO PRESERVE THE .GOV WEB DOMAIN!
How would YOU like to help preserve the United States federal government .gov/.mil Web domain for future generations? But, that’s too huge of a swath of Internet real estate for any one person or organization to preserve, right?!
Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. BUT WE NEED YOUR HELP!
And that’s where YOU come in. You can help the project immensely by nominating your favorite .gov website/document/dataset, other federal government websites, or governmental social media account with the End of Term Nomination Tool. You can nominate as many sites as you want. Nominate early and often! Win a prize for the most seed nominations!! Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access, and long-term preservation. Only YOU can help prevent … link rot!
In a recent post on the blog of the Web Science and Digital Libraries Research Group, Shawn Jones reports on research that is vital to all those interested in long-term access to government information.
- How well are the National Guideline Clearinghouse and the National Quality Measures Clearinghouse Archived? Shawn M. Jones, Web Science and Digital Libraries Research Group (July 15, 2018).
In the post, Jones reports on his research into how much of the content of two sites (more…)
Drop everything and watch this presentation from the 2017 Code4Lib conference that took place in Los Angeles March 6-9, 2017. Heck, watch the entire proceedings because there is a bunch of interesting and thoughtful stuff going on in the world of libraries and technology! But in particular, check out Matt Zumwalt’s presentation “How the distributed web could bring a new Golden Age for Libraries” – after submitting his talk, he changed the title to “Storing data together: the movement to decentralize data and how libraries can lead it” because of the DataRefuge movement.
Zumwalt (aka @FLyingZumwalt on Twitter) works at Protocol Labs, one of the primary developers of the InterPlanetary File System (IPFS) – grok their tagline “HTTP is obsolete. It’s time for the distributed, permanent web!” He has spent much of his spare time over the last 9 months working with groups like EDGI, DataRefuge, and the Internet Archive to help preserve government datasets.
Here’s what Matt said in a nutshell: The Web is precarious. But using peer-to-peer distributed network architecture, we can “store data together”: we can collaboratively preserve and serve out government data. This resonates with me as an FDLP librarian. What if a network of FDLP libraries actually took this on? This isn’t some far-fetched, sci-fi idea. The technologies and infrastructures are already there. Over the last 9 months, researchers, faculty and public citizens around the country have already gotten on board with this idea. Libraries just have to get together and agree that it’s a good thing to collect/download, store, describe and serve out government information. Together we can do this!
Matt’s talk starts at 3:07:41 of the YouTube video below. Please watch it, let his ideas sink in, share it, start talking about it with your colleagues and administrators in your library, and get moving. Government information could be the great test case for the distributed web and a new Golden Age for Libraries!
This presentation will show how the worldwide surge of work on distributed technologies like the InterPlanetary File System (IPFS) opens the door to a flourishing of community-oriented librarianship in the digital age. The centralized internet, and the rise of cloud services, has forced libraries to act as information silos that compete with other silos to be the place where content and metadata get stored. We will look at how decentralized technologies allow libraries to break this pattern and resume their missions of providing discovery, access and preservation services on top of content that exists in multiple places.
This is an amazing offer from Brewster Kahle and the Internet Archive. Kahle just wrote a letter to the House Judiciary Committee’s Subcommittee on Courts, Intellectual Property, and the Internet stating unequivocally that the Archive will “archive and host – for free, forever, and without restriction on access to the public – all records contained in PACER.” The “Public Access to Court Electronic Records” or PACER system is the supposedly publicly accessible system of federal court records that charges exorbitant fees to download, thus for all intents and purposes blocking meaningful access to federal court records. But with this letter, the whole system could become actually accessible, for free and in perpetuity!
By this submission, the Internet Archive would like to clearly state to the Judiciary Committee, as well as to the Administrative Office of the U.S. Courts and the Judicial Conference of the United States, that we would be delighted to archive and host – for free, forever, and without restriction on access to the public – all records contained in PACER…
In order to recognize the vision of universal free access to public court records, the Federal Judiciary would essentially have to do nothing. We are experts at “crawling” online databases in an efficient and careful fashion that does not burden those systems. We are already able to comprehensively crawl PACER from a technical perspective, but the resulting fees would be astronomical. The Federal Judiciary has a Memorandum of Understanding with both the Executive Office for U.S. Trustees and with the Government Printing Office that gives each entity no-fee access for the public benefit. The collection we would provide to the public would be far more comprehensive than the GPO’s current court opinion program – although I must laud that program for providing a digitally-authenticated collection of many opinions.
By making federal judicial dockets available in this manner, the Federal Judiciary would enable free and unlimited public access to all records that exist in PACER, finally living up to the name of the program. In today’s world, public access means access on the Internet. Public access also means that people can work with big data without having to pass a cash register for each document.
The OpenGov Foundation just released their “Statement on Internet Archive Offer to Deliver Free and Perpetual Public Access to PACER” in which they said:
“The vital public information in PACER is the property of the American people. Public information, from laws to court records, should never be locked away behind paywalls, never be stashed behind arbitrary barriers and never be covered in artificial restrictions. Forcing Americans to pay hard-earned money to access public court records is no better than forcing them to pay a poll tax.
“The Internet Archive’s offer to archive and deliver unrestricted public access to PACER for free and forever is the best possible Valentine’s Day gift to the American people. The Internet Archive is proposing a cost-effective and innovative public-private partnership that will finally fix a clear injustice. There is no reason to do anything but accept this offer in a heartbeat.”
This just came through my twitter feed from @MuckRock. Through a FOIA request which shook it loose from the notoriously difficult NSA, we now have access to NSA’s 2007 Untangling the Web: a guide to Internet research. It kind of reads like a Terry Pratchett novel if Terry was having a psychotic/psychedelic episode. As MuckRock notes, “you don’t have to go very far before this takes a hard turn into ‘Dungeons and Dragons campaign/Classics major’s undergraduate thesis’ territory.” Read on, you’ll thank me later!
And if you’re interested, I collected and cataloged a version for our library. The original NSA link to the document no longer resolves (and it was put up just last year!!), but there’s an archived copy in the Wayback Machine.
The NSA has a well-earned reputation for being one of the tougher agencies to get records out of, making those rare FOIA wins all the sweeter. In the case of Untangling the Web, the agency’s 2007 guide to internet research, the fact that the records in question just so happen to be absolutely insane are just icing on the cake – or as the guide would put it, “the nectar on the ambrosia.”