Tag Archives: Internet archive
Good Forbes piece on the End of Term crawl
Check out this recent piece in Forbes about the End of Term project (eotarchive.org). And if you’re so moved to help out, you can nominate any federal government url through our handy nomination tool. There’s still time to help us save democracy’s information!
Meet The Citizens Racing To Save Government Websites From Vanishing. Leslie Katz, Forbes, Oct 23, 2024
…With the Nov. 5 election just a week away, they’re harvesting vast amounts of government data before the White House welcomes new residents or former ones in January. The information will live on in the End of Term Web Archive, a giant repository of federal government websites preserved for the historical record as one administrative term ends and a new one begins. Librarians, archivists and technologists across the country join forces every four years to donate time, effort and resources to what they dub the end-of-term crawl, with the resulting datasets available to the public for free…
“Citizens have a right to access information about what their government is doing in their name,” says James Jacobs, a government information librarian at Stanford University, an End of Term Web Archive project partner. “That’s why libraries have long collected these materials to make sure they are organized, preserved and easily accessible for the long term.”
End of Term crawl 2024 is now underway!
Well it’s that time again. The 2024 End of Term web crawl of the federal .gov/.mil web space (and other domains 🙂 ) has begun. We have just posted our first public announcement on the Internet Archive blog.
As we have done since 2008 (NARA did the first comprehensive crawl in 2004), a group of volunteers from the Internet Archive, GPO, Library of Congress, NARA, University of North Texas, and Stanford will be doing a “comprehensive” web harvest of the Federal government’s web space. For more information and background on the project, see our home page at https://eotarchive.org/. These archives can be searched full-text via the Internet Archive’s collections search (https://web.archive.org/) and also downloaded as bulk data for machine-assisted analysis from the project site.
But MOST IMPORTANTLY, we need YOUR help! We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive. Submit a url nomination by going to our nomination tool (hosted by University of North Texas!) and clicking the big yellow “add a url” button in the top right:
https://digital2.library.unt.edu/nomination/eth2024/
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. We’re also interested in any and all urls of federal sites that are NOT hosted on .gov/.mil. There are lots of federal government sites hosted on .edu, .org, and even .com! That includes social media but also research labs and other private/public partnerships. We already have a solid list of top level domains (e.g. epa.gov, congress.gov, defense.mil, etc.). Nominating urls deep within .gov/.mil websites helps to make our web crawls as thorough and complete as possible. Prizes will be awarded for the most url nominations by individuals and institutions!
So get to it! Help us do the most complete crawl we can and also ensure that the sites/publications/videos/data etc. that are most important to YOU make it into the archive!!
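For anyone preparing a batch of nominations, here is a minimal sketch of how a candidate list might be sorted before submitting it through the nomination tool. It is only an illustration: the function names and the sample urls are made up for this example, it assumes each url includes its scheme (https://), and it cannot tell federal .gov sites from state or local ones, so the results still need a human look.

```python
from urllib.parse import urlsplit

# Host suffixes the crawl already targets by default (per the post above).
GOV_MIL_SUFFIXES = (".gov", ".mil")


def is_gov_or_mil(url: str) -> bool:
    """Return True if the url's host ends in .gov or .mil."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(GOV_MIL_SUFFIXES)


def split_candidates(urls):
    """Drop blanks and exact duplicates, then split urls into two lists:
    those on .gov/.mil hosts and those hosted elsewhere."""
    on_gov_mil, elsewhere = [], []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        if is_gov_or_mil(url):
            on_gov_mil.append(url)
        else:
            elsewhere.append(url)
    return on_gov_mil, elsewhere


if __name__ == "__main__":
    # Hypothetical sample list, for illustration only.
    candidates = [
        "https://www.epa.gov/climate",
        "https://www.defense.mil/",
        "https://example.org/federal-lab",
        "https://www.epa.gov/climate",
    ]
    gov_mil, other = split_candidates(candidates)
    print("On .gov/.mil:", gov_mil)
    print("Hosted elsewhere:", other)
```

The "hosted elsewhere" list is the one worth double-checking, since those are the federal sites a .gov/.mil crawl is most likely to miss.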
Nominations sought for the U.S. Federal Government Domain End of Term 2020 Web Archive
It’s that time again, folks. The End of Term Archive is once again gearing up to harvest the .gov/.mil Web domain. For End of Term 2020, the Library of Congress, University of North Texas Libraries, Internet Archive, Stanford University Libraries, and the U.S. Government Publishing Office (GPO) are joining efforts again, this time with new partners Environmental Data & Governance Initiative (EDGI) and the General Services Administration (GSA), to preserve public United States Government websites as the current presidential administration ends on January 20, 2021. This web harvest – like its predecessors in 2008, 2012, and 2016 – is intended to document the federal government’s presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com, and social media content) and FTP’d datasets.
Here’s the official announcement asking for YOUR help. Please forward widely!
WE NEED YOUR HELP TO PRESERVE THE .GOV WEB DOMAIN!
How would YOU like to help preserve the United States federal government .gov/.mil Web domain for future generations? But that’s too huge a swath of Internet real estate for any one person or organization to preserve, right?!
Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. BUT WE NEED YOUR HELP!
And that’s where YOU come in. You can help the project immensely by nominating your favorite .gov website/document/dataset, other federal government websites, or governmental social media account with the End of Term Nomination Tool. You can nominate as many sites as you want. Nominate early and often! Win a prize for the most seed nominations!! Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access, and long-term preservation. Only YOU can help prevent … link rot!
- End of Term 2020 Nomination Tool: Submit URLs here.
- About the End of Term 2020 Project
- End of Term Web Archive (2008, 2012, and 2016)
- Follow us on Twitter @eotarchive
- For more information, contact us at eot-info AT archive DOT org
Preserving What’s Gone — The Healthcare Guidelines Case
In a recent post on the blog of the Web Science and Digital Libraries Research Group, Shawn Jones reports on research that is vital to all those interested in long term access to government information.
- How well are the National Guideline Clearinghouse and the National Quality Measures Clearinghouse Archived? Shawn M. Jones, Web Science and Digital Libraries Research Group (July 15, 2018).
In the post, Jones reports on his research into how much of the content of two sites (more…)
Lunchtime listen: “Storing Data Together” by Matt Zumwalt at Code4Lib2017
Drop everything and watch this presentation from the 2017 Code4Lib conference that took place in Los Angeles March 6-9, 2017. Heck, watch the entire proceedings because there is a bunch of interesting and thoughtful stuff going on in the world of libraries and technology! But in particular, check out Matt Zumwalt’s presentation “How the distributed web could bring a new Golden Age for Libraries”. After submitting his talk, he changed the title to “Storing data together: the movement to decentralize data and how libraries can lead it” because of the DataRefuge movement.
Zumwalt (aka @FLyingZumwalt on twitter) works at Protocol Labs, one of the primary developers of the InterPlanetary File System (IPFS). Grok their tagline: “HTTP is obsolete. It’s time for the distributed, permanent web!” He has spent much of his spare time over the last 9 months working with groups like EDGI, DataRefuge, and the Internet Archive to help preserve government datasets.
Here’s what Matt said in a nutshell: The Web is precarious. But using peer-to-peer distributed network architecture, we can “store data together”: we can collaboratively preserve and serve out government data. This resonates with me as an FDLP librarian. What if a network of FDLP libraries actually took this on? This isn’t some far-fetched, sci-fi idea. The technologies and infrastructures are already there. Over the last 9 months, researchers, faculty and public citizens around the country have already gotten on board with this idea. Libraries just have to get together and agree that it’s a good thing to collect/download, store, describe and serve out government information. Together we can do this!
Matt’s talk starts at 3:07:41 of the YouTube video below. Please watch it, let his ideas sink in, share it, start talking about it with your colleagues and administrators in your library, and get moving. Government information could be the great test case for the distributed web and a new Golden Age for Libraries!
This presentation will show how the worldwide surge of work on distributed technologies like the InterPlanetary File System (IPFS) opens the door to a flourishing of community-oriented librarianship in the digital age. The centralized internet, and the rise of cloud services, has forced libraries to act as information silos that compete with other silos to be the place where content and metadata get stored. We will look at how decentralized technologies allow libraries to break this pattern and resume their missions of providing discovery, access and preservation services on top of content that exists in multiple places.
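To make “content that exists in multiple places” a bit more concrete, here is a toy sketch of content addressing, the general idea behind systems like IPFS: a dataset is named by a hash of its bytes, so any library holding a copy can serve it and anyone can check that the copy is intact. This is not how IPFS itself is implemented or called. The `LibraryNode` class, the `fetch` function, and the use of a plain SHA-256 hex digest as the identifier are all assumptions made up for illustration.

```python
from __future__ import annotations

import hashlib


def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in content identifier."""
    return hashlib.sha256(data).hexdigest()


class LibraryNode:
    """A hypothetical library holding copies of datasets keyed by content ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store a copy and return the identifier derived from its bytes."""
        cid = content_id(data)
        self._store[cid] = data
        return cid

    def get(self, cid: str) -> bytes | None:
        """Return the stored copy for this identifier, or None if not held."""
        return self._store.get(cid)


def fetch(cid: str, nodes: list[LibraryNode]) -> bytes:
    """Ask each node in turn and return the first copy whose bytes hash to cid."""
    for node in nodes:
        data = node.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    raise KeyError(f"no verified copy of {cid} found")


if __name__ == "__main__":
    dataset = b"example government dataset"  # placeholder bytes
    library_a = LibraryNode("Library A")
    library_b = LibraryNode("Library B")

    cid = library_a.add(dataset)
    library_b.add(dataset)  # an identical copy yields the identical ID

    # Any library that holds the bytes can answer the request.
    print(fetch(cid, [library_b, library_a]) == dataset)
```

Because the name is derived from the content, it doesn’t matter which library answers: a copy either matches the identifier or it gets skipped.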