OSTI using Archive-It for E-Prints
The Energy Department's Office of Scientific and Technical Information (OSTI) is using the Internet Archive's Archive-It service to "provide uninterrupted access to more than a million online research papers from OSTI's E-print Network."
- EPrint Network Special Collection
This collection provides searching of more than 1 million scientific e-prints. The E-print Network is a deep Web source of scientific and technical information created by researchers active in a wide range of fields, including chemistry, biology and life sciences, materials science, nuclear sciences and engineering, energy research, and computer and information technologies. Information customers can use E-print Network to browse scientific Web sites, find scientific societies, receive alerts and search and access scientific e-prints, the documents circulated electronically to facilitate peer exchange and scientific advancement. OSTI leads development and adaptation of new capabilities for preservation and dissemination of research important to the U.S. Department of Energy (DOE).
See also: OSTI archives scientific data on the Web, by Trudy Walsh, GCN, 06/29/07.
"Without a way to periodically archive this material, important science content within this ever-growing, ever-changing online, e-print environment could disappear," said Walter Warnick, director of OSTI.

Good model for GPO
This is a great model for FDLP libraries, in concert with the GPO, to build on. Imagine if Archive-It were used to crawl the .gov/.mil/.state domains. Digital documents could be captured, harvested, described and distributed.
One thing I've been thinking about is next steps for Archive-It and other digital harvesting systems. We need a system for sifting through the vast amounts of data that get captured. One way to separate what's important (i.e., documents) from what's chaff (i.e., spacer GIFs) is to build a system that connects queries to search results and automatically tags the items that users click on frequently. Tagged items could then be fed into the cataloging workflow (preferably distributed among the 1,250 FDLP libraries) so that more thorough descriptive metadata can be added. Goodbye fugitive documents!
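To make the click-tagging idea a bit more concrete, here is a rough Python sketch. It assumes a hypothetical search log of (query, clicked URL) pairs. The `Click` record, the `tag_frequently_clicked` function, the `min_clicks` threshold and the example URLs are all made up for illustration, and none of this reflects anything Archive-It actually provides.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Click(NamedTuple):
    """One hypothetical search-log record: a query and the captured URL the user clicked."""
    query: str
    url: str


def tag_frequently_clicked(clicks: Iterable[Click], min_clicks: int = 3) -> dict[str, list[str]]:
    """Return {url: sorted distinct queries} for URLs clicked at least min_clicks times.

    The queries act as rough, user-supplied tags; the returned URLs are
    candidates to hand to catalogers, not finished catalog records.
    """
    click_counts = Counter()
    queries_by_url = {}
    for click in clicks:
        click_counts[click.url] += 1
        # Lowercase the query so "Energy Report" and "energy report" count as one tag.
        queries_by_url.setdefault(click.url, set()).add(click.query.lower())
    return {
        url: sorted(queries_by_url[url])
        for url, count in click_counts.items()
        if count >= min_clicks
    }


# Made-up log: the PDF is clicked three times, the spacer GIF only once.
log = [
    Click("energy report", "https://example.gov/report.pdf"),
    Click("Energy Report", "https://example.gov/report.pdf"),
    Click("annual report", "https://example.gov/report.pdf"),
    Click("energy report", "https://example.gov/spacer.gif"),
]
candidates = tag_frequently_clicked(log, min_clicks=3)
```

With this made-up log, only the PDF clears the threshold, and it carries the queries that led users to it as provisional tags. The threshold is a crude popularity cutoff, so rarely requested but important documents would still need some other route into the cataloging queue.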
Am I crazy? Perhaps, but I'd love to hear others' thoughts.