digital collections

OSTI: collecting AND connecting scientific govt information

Did you know that the Office of Scientific and Technical Information (OSTI) has a blog? The OSTI blog turned 1 year old last month but has only been in our In other news... section for a short time.

I'm really impressed with the work that OSTI is doing to build digital collections of scientific and technical information as well as to push the boundaries of access by building databases and federated search tools, serving as an OAI node, distributing bibliographic records, and generally finding unique and innovative ways to make scientific and technical information available on the Web (I just love the idea of an adopt-a-doc program!!).

In particular, a blog post entitled Beyond Collecting: Connecting from a few weeks back (yes my feedreader is bursting at the seams :-) ) caught my eye. They've basically gone out and built a digital infrastructure along the lines of what we at FGI have been advocating for lo these many years. That is, they've realized that they can't possibly collect it all. Instead of building one big central repository, they're relying on many agencies and actors to host content and standards-based metadata of interest to them. OSTI can then use increasingly robust digital tools to aggregate and provide search mechanisms for vast amounts of information -- to "connect users with the highest quality science information without collecting or hosting it."

THAT'S what I envision for the Federal Depository Library Program: a collaborative network of libraries (a technical and social P2P network!) hosting content of interest to their local communities, creating and maintaining standardized metadata, and connecting up with each other to create powerful search tools across the network. This is the many-hands-make-light-work digital model that we in the documents community should be espousing.

--that is all.


OSTI has embraced a new paradigm for sharing scientific and technical information (STI). Historically, OSTI has fulfilled its mission of providing STI to scientists, researchers, and the public by hosting, or collecting, documents and/or metadata. OSTI's new paradigm is to make content searchable that is often hosted by others; today, OSTI connects those seeking the content with the organizations that host it.

Beginning in the late 1940's, with OSTI's production of the Nuclear Science Abstracts, which was to go on for nearly 30 years, OSTI entered into the business of collecting information. Beginning in the 1990’s, OSTI began creating web applications to make the collected content openly accessible and conveniently searchable. ETDE Web, DOE Information Bridge, the Energy Citations Database, and DOE R&D Accomplishments are some of the successful applications.

In the last several years, OSTI’s approach to disseminating STI has evolved. Recent applications such as the Eprint Network, Science.gov, DOE Science Accelerator, and WorldWideScience.org connect users with the highest quality science information without collecting or hosting it.

How does OSTI move beyond collecting to connecting and what does connecting mean? OSTI's new applications search content that is housed in document repositories owned by a number of government agencies and government-sanctioned organizations. OSTI applications search a number of these repositories on the fly and they aggregate the content from the sources they search and present the most relevant of the search results to the user. This simultaneous and real-time search of multiple repositories is called federated search. OSTI's federated search applications serve as portals to specific subjects. In being subject-specific, they connect users to the highest quality STI in their fields of interest.

Why is OSTI embracing the connection model? Quite simply, OSTI can far better achieve its mission by making great quantities of content openly accessible and conveniently searchable, but it is impossible to collect and keep current such quantities of content from multiple content sources. “Connecting” to content is doable, while “collecting” is not. (My emphasis added!)

We believe that by connecting users to content, we provide a more comprehensive and authoritative search. In doing so, we accelerate the advancement of science.
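The federated search described in the OSTI post above (query several repositories at once, merge what comes back, show the most relevant hits) can be sketched in a few lines of Python. This is only an illustration and not OSTI's implementation: the repository names, the sample titles, and the word-overlap scoring are all made up for the example, and a real portal would call each repository's remote search interface rather than scan an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "repositories"; a real portal would query remote search services.
REPOSITORIES = {
    "repo_a": ["Solar cell efficiency report", "Wind turbine noise study"],
    "repo_b": ["Solar flare observations", "Nuclear fuel cycle overview"],
    "repo_c": ["Efficiency of solar thermal plants"],
}

def search_repository(name, query):
    """Return (score, title, source) for each title sharing a word with the query."""
    terms = set(query.lower().split())
    hits = []
    for title in REPOSITORIES[name]:
        score = len(terms & set(title.lower().split()))
        if score:
            hits.append((score, title, name))
    return hits

def federated_search(query, limit=3):
    """Query every repository concurrently, merge the hits, and rank them."""
    with ThreadPoolExecutor() as pool:
        per_repo = list(pool.map(lambda name: search_repository(name, query), REPOSITORIES))
    merged = [hit for hits in per_repo for hit in hits]
    # Highest score first; ties are broken alphabetically by title so the order is stable.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:limit]

results = federated_search("solar efficiency")
for score, title, source in results:
    print(f"{score}  {title}  [{source}]")
```

Note that nothing is copied into a central store: each repository keeps its own content, and the portal only holds the merged result list for the duration of one search.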

Funding Collections and Services in the Public Interest

Do you ever worry about funding for your library? Have you ever thought about how to get a grant to help your library? Do you wonder about how you might attract grant funding to a library in the age of Google and the Web?

If you answered "yes" to any of those questions, I recommend the article Digital Infrastructure and Public Interest by Vince Stehle, in Grantmakers in the Arts Reader, Fall 2008.

(I posted a link to this article a few days ago but, after John referred to it in his 66 Days to Government Information Liberation post, I wanted to follow up a bit and mention why I think the Stehle article is important for libraries. This also gives me an opportunity to contribute some more to the excellent discussion that John is facilitating about Government Information Liberation.)

Stehle is a program director at the Surdna Foundation, which makes grants in the areas of environment, community revitalization, effective citizenry, the arts, and the nonprofit sector, and he was writing for Grantmakers in the Arts Reader. In addressing his audience of grantmakers, foundations, and people who support non-profits he says that there is an opportunity and even "an imperative" for foundations to support non-commercial work and help build "a public interest infrastructure" that will "promote the free exchange of knowledge over the Internet."

In specifically emphasizing the need for non-commercial support he says that we cannot rely on the private sector to operate in the broad public interest except as that interest translates into profit:

"While there are billions of dollars in Silicon Valley venture firms seeking to invest in the next Google, Facebook, or YouTube, there is no equivalent capital pool available for investment in the expansion of social enterprises operating in the public interest."

We often make that point here at FGI and extend it to those in government who see their information content as an "asset" and a source of needed dollars and not as a public good that should be in the public domain, freely and openly available for use and reuse. As Stehle says:

"So the real challenge is for grantmakers to figure out how to effectively identify, vet, and support promising new media and information services that put the public interest before commercial profits." [emphasis added]

I believe we in libraries should listen to Stehle's message and think about what it means for grant support for libraries. After all, most (all?) libraries are non-profits, and so many of our best libraries (and certainly our FDLP libraries) explicitly support the public interest, and libraries need funding to do their work.

To put this in a library context, I think we need to think about what libraries have to offer that other institutions and grant seekers do not. As I mentioned in an earlier post, libraries -- because of their values of free, equitable, open public access to information -- are better positioned than anyone else to seek and get funding for those very kinds of activities that Stehle describes.

But, how do we differentiate libraries from others? What are our unique roles? Many libraries are struggling to define their roles and purposes in society. John picks up on this and says that Stehle is one of those who "argue from the perspective, the library/web morphing together into some kind of global resource is a done deal." (I disagree with John on this; I don't see where Stehle says this or anything like it.)

John seems to be saying (correct me if I am wrong) that the center of libraries' responsibilities has shifted because there are new distribution mechanisms and because we have new abilities to make better use of information. He says that it (the role of libraries?) "is something no longer centered on possession and/or control...."

I think this is a grave mistake. While I agree strongly with John that libraries can and should use technology to "knit together the medium of governance (politics, policy, law, and programs) with how our communities use the civic message to inform their daily lives," I also believe that possession and control of information is an essential, primary role for libraries. If we do not possess copies of information and control where it is and control its very existence (keep it from disappearing or being altered or lost), we cannot do the exciting mashups that we want to do.

I also think that, while libraries can and should use technology to "knit" and "weave" information from a lot of different sources (see: collections, services, and "mini-librarians"), this is not a unique role for libraries -- nor should it be. What libraries can do that is unique, though, is select, acquire, organize, and preserve information and ensure that our services for that information make it possible for others to do their own "knitting and weaving."

In short, libraries can make the case that one of their roles in society is to maintain digital collections that others can use and reuse and mix and mashup. We can make the case that society will lose information if it relies only on information-producers to preserve information for the long term and we can argue that society will lose free, open access if we rely on those who see their "content" as an "asset." We can make the case that libraries are non-profit, public-interest organizations that will guarantee long term preservation and free access to information. We can argue that if the information is not preserved, there will be nothing to share and knit and mash-up. We can argue that libraries facilitate information use and reuse.

But, don't take my word for it. Re-read the excellent article Managing Digital Assets in Higher Education: An Overview of Strategic Issues by Donald J. Waters from 2005 (or my brief summary of and comment on it). Or read the paper that Stehle refers to, Sustainable Public Media Infrastructure, which describes non-profit organizations that are creating permanent, sustainable public knowledge and communications infrastructure designed for public benefit. Then reflect on the primary, central importance of permanent digital collections in libraries.

Digital Government Summits

This morning, I checked my friends' Twitter updates, as I often do. I was intrigued by the discovery that my friend and colleague Michael Sauers would be attending (and Twittering about) the Nebraska Digital Government Summit today. The description of the event makes me think this summit might be of interest to government documents librarians:

As citizens increasingly use technology in the workplace and in their personal lives, they expect government information and services to be readily accessible through technology. The Nebraska Digital Government Summit will provide an opportunity to learn how new and emerging technologies can be used to expand access to services, reduce costs, increase efficiency, and improve public safety.

A quick look at the site reveals that there are similar events in most states. Have any FGI readers ever attended one of these summits? What did you think?

Explaining "Born Digital" Gov Docs to Patrons & Professors

I had to explain to a student patron and their Professor today what is meant by "born digital" and how digital government documents are wonderful resources for a paper if we do not have the print version or when the print version doesn't exist (or is horribly out of date). Have any of you had to explain this a lot?

It all started when the student patron told me she could only have three web sources for her Nursing research paper after I had shown her the wonderful world of digital documents online. She had found an eleven-year-old version of a government print source in our catalog but I cringed...born digital documents online via NIH or the U.S. Dept. of Health had more up-to-date medical information on her topic! I told her to use both the print and online sources. She would be able to see if there were any noticeable differences between the 1997 print version and the 2007/2008 online information on her topic.

I contacted the Professor and explained this too. All is well and she will allow for the use of online government information. She was just hoping to avoid the use of too many general (i.e. crappy) websites. I understand that but I wanted to make sure that the student would not be punished for using several good government online documents and websites for her paper.

I didn't get into the nitty gritty digital authentication of government documents, but with some Professors who require legislative research, I tell them about the digitally authenticated documents that currently exist from GPO.

I have a feeling we government document librarians are going to have to explain this concept of "born digital" gov docs and digital authentication more often...especially now that more and more gov docs are being born digitally.

What do you want to know about Archive-it?

I'd like to survey you, our loyal FGI readers. I'm co-presenting with Molly Bragg at next week's Depository Library Council conference about digital collections using archive-it (see title and abstract below). I've got an outline but I'd really like to know what questions YOU have about archive-it and digital collections. What do YOU want to know about archive-it? So, please please please leave a comment here so that my presentation will be even more amazing :-)

Title of Presentation:

Gone Today, Here Tomorrow: Archiving and Preserving Born Digital Government Documents

Abstract:

Stanford University Library has been a federal depository library since 1895. In 2007, the library began collecting born digital documents using Archive-It, the web archiving service from Internet Archive (www.archive-it.org). In this presentation James Jacobs will discuss his group's objectives and procedures for selecting and archiving digital content and share examples of the unique content preserved. Molly Bragg will present an overview of web archiving projects and tools used and developed by Internet Archive. These tools are used by libraries around the world to preserve government documents and other born digital content.

Federal Agencies Digitization Guidelines Initiative

I've been reading and digesting the recently released Federal Agencies Digitization Guidelines Initiative website and the sustainable formats page, so I can discuss it (if there is time) during my presentation at next week's Depository Library Conference.

A dozen federal agencies launched an initiative to establish a common set of guidelines for digitizing historical materials. Two working groups have been established: the Still Image Working Group (books, photographs, maps, etc.) and the Audio-Visual Working Group. They have two draft documents currently up for review and comment: TIFF Image Metadata and Digital Imaging Framework. Comments are due on November 15.

I'm also loving their glossary of terms, which "has been generated to serve the participating agencies as a standardized vocabulary for their deliberations and guidelines" and it is "a work in progress" so suggestions are welcome.

Partnering With GPO

GPO recognizes that with the ever-increasing amount of electronic U.S. Government information, we need your help! Since 1997, depository libraries have become GPO partners, working with GPO to ensure permanent public access to electronic content and to provide services that assist other depositories and the public.

Our recent partnerships include:

Does your library have a project, resource, or service that would benefit the depository library community and the public? Consider a partnership with GPO and have a direct impact upon citizens' access and use of government information. Learn more about GPO's partnership program.

The ever-increasing amount of electronic U.S. Government information requires a team effort.

How do you collect digital documents?

I spend a good deal of time scouring newspapers and Web sites like Docuticker (RSS Feed) and UN Pulse (RSS Feed) in order to add digital government documents to my library's collections. Sometimes I have the URL cataloged; or, if I think a document is particularly in danger of disappearing, I'll upload it to the Internet Archive's govt documents collection. Below are a few that I've come across in my digitravels recently.

At the upcoming International Documents Taskforce (IDTF) meeting at ALA Annual Conference (GODORT conference schedule here), I'm giving a short presentation about digital collections. I'd really like to hear how/if others are doing digital collection development either randomly or as a matter of course. Please leave a comment and let me know your thoughts, ideas, and hopes. Please include any information you care to share -- what you do, how you do it, if you have favorite haunts/Websites etc.
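For anyone who wants to automate part of the feed-scouring described above, here is a minimal sketch that pulls candidate documents out of an RSS 2.0 feed using only Python's standard library. The sample feed, its titles and URLs, and the "ends in .pdf" rule are all assumptions made for illustration; real feeds like the ones mentioned above would need to be fetched first and may call for different selection criteria.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed standing in for a real RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example document feed</title>
  <item><title>Agency annual report</title><link>https://example.gov/report.pdf</link></item>
  <item><title>Press release</title><link>https://example.gov/news.html</link></item>
  <item><title>Statistical yearbook</title><link>https://example.org/yearbook.pdf</link></item>
</channel></rss>"""

def candidate_documents(feed_xml, extensions=(".pdf",)):
    """Return (title, link) pairs for feed items whose link ends in one of the extensions."""
    root = ET.fromstring(feed_xml)
    candidates = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link.lower().endswith(extensions):
            candidates.append((title, link))
    return candidates

documents = candidate_documents(SAMPLE_FEED)
for title, link in documents:
    print(f"{title}: {link}")
```

The output is just a list of things to look at; deciding what to catalog or upload is still a selection judgment.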

Internet Archive Slideshow @ Wired.com

The Internet Archive has many fans here at FGI. If you're not familiar with this project, go check out the slide show at Wired magazine about the mechanics of the Internet Archive Book-Scanning project.

"While Google has made headlines over the last two years for scanning thousands of copyrighted works for its Book Search project, the Internet Archive is quietly digitizing around 1,000 public domain titles every day...the text collection on archive.org is the world's largest online collection of free books, with nearly 350,000 titles and growing."

I wrote about creating a digital government documents library with Google Books a few weeks ago, but the Internet Archive also has a plethora of digitized government publications, as pointed out to me in the comments. Since then, I've been happily "bookmarking" government documents of interest to my patrons and my depository. These bookmarked documents can be shared via a wiki subject guide or a social bookmarking tool of your choice.

However, unlike Google Books, there is no RSS feed for recently bookmarked documents, and your bookmarks are arranged not by topic or title but by the date you bookmarked them. Maybe these features could be suggested to them or brought up in the forum? You can also contribute or donate to the Internet Archive. Nevertheless, the satisfaction you get from using and marketing this non-profit, actual library should be rewarding enough!

5.2 Million 19th Century Passenger Arrival Records Now Online at NARA

The National Archives and Records Administration (NARA) announced the online availability of over 5.2 million records of passengers who arrived at the ports of Baltimore, Boston, New Orleans, New York, and Philadelphia in the 19th century. These records were transcribed from original ship manifests into databases by Temple University's Center for Immigration Research and donated to NARA.

Intrigued, I went to NARA's Access to Archival Databases (AAD) and searched "Records for Passengers Who Arrived at the Port of New York During the Irish Famine" for 1846-1851 (over 607,800 records!), and I found several of my Troy clan ancestors who arrived in 1851. I'll have to compare the names with the extensive family tree that my grandfather made. If he were alive today, he'd be searching this database for hours!

Other record sets include: Data Files Relating to the Immigration of Germans to the United States, 1850-1897; Data Files Relating to the Immigration of Italians to the United States, 1855-1900; and Data Files Relating to the Immigration of Russians to the United States, 1834-1897.
