We started the LostDocs blog back in September 2009 to collect e-mail receipts for items that were reported to GPO as "fugitive documents" -- agency documents that should have made it into the Federal Depository Library Program and/or the Catalog of Government Publications.
In the process of running this blog, we have identified 40 documents reported since April 2008 that were cataloged by GPO after being reported as "fugitive documents." These fall into the "found documents" category of our blog.
You can find our list of 40 (and counting) cataloged fugitives here. This spreadsheet will be updated whenever we identify new GPO cataloging for items that had been reported as fugitive documents.
The results are interesting and somewhat disturbing, but not definitive.
The 40 items were cataloged in times varying from three days to 524 days. The mean cataloging time was 213 days. The median cataloging time was 184 days or about six months.
If the cataloging times above were typical of all documents reported through the LostDocs process, we think this would be a major problem for GPO that would require some serious soul searching and dialog about how this result could be changed and what tradeoffs and/or extra community involvement would be required as a result.
We are NOT making the claim that these cataloging times are typical for reported fugitive documents. We honestly do not know what is typical. Jim Jacobs, FGI's resident data librarian, had this to say about our sample of cataloged documents:
As for sample size and relevance: the number of items in the sample can't tell us the significance or accuracy of the results. We'd have to know two other things: the size of the universe (of all reported lost docs), and the accuracy of the sample. Since the sample was self-selected (by those reporting) rather than random, and since we don't know if the sample is 1% or 85% of all submitted lostdocs, we can't claim that the findings necessarily reflect the status of the whole universe. (does that make sense? If only people w/ long waits reported to us, our sample does not accurately reflect all lostdocs.)
When we first thought about making lostdocs reports available to the community at large, we approached GPO with a partnering opportunity. We would maintain the blog and offer them the opportunity to comment there on whether something was out of scope for the CGP or already in the catalog. In return, we asked them to modify their LostDocs form so that when they received a report, the blog would automatically get a copy. If this partnership had been accepted, we would know the two facts Jim cited above that are needed to tell us whether our results are typical. GPO declined our partnership proposal, citing their workload. We're not questioning that they are overworked.
We do feel that the results above deserve further investigation. Perhaps GPO could prepare a report on documents cataloged as a result of fugitive reports over the past few years. Unless they've discarded the e-mail receipts (which would be defensible), they have the dates of when documents were reported. The CGP lists when an item was first added to the CGP. They could have an intern make a semester project of putting the two together and then posting the results to fdlp.gov.
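The "putting the two together" step could look something like the sketch below. It is only an illustration: the item identifiers and dates are made up for the example (they are not from our spreadsheet), and the names `cataloging_lags` and `summarize` are hypothetical. It assumes the report dates and the CGP "first added" dates can each be exported as a simple lookup keyed by item.

```python
from datetime import date
from statistics import mean, median


def cataloging_lags(reported, cataloged):
    """Days from report to cataloging, for items that appear in both lookups."""
    lags = {}
    for item_id, reported_on in reported.items():
        cataloged_on = cataloged.get(item_id)
        if cataloged_on is None:
            continue  # reported but not (yet) cataloged: excluded from the lag figures
        lags[item_id] = (cataloged_on - reported_on).days
    return lags


def summarize(lags):
    """Count, range, mean and median of the lag values."""
    days = sorted(lags.values())
    if not days:
        raise ValueError("no cataloged items to summarize")
    return {
        "count": len(days),
        "min": days[0],
        "max": days[-1],
        "mean": mean(days),
        "median": median(days),
    }


# Hypothetical sample data, for illustration only.
reported = {
    "doc-a": date(2008, 4, 1),
    "doc-b": date(2009, 1, 15),
    "doc-c": date(2009, 9, 10),
    "doc-d": date(2009, 11, 2),  # never matched to a catalog record
}
cataloged = {
    "doc-a": date(2008, 4, 4),
    "doc-b": date(2009, 7, 18),
    "doc-c": date(2009, 11, 9),
}

lags = cataloging_lags(reported, cataloged)
summary = summarize(lags)
print(summary)
```

Because GPO would be working from every report it received rather than a self-selected sample, the same few lines of arithmetic would answer the question we cannot: what is typical. Counting the items that never match a catalog record (like `doc-d` here) would be just as informative as the lag figures.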
If they have tossed previous e-mail receipts, they could start saving them for a year starting in January 2010 and do the analysis we propose above in 2011. But in either case we feel the analysis should be done. If it confirms our results then it will be good ammunition in Congress to procure more cataloging staff or to start cataloging collaborations with FDLP members. If the GPO analysis concludes that items reported to lost docs are in fact cataloged in a timely manner, then that will help build trust with the documents community and motivate more people to report fugitive documents. Either way it is a win-win for GPO.
While poking around the Government Printing Office's (GPO) OPAL training site at http://www.opal-online.org/archivegpo.htm, I found a couple of online workshops that I think will be valuable to beginner and expert alike:
Searching for Free Government Full Text Docs Online: Where to Begin? presented in October 2009 by Holly Harper, GPO intern and MLIS student at the University of Washington.
Geology Librarianship and Government Documents presented in August 2009 by Stephanie Earls, GPO intern and MLIS student at the University of Washington.
They appear to run best in Internet Explorer. The recordings were made by two library school interns working with GPO's Robin Haun-Mohamed. The intention was to create programming that would be helpful to generalist librarians and new depository staff.
I think they've done well at this and created some videos that should be shared with non-librarians as well. I publicly thank Robin and the GPO staff that made these possible. You may wish to pause the videos in places to make notes of URLs.
One new thing I learned (or was reminded of anew) from the "Full Text Docs" presentation was the ability to browse publications in FDsys by collection, by congressional committee, or by date. Use the "last 24 hours" option to see just how much information government is pumping out these days. And that's just a fraction of what's available.
My highlighting these two OPAL presentations should not be interpreted as a slight on the other good material you can find there. Go, watch and explore.
In September 2009 we at Free Government Information (FGI) started the "lost docs blog" at lostdocs.freegovinfo.info to collect your receipts from GPO about the fugitive documents you reported through GPO's lost docs form at www.fdlp.gov/lostdocs or through GPO's Help system at gpo.custhelp.com.
Here is the November Lost Docs Report and Appeal:
Thanks to the continued generosity of documents librarians, we posted 60 reports of fugitive documents submitted to GPO. These receipts were a mixture of old receipts and items actually reported in November 2009.
Of these 60 reported items, 17 items have been cataloged by GPO. You can view this list by visiting lostdocs.freegovinfo.info/category/found/ and looking at the postings with November 2009 dates. We are appreciative of these new records.
In our view, only one of the items reported to GPO and posted to the blog in November was either out of scope for the Catalog of Government Publications or already in the catalog. You can view this item by visiting lostdocs.freegovinfo.info/category/false/ and looking for items with November 2009 dates.
If you like the concept of a public listing of fugitive documents reported to GPO, there are a number of easy ways to help us:
- If you report a fugitive document to GPO, send your e-mailed receipt to firstname.lastname@example.org. We welcome any item reported to GPO in the past month.
- Visit the blog at lostdocs.freegovinfo.info and comment on the listed items. Comments can answer questions such as: Did your library receive the item? Did you find it in the CGP? Do you think the item is out of scope for the CGP? Did you report the item as well?
- Post the blog link to your website or share it on Facebook, Twitter, or other social media.
- Subscribe to the blog feed at lostdocs.freegovinfo.info/feed/ or, better yet, incorporate the feed into your website or blog.
Congressional Printing: Background and Issues for Congress, by R. Eric Petersen, Congressional Research Service, R40897 (November 5, 2009).
This report, which will be updated as events warrant, provides an overview and analysis of issues related to the processing and distribution of congressional information by the Government Printing Office. Subsequent sections address several issues, including funding congressional printing, printing authorizations, current printing practices, and options for Congress. Finally, the report provides congressional printing appropriations, production, and distribution data in a number of tables.
220 Years Later, It’s Time to Publish the Constitution Annotated Online in XML, by Daniel Schuman, Sunlight Foundation (09/17/09).
The Constitution Annotated has been written by the Library of Congress for nearly 100 years, and contains analysis of nearly 8,000 U.S. Supreme Court cases.
Over the decades, GPO has published print versions of this extraordinary resource every two years, with limited electronic versions available from the 1992 edition onward. Although the Library of Congress has drafted the Constitution Annotated in XML for a number of years, that data is no longer present when it is published online by GPO.
The Government Printing Office (GPO) has just released 8 new collections into the Federal Digital System (FDsys) -- http://www.fdsys.gov/. That brings the number of collections in FDsys to 21 -- very cool indeed. The new collections are:
- Congressional Directory (105th Congress to present)
- Congressional Record (Bound) (1999 to 2001)
- Congressional Record Index (1983 to present)
- Economic Report of the President (1995 to present)
- GAO Reports and Comptroller General Decisions (1994 to 2008)
- History of Bills (1983 to present)
- United States Government Manual (1995/1996 to present)
- United States Statutes at Large (2003 to 2006)
The Congressional Directory, Congressional Record (Bound), United States Government Manual, and United States Statutes at Large will be available with authenticated digital signatures.
There is a capabilities release schedule with an API and several other useful functionalities scheduled to be operational in 2010, only a few months away.
Given all the hubbub about the GPO purl server crash over 2 weeks ago (and counting), I decided to re-read FDSys Releases and Capabilities version 5.0 (PDF). There's nothing in the document about the migration from purls to handles (which seems to have been put on some back burner in a back closet). There's mention of "System Backup/Restore" (section 4.6.13), but this being a "definitions" document, there's no discussion about *how* the system backup/restore will occur nor how the system "shall support an average peak time availability of 99.7%." I hope that information regarding system infrastructure backup and redundancy is soon forthcoming.
[Update: 10/13/09: I've revised my thinking on the cloud as the term is loaded and doesn't really mean what I'm describing. A friend from the San Diego Supercomputer Center said, "some greybeards are going back to the original metaphor: the grid" and suggested the term "shared digital libraries" which is good. But what I'm describing is more like a biological ecosystem, the FDLP ecosystem. jrj]
Last week's GPO purl server crash should be disconcerting to both the documents community and the public at large (in fact, although the hardware's been restored, resolution is ongoing as I write). I know GPO staff are just as worried about this and are doing everything they can to fix the purl server.
"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."
But in the meantime, there are 1250+ library catalogs and innumerable links to government documents that are not working. The crash of a critical piece of GPO's infrastructure brings a couple of things to mind:
1) What worries me about this is that FDsys, with its supposed upgrade in hardware/software/systems design, is for all intents and purposes the same as GPOaccess. That is, FDsys is a monolith where the failure of one piece can cause the whole system to grind to a halt. As our readers know, we've been advocating for a long time for a distributed digital FDLP (a *true* "digital depository" system!). We're heartened by what we see of FDsys so far, but we need to be building a system with built-in redundancies.
I envision a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, as well as technical infrastructure. With this kind of system in place, a failed purl server would cause only a momentary blip in service while a backup purl server kicks in, instead of an outage lasting several weeks or more. How many system degradations (WAIS) and failures (purl server) until we shift our thinking from "client-server" (with libraries decidedly on the "client" side of the equation) to "peer-to-peer" concepts and build systems with built-in redundancies that mirror what the FDLP has been for the last 150 years? How long before we build an FDLP cloud?
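To make the "backup purl server kicks in" idea concrete, here is a minimal sketch of failover across several resolvers. It is purely illustrative and says nothing about how GPO's purl server actually works: the resolver functions, the `ResolutionError` name, and the example purl and URL are all hypothetical, and each "resolver" is just a function standing in for a server hosted by GPO or a partner library.

```python
from typing import Callable, Iterable

Resolver = Callable[[str], str]


class ResolutionError(Exception):
    """Raised when no resolver in the chain can resolve a purl."""


def resolve_with_failover(purl: str, resolvers: Iterable[Resolver]) -> str:
    """Try each resolver in order and return the first successful answer."""
    tried = 0
    for resolver in resolvers:
        tried += 1
        try:
            return resolver(purl)
        except (ConnectionError, KeyError):
            # Server unreachable, or this mirror lacks the entry: try the next one.
            continue
    raise ResolutionError(f"no resolver could resolve {purl!r} ({tried} tried)")


def make_mirror(table: dict) -> Resolver:
    """Build a stand-in resolver backed by a copy of the purl table."""
    def resolve(purl: str) -> str:
        return table[purl]
    return resolve


def down_server(purl: str) -> str:
    """Stand-in for a crashed primary server."""
    raise ConnectionError("primary resolver unreachable")


# Hypothetical purl table, for illustration only.
purl_table = {"purl/example/1": "https://example.org/docs/1.pdf"}
library_mirror = make_mirror(dict(purl_table))

target = resolve_with_failover("purl/example/1", [down_server, library_mirror])
print(target)
```

The point of the sketch is the shape, not the code: as long as at least one partner holds a current copy of the purl table, a crashed primary costs one failed attempt rather than weeks of broken catalog links.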
I had known that the Internet Archive had submitted a response to the GPO's RFP for mass digitization. A friend just sent me the link to the proposal submitted to GPO (embedded below and here's the link to the proposal and supporting documents).
As you can probably guess, we've been pulling for the Archive to get the bid, not least because the Archive is a 501(c)(3) non-profit library and we've stated on more than one occasion that privatization of public domain government information is a very bad idea. But also, we've been heartened by the quality of the Archive's scans to date, their openness and willingness to be collaborative in their processes and data access and sharing. Those qualities certainly come through in their proposal for mass digitization -- not to mention the fact that they've actually made their proposal public!
While the award has not been officially announced, we really hope that the Archive wins the award. Perhaps GPO will name them as an official depository library and work with them not only on the "legacy" collection (there needs to be a better description of the deep and rich collections of depository libraries than the somewhat pejorative "legacy" :-| ) but on digital deposit of government documents going forward.
--that is all.
see this link for chat logs and more details on GITCO
TOPIC: GITCO committee structure and our impact within GODORT and beyond
Accessing government information electronically is now common in both US and international contexts. How can GITCO best position itself within GODORT/ALA and beyond to provide leadership on issues associated with electronic government information?
This session is meant to be a brainstorm -- to collect ideas and examples, rather than to follow each contribution to its conclusion. The room will be open after the session if you would like to add things after the planned session. There is also a brief participant survey which includes a place for feedback.
Agenda for Today's Forum:
* reflections on past projects
* reflections on committee structure within GODORT
* take the survey
Today I attended the "Chat with GPO" OPAL session, which focused on authentication in general and authentication for FDLP partners. Ted Priebe, GPO's Director of Library Planning & Development (LPD), and Lisa Russell, the Manager of LPD's Content Management unit, presented material and answered questions.
Basically, LSCM wants to partner with Federal Depository Libraries and find ways to authenticate content hosted by the FDL partners. The digital signatures used for authentication will indicate the partnership with the FDL institution and give that institution's contact information. This is great news, especially for those FDLs also interested in hosting digital content in partnership with GPO.
The authentication session is archived on the GPO OPAL site.