purl

Purls vs handles

Building on yesterday's post on Critical GPO systems and the FDLP cloud, I've done a little digging into GPO's proposed migration from PURLs to the use of "handles." According to RFC 3650, "Handle System Overview":

The Handle System includes an open protocol, a namespace, and a reference implementation of the protocol. The protocol enables a distributed computer system to store names, or handles, of digital resources and resolve those handles into the information necessary to locate, access, and otherwise make use of the resources. These associated values can be changed as needed to reflect the current state of the identified resource without changing the handle. This allows the name of the item to persist over changes of location and other current state information. Each handle may have its own administrator(s) and administration can be done in a distributed environment (my emphasis). The Handle System supports secured handle resolution. Security services such as data confidentiality, data integrity, and non-repudiation are provided upon client request.

Purls and handles do roughly the same thing: they're link resolvers. But, as Larry Stone's 2000 article for MIT's Persistent Naming discovery project, "Competitive Evaluation of PURLs," points out, there are differences that make handles a better choice for long-term operation and persistence. Without getting too technical, handles are not tied to any protocol (e.g., HTTP) or domain (e.g., .gov) and can therefore work regardless of the network design or protocol used. This is extremely important for scalability and persistence over the long term. In addition, handles can do more than resolve to URLs. "The Handle System design allows for various other types of resolution objects, metadata, and extensible addtions to each Handle object record."
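To make that difference concrete, here's a minimal sketch of the idea: a handle is a name that maps to a list of typed values, and the URL is just one of them. This is a toy in-memory model, not the real Handle System protocol or API; the class names, the handle "12345/example-doc" and the example.gov URLs are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HandleValue:
    """One typed value stored under a handle (a URL, a title, etc.)."""
    type: str
    data: str


@dataclass
class HandleRegistry:
    """Toy in-memory stand-in for a handle service (hypothetical, not the real API)."""
    records: Dict[str, List[HandleValue]] = field(default_factory=dict)

    def register(self, handle: str, values: List[HandleValue]) -> None:
        self.records[handle] = list(values)

    def resolve(self, handle: str, value_type: str = "URL") -> List[str]:
        """Return the data of every value of the requested type."""
        return [v.data for v in self.records.get(handle, []) if v.type == value_type]

    def relocate(self, handle: str, new_url: str) -> None:
        """Update the stored location; the handle itself never changes."""
        for value in self.records[handle]:
            if value.type == "URL":
                value.data = new_url


registry = HandleRegistry()
registry.register(
    "12345/example-doc",  # made-up handle in prefix/suffix form
    [
        HandleValue("URL", "http://old.example.gov/doc.pdf"),
        HandleValue("TITLE", "Example document"),
    ],
)

# The document moves; only the value stored under the handle is edited.
registry.relocate("12345/example-doc", "http://new.example.gov/doc.pdf")
current_urls = registry.resolve("12345/example-doc")
titles = registry.resolve("12345/example-doc", "TITLE")
```

A catalog record that cites the handle keeps working after the move, and the same record can carry metadata alongside the URL, which a plain redirect can't do.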

In short, handles are more persistent, more scalable, and can do more. But most importantly in my mind, handle administration "can be done in a distributed environment." This makes handles perfect for the FDLP cloud because the work of resolving links can be shared across many institutions. So I say, kudos to GPO for moving to the handle system.

Oh, hold that applause for a moment. My search also turned up the following document from the Fall 2007 Depository Library Council meeting entitled "Handles Council Briefing Topic" (PDF). This briefing document basically describes what I've just said above and lays out a gradual migration from purls to handles with an anticipated timeline to "coincide with Release 1-C of FDsys in 2008." There's a March 6, 2008 report, "Report on the handles beta test," that calls the handles beta test "satisfactory." But no information is available after that report. So what happened?

I know the building of FDsys has been no easy task and that GPO staff have worked really hard to keep to their published release schedule, but I'd like to know why the handles migration didn't occur in 2008. If more testing is needed, I'm sure there are libraries that would be willing to be beta-beta testers for handles. Perhaps this is an opportune time to finally implement the migration to the handle system.

--that is all.

Critical GPO systems and the FDLP cloud

[Update: 10/13/09: I've revised my thinking on the cloud as the term is loaded and doesn't really mean what I'm describing. A friend from the San Diego Supercomputer Center said, "some greybeards are going back to the original metaphor: the grid" and suggested the term "shared digital libraries" which is good. But what I'm describing is more like a biological ecosystem, the FDLP ecosystem. jrj]

Last week's GPO purl server crash should be disconcerting to both the documents community and the public at large (in fact, although the hardware's been restored, resolution is ongoing as I write). I know GPO staff are just as worried about this and are doing everything they can to fix the purl server.


"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

But in the meantime, there are 1250+ library catalogs and innumerable links to government documents that are not working. The crash of a critical piece of GPO's infrastructure brings a couple of things to mind:

1) What worries me about this is that FDsys, for all its supposed upgrades in hardware, software, and systems design, is for all intents and purposes the same as GPOaccess. That is, FDsys is a monolith where the failure of one piece can cause the whole system to grind to a halt. As our readers know, we've been advocating for a long time for a distributed digital FDLP (a *true* "digital depository" system!). We're heartened by what we see of FDsys so far, but we need to be building a system with built-in redundancies.

I envision a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, and shared technical infrastructure. With this kind of system in place, a failed purl server would cause only a momentary blip in service as a backup purl server kicks in, instead of an outage lasting several weeks or more. How many system degradations (WAIS) and failures (purl server) will it take until we shift our thinking from "client-server" (with libraries decidedly on the "client" side of the equation) to "peer-to-peer" concepts and build systems with built-in redundancies that mirror what the FDLP has been for the last 150 years? How long before we build an FDLP cloud?
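Here's a rough sketch of what "a backup purl server kicks in" could look like from the client's side: try each resolver in order and take the first answer. This is only an illustration under my own assumptions. The resolver functions, the mirror table and the example.gov addresses are hypothetical stand-ins, not anything GPO runs.

```python
from typing import Callable, Dict, Iterable, Optional

# A resolver takes a purl and returns its target URL, or None if unknown.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(purl: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Try each resolver in order and return the first target URL found."""
    for resolver in resolvers:
        try:
            target = resolver(purl)
        except ConnectionError:
            continue  # this server is down; move on to the next one
        if target is not None:
            return target
    return None  # nobody could resolve it


def primary_server(purl: str) -> Optional[str]:
    """Stand-in for the central purl server during an outage."""
    raise ConnectionError("primary purl server unreachable")


# Hypothetical copy of the purl table held by a partner library.
MIRROR_TABLE: Dict[str, str] = {
    "http://purl.example.gov/1234": "http://agency.example.gov/report.pdf",
}


def library_mirror(purl: str) -> Optional[str]:
    """Stand-in for a mirror answering from its local copy of the table."""
    return MIRROR_TABLE.get(purl)


target = resolve_with_fallback(
    "http://purl.example.gov/1234", [primary_server, library_mirror]
)
```

The point is that the user never sees the outage: the primary fails, the mirror answers, and the link still resolves.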

FDLP Cloud

(**made with IHMC Cmap tools**)

2) There was an interesting discussion of the purl server outage on the code4lib list, including a good workaround from a technological standpoint (pasted email below). It's a reminder that what we do within the FDLP has an effect on others in the wider library community (not to mention the public at large!) and that "our" content and the systems built to serve that content are critical for the work of others whether we know it or not. It also points to the need for us to reach out to those communities in order to build systems of use to both end-users and those building other systems, mashups, repositories etc. So I would highly recommend that we be *more* proactive in connecting with other communities within the library community (LITA, CODE4LIB, WEB4LIB, ACRL, state associations etc) as well as outside the FDLP (govt transparency community, historians and other academic communities, journalists etc).


------------------ CODE4LIB POST (with added info by James re MARC view) --------------------------------

Thanks to everyone who helped me confirm that the GPO PURL server is down. An official announcement on the GPO Listserv said:

"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

While the server is down, here is one workaround (thanks to Patricia Duplantis):

  1. Copy the purl link listed in your library catalog
  2. Go to http://catalog.gpo.gov/
  3. Click "Advanced Search"
  4. Search for word in "URL/PURL", enter the PURL
  5. Click "Go"
  6. In MARC view, the original URL at the time of cataloging should appear in a 53x note.

This incident, however, illuminates a weakness in PURL systems: access is broken when the PURL server breaks, even though the documents are still online at their original URLs.

Maybe someone more familiar with PURL systems can tell me... is there any way to harvest data from a PURL server, so that a backup/mirror can be available?

Keith
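I don't know whether a PURL server itself can be harvested, but the workaround above suggests one possible source for a backup table: catalog records that hold both the purl and the original URL. Here's a minimal sketch, assuming those two fields have already been pulled out of each record. The field names "purl" and "original_url" and the sample data are hypothetical.

```python
import json
from typing import Dict, Iterable, Mapping


def build_mirror_table(records: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each purl to the original URL recorded at cataloging time."""
    table: Dict[str, str] = {}
    for record in records:
        purl = record.get("purl", "").strip()
        original = record.get("original_url", "").strip()
        if purl and original:  # skip records missing either field
            table[purl] = original
    return table


# Made-up records standing in for data extracted from a library catalog.
sample_records = [
    {"purl": "http://purl.example.gov/1234",
     "original_url": "http://agency.example.gov/report.pdf"},
    {"purl": "http://purl.example.gov/5678", "original_url": ""},
]

mirror_table = build_mirror_table(sample_records)
mirror_json = json.dumps(mirror_table, indent=2)  # ready to save or share
```

A table like this would only be as current as the catalog records behind it, since the original URL is the one captured at the time of cataloging.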

--that is all.
