Tag Archives: purl

GPO to migrate to PURLZ

Well, this is good news for anyone who remembers the great GPO purl crash of ought-9. GPO just announced the contract for upgrading their permanent URL architecture, migrating to PURLZ. I hope they’ll build participation by depository libraries into their new architecture. It would be great to have a failsafe on non-.gov servers as well.

GPO is pleased to announce that a contract for upgrading the PURL Server architecture and hosting the new solution has been awarded to Zepheira. The new PURL architecture provides greater flexibility, new features, and the scalability to face an increased demand for PURLs. GPO is currently in the process of migrating to this new architecture (PURLS to PURLZ.)

This transition boasts many benefits, including:

* A more robust system architecture
* Immediate system back-up through synchronization
* Immediate system fail-over
* Enhanced statistical reporting
* Enhanced Web referral reporting
* Improved speed for resolution of redirects

The targeted date for the transition to this new architecture is summer 2010.

More information will be forthcoming at the Spring Depository Library Council Meeting in Buffalo, New York (April 26 – 28, 2010) and via the FDLP Desktop and fdlp-l.

Could DOIs Solve Three Depository Challenges?

I am not an expert on Digital Object Identifiers (DOI) or Handles or other methods of creating permanent, persistent links to information on the web, so I pose this as a question. Could DOIs help solve three problems that, if solved, would provide better preservation, better access, and a better user experience?

The three challenges are:
1. The need for reliable, permanent, persistent links.
2. The need to provide a simple user interface to depository collections.
3. The need to guarantee authenticity of government information.

Here is why I think the answer is Yes.

Problem: Providing reliable, permanent, persistent links. Currently, GPO uses PURLs (Persistent Uniform Resource Locators) for creating permanent links. PURLs provide “persistent” links so that, when a page moves and its URL changes, that change need only be recorded once, in the PURL database, and all the hundreds or thousands of links to the PURL resolve to the new address automatically without being changed themselves. While this is an efficient way to deal with the dynamic nature of web addresses and while this system works, it is fragile. We had a graphic demonstration of that fragility last August when the GPO PURL server crashed. When that happened, no one anywhere in the world who relied on the 115,000 PURLs pointing to government information could reach that information using those links for more than two weeks. This was not the fault of GPO (although restoration time could have been reduced with better disaster planning). Rather, the very nature of PURLs makes them fragile in this way and vulnerable to the crash of a single server.
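
To make the mechanism concrete, here is a minimal Python sketch of a PURL-style resolver. It is not GPO's implementation: the class name, the PURL path, and the example.gov addresses are all made up for illustration. It shows both sides of the story, namely that one update fixes every link and that nothing resolves when the one table is unreachable.

```python
class PurlResolver:
    """Toy stand-in for a single PURL server: one table, one point of failure."""

    def __init__(self):
        self.table = {}      # PURL path -> current target URL
        self.online = True   # flipped to False to simulate a crash

    def register(self, purl, target):
        # Recording (or re-recording) a target is the only change ever needed.
        self.table[purl] = target

    def resolve(self, purl):
        if not self.online:
            raise ConnectionError("PURL server is down")
        return self.table[purl]


resolver = PurlResolver()
resolver.register("/GPO/LPS0001", "http://www.example.gov/old/report.pdf")

# The page moves: update the table once, and every catalog link follows.
resolver.register("/GPO/LPS0001", "http://www.example.gov/new/report.pdf")
moved_to = resolver.resolve("/GPO/LPS0001")

# The server crashes: the document is still online, but no link reaches it.
resolver.online = False
try:
    resolver.resolve("/GPO/LPS0001")
    outage_breaks_links = False
except ConnectionError:
    outage_breaks_links = True
```
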

Solution: Persistence is a function of organizations, not of technology. DOIs address the fragility problem by building a social structure that guarantees persistence. As the DOI organization says, “Persistence is a function of organizations, not of technology; a persistent identifier system requires a persistent organization and defined processes. The International DOI Foundation (IDF) provides a federation of Registration Agencies (RA). Dependency on any one RA is removed.” In other words, if one server crashes, others are available immediately. Rather than relying on a single organization (GPO) and a single server at that organization, DOIs rely on multiple Registration Agencies and multiple servers. DOIs are reliable because they use redundancy and have no single points of failure (Wikipedia).

Problem: Providing a simple user interface. Imagine with me for a moment a depository system that deposits digital documents in FDLP libraries. Once such a system is in place, we will have the same document in multiple locations — perhaps one copy in GPO’s Federal Digital System (FDsys), one copy in each of a dozen or more FDLP libraries, perhaps an “original” copy at house.gov or senate.gov, and so forth. What is the user to do? Will libraries show dozens of links with an explanation after each as to what it is and hope users will have the patience to read the explanations, make an informed decision, and, if that particular link is down, go back and repeat the process? This sounds like a lousy user experience to me.

Solution: Multiple redirections. DOIs provide a way to resolve multiple URLs with a single DOI. (Resolution of Multiple URLs). This would mean that multiple copies of digital documents could be stored at many separate FDLP libraries and all could use a single, clickable link (a DOI) that would get users the copy of that document based on criteria the library defines. For example, one library might have the DOI point to the original first and the local library copy second; another library might point to the “network-closest” copy first and then other more distant copies; and so forth. DOIs do this by storing metadata with the DOI. Rather than storing only the current URL of a registered item, DOIs can record a list of locations with hints for how the resolving client should select a location, including an ordered set of selection methods.
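
As a rough sketch of what such a record and selection rule might look like, here is a small Python example. The DOI, the role labels, and the URLs are hypothetical placeholders, and the real DOI metadata format is certainly richer than this. The point is only that one identifier carries several locations and that each library can supply its own ordered preference.

```python
from dataclasses import dataclass


@dataclass
class Location:
    url: str
    role: str               # "original", "local", "remote" are made-up labels
    available: bool = True  # whether this copy is currently reachable


def choose_location(locations, preferred_roles):
    """Return the first reachable URL, following the library's role order."""
    for role in preferred_roles:
        for loc in locations:
            if loc.role == role and loc.available:
                return loc.url
    return None  # no copy reachable anywhere


# One identifier, several copies (all names here are hypothetical).
record = {
    "doi": "10.99999/example-doc",
    "locations": [
        Location("https://fdsys.example.gov/doc.pdf", "original"),
        Location("https://library-a.example.edu/doc.pdf", "local"),
        Location("https://library-b.example.edu/doc.pdf", "remote"),
    ],
}

# This library prefers the original, then its own copy, then anyone else's.
library_a_order = ["original", "local", "remote"]
first_choice = choose_location(record["locations"], library_a_order)

# If the original goes down, the same link quietly falls back to the local copy.
record["locations"][0].available = False
fallback_choice = choose_location(record["locations"], library_a_order)
```

A different library could pass a different order (say, its local copy first) without touching the record itself.
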

Here is an illustration of how it works:

This solution would have the added benefit of enabling and facilitating a true digital depository system in which digital information is deposited into FDLP libraries. FGI is a strong advocate of a depository system that does this for several reasons that we have described repeatedly here and in our writings and presentations. In brief, we believe that this would make it possible for individual FDLP libraries to build their own local digital collections focused on the needs of their own user communities; it would aid preservation by ensuring that multiple copies exist under different technical, financial, and administrative structures; and it would create a better user experience by providing a way to integrate digital FDLP/Title-44 documents with non-Title-44 federal documents, documents from state and local governments, and other non-government information. DOIs would, therefore, facilitate preservation as well as access.

Problem: Guarantee Authenticity. How does a user know that the document they just retrieved is “authentic,” that it has not been altered, that it really is what it purports to be? Many people hope for a technological solution (e.g., PKI, time stamps, encryption, digital signatures, watermarks). We at FGI believe that these are techniques that people use and that the authenticity comes, not from the technique, but from users’ trust in the people who set up the techniques.

No one explained this better than Abby Smith (Digital Authenticity in Perspective in “Authenticity in a Digital Environment,” Council on Library and Information Resources, Publication 92. May 2000). She noted that, when technologists were asked about how to establish the authenticity of a digital object, they were skeptical of technological “solutions” and said that “there is no technological solution that does not itself involve the transfer of trust to a third party.”

Solution: Trust is a social phenomenon, not a technical one. So, imagine how this might work. Imagine a document that is in FDsys, and in the digital collections of several FDLP libraries, and also at the New York Times, and at any number of other places on the web. There might be a dozen URLs for that one document. But, if GPO assigned a single DOI to it and made sure it pointed to FDsys and to “Official Depository Copies” at FDLs, that one DOI would, by definition, point to “authentic” copies — the original and those officially deposited in Title-44-authorized Federal Depository Libraries. The “prefix” part of a DOI refers to the registering agency (in this case GPO) and would further help “brand” the DOI as authentic. Users wanting “authentic” government information would look for DOIs bearing the GPO prefix — and they would find what they wanted with a single click, no matter where the particular copy they get is stored. (In addition, the DOI metadata can include authenticity information.)
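
For the prefix idea, a small Python sketch may help. A DOI has the form prefix/suffix, with the prefix beginning "10."; the prefix value used below is a made-up placeholder, not one actually assigned to GPO. Note that a matching prefix only says who registered the identifier. As argued above, the trust itself still comes from the organization behind it.

```python
GPO_PREFIX = "10.99999"  # hypothetical placeholder, not a real GPO prefix


def split_doi(doi):
    """Split a DOI into (prefix, suffix), e.g. '10.99999/abc' -> ('10.99999', 'abc')."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def has_gpo_prefix(doi):
    """True if the DOI was registered under the (hypothetical) GPO prefix."""
    return split_doi(doi)[0] == GPO_PREFIX


official = has_gpo_prefix("10.99999/example-doc")
unofficial = has_gpo_prefix("10.12345/example-doc")
```
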

Precedents. GPO would not be alone in using DOIs. Who uses DOIs? ICPSR, OECD, the European Communities’ EU publications office, CrossRef, and many others.

Barriers. The main barrier I can see to adopting DOIs is cost. I assume that it will cost more than implementing PURLs. But the two costs cannot be compared directly because the costs buy different things. Implementing PURLs gets us a fragile redirection system. Implementing DOIs gets us a redirection system of persistent identifiers, the ability to have multiple redirects to multiple copies, and a new way of thinking about authenticity.

I welcome comments and responses to my question and particularly hope that those with more knowledge in this area will fill in the gaps I have left.

Purls vs handles

Building on yesterday’s post on Critical GPO systems and the FDLP cloud, I’ve done a little digging into GPO’s proposed migration from PURLs to the use of “handles.” According to RFC 3650 “Handle system overview,”

The Handle System includes an open protocol, a namespace, and a reference implementation of the protocol. The protocol enables a distributed computer system to store names, or handles, of digital resources and resolve those handles into the information necessary to locate, access, and otherwise make use of the resources. These associated values can be changed as needed to reflect the current state of the identified resource without changing the handle. This allows the name of the item to persist over changes of location and other current state information. Each handle may have its own administrator(s) and administration can be done in a distributed environment (my emphasis). The Handle System supports secured handle resolution. Security services such as data confidentiality, data integrity, and non-repudiation are provided upon client request.

Purls and handles do roughly the same thing: they’re link resolvers. But, as Larry Stone’s 2000 article for MIT’s Persistent Naming discovery project, “Competitive Evaluation of PURLs” points out, there are differences that make handles a better choice for long-term operation and persistence. Without getting too technical, handles are not connected to any protocol (e.g., HTTP) or domain (e.g., .gov) and can therefore work regardless of the network design or protocol used. This is extremely important for scalability and persistence over the long term. In addition, handles can do more than resolve to URLs. “The Handle System design allows for various other types of resolution objects, metadata, and extensible addtions to each Handle object record.”

In short, handles are more persistent, more scalable, and can do more. But most importantly in my mind, handle administration “can be done in a distributed environment.” This makes handles perfect for the FDLP cloud because the work of resolving links can be done in a distributed environment. So I say, kudos to GPO for moving to the handle system.

Oh, hold that applause for a moment. My search also turned up the following document from the Fall 2007 Depository Library Council meeting entitled “Handles Council Briefing Topic” (PDF). This briefing document basically describes what I’ve just said above and describes a gradual transition/migration from purls to handles with an anticipated timeline to “coincide with Release 1-C of FDsys in 2008.” There’s a March 6, 2008 report, “Report on the handles beta test,” that calls the handles beta test “satisfactory.” But no information is available after that report. So what happened?

I know the building of FDsys has been no easy task and that GPO staff have worked really hard to keep to their published release schedule; but I’d like to know why the handles migration didn’t occur in 2008. If more testing is involved, I’m sure there are libraries that would be willing to be beta-beta testers for handles. Perhaps this is an opportune time to finally implement the migration to the handles system.

–that is all.

Critical GPO systems and the FDLP cloud

[Update: 10/13/09: I’ve revised my thinking on the cloud as the term is loaded and doesn’t really mean what I’m describing. A friend from the San Diego Supercomputer Center said, “some greybeards are going back to the original metaphor: the grid” and suggested the term “shared digital libraries” which is good. But what I’m describing is more like a biological ecosystem, the FDLP ecosystem. jrj]

Last week’s GPO purl server crash should be disconcerting to both the documents community and the public at large (in fact, although the hardware’s been restored, resolution is ongoing as I write). I know GPO staff are just as worried about this and are doing everything they can to fix the purl server.

“The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored.”

But in the meantime, there are 1250+ library catalogs and innumerable links to government documents that are not working. The crash of a critical piece of GPO’s infrastructure brings a couple of things to mind:

1) What worries me about this is that FDsys and its supposed upgrade in hardware/software/systems design is for all intents and purposes the same as GPOaccess. That is, FDsys is a monolith where the failure of one piece can cause the whole system to grind to a halt. As our readers know, we’ve been advocating for a long time for a distributed digital FDLP (a *true* “digital depository” system!). We’re heartened by what we see of FDsys so far, but we need to be building a system with built-in redundancies.

I envision a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, as well as technical infrastructure. With this kind of system in place, a failed purl server will only cause a momentary blip in service as a backup purl server kicks on instead of a several week+ outage. How many system degradations (WAIS) and failures (purl server) until we shift our thinking from “client-server” (with libraries decidedly on the “client” side of the equation) to “peer-to-peer” concepts and build systems with built-in redundancies that mirror what the FDLP has been for the last 150 years? How long before we build an FDLP cloud?
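
Here is a minimal Python sketch of the “momentary blip” idea, assuming each participating library holds a mirrored copy of the PURL table. The resolver names, the PURL path, and the URL are hypothetical, and a real deployment would also need some way to keep the mirrors synchronized, which this sketch leaves out.

```python
def make_resolver(table, online=True):
    """Build a lookup function that simulates one resolver (up or down)."""
    def lookup(purl):
        if not online:
            raise ConnectionError("resolver unavailable")
        return table[purl]
    return lookup


def resolve_with_failover(purl, resolvers):
    """Try each resolver in order; return (name, url) from the first one that answers."""
    for name, lookup in resolvers:
        try:
            return name, lookup(purl)
        except ConnectionError:
            continue  # this one is down, move on to the next mirror
    raise ConnectionError("no resolver available")


# Hypothetical PURL table, mirrored at a depository library.
table = {"/GPO/LPS0001": "http://www.example.gov/report.pdf"}
resolvers = [
    ("gpo-primary", make_resolver(table, online=False)),  # simulated crash
    ("library-mirror", make_resolver(dict(table))),       # independent copy
]

served_by, url = resolve_with_failover("/GPO/LPS0001", resolvers)
```

With the primary down, the request is answered by the mirror and the user never notices.
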

(**This post was updated on October 26, 2017. The image was originally uploaded to Scribd, but Scribd deleted the document from their servers at some point, probably when they went to a pay-to-play model :-|. JRJ)

2) There was an interesting discussion of the purl server outage on the code4lib list, including a good workaround from a technological standpoint (pasted email below). It is a reminder that what we do within the FDLP has an effect on others in the wider library community (not to mention the public at large!) and that “our” content and the systems built to serve that content are critical for the work of others whether we know it or not. It also points to the need for us to reach out to those communities in order to build systems of use to both end-users as well as those building other systems, mashups, repositories etc. So I would highly recommend that we be *more* proactive in connecting with other communities within the library community (LITA, CODE4LIB, WEB4LIB, ACRL, state associations etc) as well as outside the FDLP (govt transparency community, historians and other academic communities, journalists etc).

—————— CODE4LIB POST (with added info by James re MARC view) ——————————–

Thanks to everyone who helped me confirm that the GPO PURL server is down. An official announcement on the GPO Listserv said:

“The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored.”

While the server is down, here is one workaround (thanks to Patricia Duplantis):

  1. Copy the purl link listed in your library catalog
  2. Go to http://catalog.gpo.gov/
  3. Click “Advanced Search”
  4. Search for word in “URL/PURL”, enter the PURL
  5. Click “Go”
  6. In MARC view, the original URL at the time of cataloging should appear in a 53x note.

This incident, however, illuminates a weakness in PURL systems: access is broken when the PURL server breaks, even though the documents are still online at their original URLs.

Maybe someone more familiar with PURL systems can tell me… is there any way to harvest data from a PURL server, so that a backup/mirror can be available?


–that is all.