Critical GPO systems and the FDLP cloud

[Update: 10/13/09: I've revised my thinking on the cloud as the term is loaded and doesn't really mean what I'm describing. A friend from the San Diego Supercomputer Center said, "some greybeards are going back to the original metaphor: the grid" and suggested the term "shared digital libraries" which is good. But what I'm describing is more like a biological ecosystem, the FDLP ecosystem. jrj]

Last week's GPO purl server crash should be disconcerting to both the documents community and the public at large (in fact, although the hardware's been restored, resolution is ongoing as I write). I know GPO staff are just as worried about this and are doing everything they can to fix the purl server.


"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

But in the meantime, there are 1250+ library catalogs and innumerable links to government documents that are not working. The crash of a critical piece of GPO's infrastructure brings a couple of things to mind:

1) What worries me about this is that FDsys and its supposed upgrade in hardware/software/systems design is for all intents and purposes the same as GPOaccess. That is, FDsys is a monolith where the failure of one piece can cause the whole system to grind to a halt. As our readers know, we've been advocating for a long time for a distributed digital FDLP (a *true* "digital depository" system!). We're heartened by what we see of FDsys so far, but we need to be building a system with built-in redundancies.

I envision a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, as well as technical infrastructure. With this kind of system in place, a failed purl server would cause only a momentary blip in service as a backup purl server kicks in, instead of an outage lasting several weeks or more. How many system degradations (WAIS) and failures (purl server) until we shift our thinking from "client-server" (with libraries decidedly on the "client" side of the equation) to "peer-to-peer" concepts and build systems with built-in redundancies that mirror what the FDLP has been for the last 150 years? How long before we build an FDLP cloud?

[Figure: FDLP Cloud (made with IHMC Cmap tools)]
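To make the redundancy idea a bit more concrete, here is a minimal Python sketch of a client that tries several PURL resolvers in order and falls through to the next one when a peer fails. Everything named here is hypothetical: the `primary` and `library_mirror` functions, the sample PURL path, and the target URL are stand-ins for illustration, not real GPO or library services.

```python
from typing import Callable, Iterable, Optional

# A resolver takes a PURL path and returns the target URL,
# or raises an exception if it cannot answer.
Resolver = Callable[[str], str]


def resolve_with_failover(purl: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Try each resolver in order; return the first target URL found."""
    for resolver in resolvers:
        try:
            return resolver(purl)
        except Exception:
            # This peer is down or does not know the PURL; try the next one.
            continue
    return None


# Hypothetical peers: the primary is "down", a library-hosted mirror answers.
def primary(purl: str) -> str:
    raise ConnectionError("primary PURL server unreachable")


MIRROR_TABLE = {"/GPO/LPS0001": "http://www.example.org/report.pdf"}


def library_mirror(purl: str) -> str:
    return MIRROR_TABLE[purl]  # raises KeyError if the PURL is unknown


print(resolve_with_failover("/GPO/LPS0001", [primary, library_mirror]))
```

The point of the sketch is only that no single peer is special: any library holding a copy of the resolution table could answer when another peer cannot.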

2) There was an interesting discussion of the purl server outage on the code4lib list, including a good workaround from a technological standpoint (pasted email below). It is a reminder that what we do within the FDLP has an effect on others in the wider library community (not to mention the public at large!) and that "our" content and the systems built to serve that content are critical for the work of others whether we know it or not. It also points to the need for us to reach out to those communities in order to build systems of use to both end-users as well as those building other systems, mashups, repositories etc. So I would highly recommend that we be *more* proactive in connecting with other communities within the library community (LITA, CODE4LIB, WEB4LIB, ACRL, state associations etc) as well as outside the FDLP (govt transparency community, historians and other academic communities, journalists etc).


------------------ CODE4LIB POST (with added info by James re MARC view) --------------------------------

Thanks to everyone who helped me confirm that the GPO PURL server is down. An official announcement on the GPO Listserv said:

"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

While the server is down, here is one workaround (thanks to Patricia Duplantis):

  1. Copy the purl link listed in your library catalog
  2. Go to http://catalog.gpo.gov/
  3. Click "Advanced Search"
  4. Search for word in "URL/PURL", enter the PURL
  5. Click "Go"
  6. In MARC view, the original URL at the time of cataloging should appear in a 53x note.

This incident, however, illuminates a weakness in PURL systems: access is broken when the PURL server breaks, even though the documents are still online at their original URLs.

Maybe someone more familiar with PURL systems can tell me... is there any way to harvest data from a PURL server, so that a backup/mirror can be available?

Keith
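On Keith's question about a backup/mirror: assuming the PURL-to-URL pairs could be exported in some simple form, a mirror needs little more than a lookup table. The sketch below assumes a two-column CSV snapshot (`purl,target`); that format, the `load_purl_snapshot` helper, and the sample rows are all hypothetical and are not how the GPO PURL server actually stores or exports its data.

```python
import csv
import io
from typing import Dict


def load_purl_snapshot(csv_text: str) -> Dict[str, str]:
    """Parse 'purl,target' rows into a lookup table."""
    table: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        purl, target = (field.strip() for field in row)
        table[purl] = target
    return table


# Hypothetical snapshot; real data would come from a periodic export.
SNAPSHOT = """\
/GPO/LPS0001,http://www.example.org/report.pdf
/GPO/LPS0002,http://www.example.org/data/index.html
"""

mirror = load_purl_snapshot(SNAPSHOT)
print(mirror.get("/GPO/LPS0002"))
```

A snapshot like this is only as current as its last export, so a mirror built this way would serve slightly stale targets for recently changed PURLs.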

--that is all.


PURL restoration progress

As of last Friday about 3,300 of the 116,237 PURLs had been restored (2.8%). As of today (Monday, August 31), 3,677 PURLs have been restored (3.1%). That's a gain of only about 0.3 percentage points since Friday. Looks like we are in this for the long haul. Since the PURL server has gone down, I have had 23 users denied access to government information via PURLs in our OPAC at the University of Denver.
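The "long haul" worry can be checked with a few lines of Python. This is a rough sketch that takes the figures from this comment at face value and assumes, purely for illustration, that every future Friday-to-Monday-sized interval restores the same number of PURLs as this one did.

```python
TOTAL_PURLS = 116_237
restored_friday = 3_300   # "about 3,300" per the comment
restored_monday = 3_677


def percent(part: int, whole: int) -> float:
    """Share of the whole, as a percentage."""
    return 100.0 * part / whole


gain = restored_monday - restored_friday
remaining = TOTAL_PURLS - restored_monday

# Assumption: each later interval restores as many PURLs as this one.
intervals_left = remaining / gain

print(f"Friday: {percent(restored_friday, TOTAL_PURLS):.1f}% restored")
print(f"Monday: {percent(restored_monday, TOTAL_PURLS):.1f}% restored")
print(f"Gain of {gain} PURLs; about {intervals_left:.0f} more such intervals needed")
```

A constant rate is a pessimistic assumption, since a restoration script can speed up, but it shows why the early numbers looked alarming.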

Keep those purl updates coming

Thanks Chris for the status of purl restoration. It's also good to put a human face on this by analyzing catalog click-throughs. Please keep those updates coming so the community can keep abreast of the purl update status.

Continued PURLs Turnaways

As far as I can see today (Wednesday, Sept. 2) none of the GPO PURLs are working. Since the PURL server has been down I have had 45 users turned away from government information. When I see the records that were clicked on, I go into these records and add an 856 field above the PURL, with a note saying: "Direct access to online version", with a link to the actual URL. For example, see: http://130.253.4.23/record=b3713524~S3
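For anyone wanting to script the same fix, here is a minimal sketch of the edit described above, using plain Python dictionaries as a simplified stand-in for MARC fields rather than a real MARC library. The record contents, the `add_direct_link` helper, and the choice of subfield $z for the note are assumptions for illustration.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]  # simplified stand-in for a MARC field


def add_direct_link(fields: List[Field], direct_url: str) -> List[Field]:
    """Return a copy of the record with a direct-URL 856 above the first PURL 856."""
    direct = {
        "tag": "856",
        "subfields": {"z": "Direct access to online version", "u": direct_url},
    }
    for i, field in enumerate(fields):
        url = str(field["subfields"].get("u", ""))
        if field["tag"] == "856" and "purl" in url.lower():
            return fields[:i] + [direct] + fields[i:]
    return fields + [direct]  # no PURL field found; append instead


# Hypothetical record with a title and one PURL link.
record = [
    {"tag": "245", "subfields": {"a": "Sample government report"}},
    {"tag": "856", "subfields": {"u": "http://purl.access.gpo.gov/GPO/LPS0001"}},
]

updated = add_direct_link(record, "http://www.example.org/report.pdf")
for field in updated:
    print(field["tag"], field["subfields"])
```

The function returns a new list and leaves the original record untouched, which makes it easy to review the change before saving it to the catalog.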

purl server update #3 posted by GPO staff

This was posted to govdoc-l on Friday. So it looks like the purl server will be fixed early next week.


Subject: PURL Server Update 3

GPO staff continues to give restoration of the PURL server the highest priority. Our goal is to have it operational and accessible early next week.

A script was executed and it continues to run to rebuild the PURL resolution database. At present the database is populated with more than 87,000 of the 115,000 PURLs. The database is being rebuilt at an average rate of 10,000 PURLs every twelve hours.

Because of the availability of backup files, no data was lost. For additional information from earlier postings about the disruption in the PURL service, please refer to the FDLP Desktop General Announcements section and display 10 articles.

GPO again apologizes for the inconvenience the PURL server outage has caused and we thank you for your patience as we restore this vital service.
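The rebuild figures in this update allow a quick sanity check of the "early next week" goal. The sketch below simply applies the stated average rate to the stated totals; it treats "more than 87,000" and "115,000" as exact and assumes the rate stays constant, neither of which the update promises.

```python
TOTAL_PURLS = 115_000        # approximate total from the update
RESTORED = 87_000            # "more than 87,000"
RATE_PER_12_HOURS = 10_000   # average rebuild rate from the update


def hours_remaining(total: int, restored: int, per_12_hours: int) -> float:
    """Hours left at a constant rebuild rate."""
    return (total - restored) / per_12_hours * 12


estimate = hours_remaining(TOTAL_PURLS, RESTORED, RATE_PER_12_HOURS)
print(f"Roughly {estimate:.1f} hours of rebuilding left")
```

On those assumptions the database rebuild itself should finish in well under two days, which is consistent with the goal GPO gives.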

This reminds me that there was a very good suggestion on purl-dev list that I think the depository community should be pushing:

"The PURL server may optionally use the MySQL database for persistence, which can be replicated to another MySQL instance. That seems to me, in conjunction with DNS redirection, to be the easiest way for the moment to provide failover for a PURL system."

Any libraries willing to host a mirror of a purl mysql database?

purl server update #4 posted by GPO staff

GPO sent the following update around yesterday on the latest regarding the purl server crash. As of yesterday, they're up to 114,000 purls out of 115,000+. So it would seem that we're in the home stretch.

GPO staff continues to give restoration of the PURL server the highest priority. At present the resolution database is populated with more than 114,000 PURLs. GPO is continuing to monitor and to work on additional operational enhancements as needed.

For additional information from earlier postings about the disruption in the PURL service, please refer to the FDLP Desktop General Announcements section and display 10 articles.

GPO again apologizes for the inconvenience the PURL server outage has caused and we thank you for your patience as we restore this vital service.

purl server update #5 posted by GPO staff

It seems that the purl server issues are still being worked on. The length of time that this has been going on is disconcerting.


1. PURL Backup Procedures Tested

On Monday, October 5, 2009, GPO conducted testing of new procedures for the backup PURL application and server. The testing took place between 9:30 p.m. and 9:45 p.m. During this time both the server and the application were operational.

The test was successful and once it was completed the IP address was restored to the original PURL production server. Between 10:00 p.m. and midnight there was intermittent access to PURL URLs while network resources re-synchronized. The address resolution protocol cache on the routers required refreshing after the IP address change that occurred during testing. The server and application remained functional.

2. IP Address

At about noon today, 6 October, GPO began experiencing intermittent IP address problems, which affect the PURL servers. GPO IT staff are working to resolve the problem.

GPO apologizes for any inconvenience this may have caused. We thank you for your patience as we work to improve business processes that ensure public access to Government information.

_________________________________

If you have questions or comments, please use the askGPO help service
at: . When submitting a question,
please choose the category "Federal Depository Libraries" and the
appropriate subcategory, if any, in order to ensure that your question
is routed to the correct area.

purl server update #6 posted by GPO staff

Well, at least this technological bump was quickly troubleshot. More soon.

GPO IT staff identified the intermittent IP address problem, which transpired about noon today, as a bad network interface card. The PURL application is currently running from the backup server and all PURLs are fully operational. We will continue to use the backup server until the primary server is fixed.

GPO apologizes for any inconvenience this may have caused and we thank you for your patience.
