cloud computing

NARA Addresses The Cloud

The U.S. National Archives and Records Administration (NARA) has a new document that addresses agencies using "cloud computing":

Addressing records management implications associated with cloud computing, NARA notes that, "Various cloud architectures lack formal technical standards governing how data is stored and manipulated in cloud environments. This threatens the long-term trustworthiness and sustainability of the data."

See also: NARA Addresses Cloud Record Keeping, By Elizabeth Montalbano, InformationWeek (February 22, 2010).

The Document Cloud

DocumentCloud is a new service being developed with startup funding from the John S. and James L. Knight Foundation. It sounds like an excellent service. It will be "software, a Web site, and a set of open standards that will make original source documents easy to find, share, read and collaborate on, anywhere on the Web."

I cannot help but wonder why libraries are not at the forefront of projects like this.

Started by reporters at the New York Times and ProPublica, this service will give individuals and organizations involved in original reporting mechanisms for sharing the documents they obtain and discover and making those documents available to others for new reporting and new uses.

Over two dozen organizations are working on the development of DocumentCloud, including traditional publications and news organizations such as The Atlantic, Chicago Tribune, Forbes, The Seattle Times, Thomson Reuters, Washington Post, and WNYC Radio, as well as organizations that collect and publish documents, such as The National Security Archive, ACLU National Security Project, OpenCRS, and the Sunlight Foundation.

Users will be able to search for documents by date, topic, person, location, etc. and will be able to do "document dives," collaboratively examining large sets of documents. Think of it as a card catalog for primary source documents. DocumentCloud is not meant to be a general document hosting service, like Scribd, Docstoc or Google Docs. Our goal is to build a service that makes source documents easier to find and share regardless of where they are hosted. It is a complement to these services, and not a competitor. The goal is to make documents even easier to find on search engines. DocumentCloud will have information about documents and relations between them, for example what locations, people, or organizations a group of documents have in common. Conceived of by journalists working at ProPublica and The New York Times, DocumentCloud will be managed as an independent nonprofit.

Their FAQ notes: "Will there be an API? Hell yes."

See also: Coming soon: Data mining made easier, By Alex Byers, Nieman Watchdog (July 11, 2009).

Rethinking the cloud

A couple of recent events have caused me to reanalyze and clarify my thoughts about Cloud computing: first there was the GPO Purl server crash and today there's the story about massive data loss from T-Mobile and Microsoft/Danger for anyone using a Sidekick:

"Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device—such as contacts, calendar entries, to-do lists or photos—that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."

Ouch indeed!

Cloud computing is basically the outsourcing of Web services (storage, email and other application layers, computational cycles etc) to a third party. Although I am guilty of using the cloud metaphor to describe the digital FDLP, it's clear from the concept map below that I don't mean we should outsource FDLP Web services to third parties. I hope it's clear that I'm describing a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, as well as shared technical infrastructure in which data and technological redundancy and collective and proactive action reign. This is the exact opposite of the "cloud."

So what would that metaphor be? I was thinking of the birch or banyan tree, but it's more like the symbiosis or mutual aid exhibited by certain ants and trees. It's a peer-to-peer network with a conscience. Let's call it the FDLP ecosystem.

FDLP Cloud

Cloudy Daze

There has been a recent uptick in the movement toward and the hype about cloud computing. The federal government's embrace of cloud computing with its apps.gov store for agencies to easily obtain cloud computing resources is, perhaps, the most visible.

A couple of recent articles provide context and realism to the hyperbole.

  • Legal Implications of Cloud Computing - Part One (the Basics and Framing the Issues), By David Navetta, InfoSecCompliance Blog (September 12, 2009). (Also available on LLRX).

    Bottom line: this is not your father's outsourcing relationship, and trying to protect clients with contracts may be very difficult or impossible unless the cloud computing community begins to build standards and processes to create trust.

    ...there is going to be incredible financial pressure on organizations to take advantage of the pricing and efficiency of cloud computing and if attorneys fail to understand the issues ahead of time there is a serious risk of getting "bulldozed" into cloud computing arrangements without time or resources to address some serious legal issues that are implicated.

  • Demystifying Cloud Computing for Higher Education, by Richard N. Katz, Philip J. Goldstein, and Ronald Yanosky, ECAR Research Bulletin, Volume 2009, Issue 19 (September 22, 2009) [membership required].

    Public clouds are profit-driven and are most effective with those services that are highly commodified. If an IT service can be offered in a standardized fashion without special regard to end user variations, or to local, state, regional, or even national regulatory differences, then that service can be offered as an undifferentiated commodity service—presumably at a great price. In such a case, the dominant legal principle is likely to be caveat emptor—buyer beware—backed by standard contract language shielding the provider from any significant liabilities for process failures or data corruption and loss.

    ...The challenges and risks that will constrain higher education’s adoption of cloud computing relate to trust, confidence, and surety.

    ...Notwithstanding the near unanimous belief that cloud computing is an important enabler of a fundamental shift in the organization and economics in enterprise IT, the (non-hyperbolic) literature and the discussion with community leaders also make clear that at present the topic is mired in hype and near-utopian optimism.

Google's Role in the Government Cloud

An announcement from Google and more articles about the federal government's cloud computing initiative (see: Feds go for the Clouds) help reveal the advantages and difficulties of the initiative.

According to Babcock, "Federal CIO Vivek Kundra made it clear Tuesday that curtailing the constant buildout of federal data centers was one of his goals." The government is investigating ways that "private suppliers of cloud services can be substituted for building more government data centers."

Babcock reports that Kundra recognizes the security issues:

While there is agreement on the general outline of cloud computing, security in the cloud needs to be better defined and implemented, Kundra said.

...Users who ship data to the cloud will need contractual guarantees that it will be maintained with the same level of security as it was in-house, but neither vendors nor users are sure yet how such guarantees can be made.

Google's announcement addresses some of the issues of security and responsibility. Google says that it intends to build a separate "Dedicated Google cloud for government customers in the US."

Claburn reports:

"The government cloud will come from Google-owned-and-operated facilities," said Google Enterprise director of product management Matthew Glotzbach, in a phone interview. "It will be sections of existing facilities. But it will be a fully parallel instance of Google Apps. The difference being we're working with the government to meet the specific needs of government data regulations."

...Google's reticence to allow IT professionals to inspect its data centers as part of their due diligence has been a source of criticism in the past.

So, the emphasis in the government announcements is on cost savings and efficiency tempered with an awareness of the need for security and accountability. As I said in an earlier post, it seems that this is probably a step forward for government information technology, but it remains to be seen if it is a step forward for government information.

Has government examined cloud computing sufficiently?

John Foley, editor of InformationWeek, wonders if the government is ready for cloud computing.

...government agencies are moving fast into an area where fundamental issues have yet to be resolved. It may only take one data breach, major service outage, or other cloud mishap to slow things down.

Feds go for the Clouds

In a statement today on the White House blog, Vivek Kundra, the Federal Chief Information Officer, announced the launch of Apps.gov, "an online storefront for federal agencies to quickly browse and purchase cloud-based IT services."

Apps.gov is not a government built and managed computing environment available to all agencies. It is a storefront of services offered to government by private companies. If I understand this correctly, apps.gov offers government agencies a way of quickly finding an approved service and getting it quickly. If you suddenly need a terabyte of storage more than you needed yesterday, you can just get it, rather than go through a lengthy procurement process and installing storage in your own IT center. If you have a temporary need, you can get just what you want when you need it. These kinds of services are traditionally called "Infrastructure as a Service," "Software as a Service," and "Platform as a Service."

In addition, Apps.gov offers social media tools from Web 2.0 providers as free services. These are governed by a Terms of Service (TOS) Agreement. Social media services include blogs, video hosting, photo-sharing, wall-postings, email, instant messaging, and music sharing. Click on the Apps.gov FAQ for more detail.

This has a lot of potential for making agencies more flexible and more cost efficient. As Kundra says,

Like a utility such as electricity or water, cloud computing allows users to only consume what they need, to grow or shrink their use as their needs change, and to only pay for what they actually use. With more rapid access to innovative IT solutions, agencies can spend less time and taxpayer dollars on procedural items and focus more on using technology to achieve their missions.

But it also outsources a lot of government information technology in ways that make it unclear who will be ultimately responsible for the dissemination and stewardship of government information or for the privacy of users. This is certainly one big step forward for government IT. We'll have to see if it is a step forward or backward for government information.

See also: The US Government Is Going Google, by Jennifer Van Grove, Mashable (September 15th, 2009).

Update: In an article in Tech Daily Dose (U.S. Gov't Takes Cloud Computing Leap, By Andrew Noyes, September 15, 2009), Kundra is reported to have said that moving the government toward a cloud computing climate could require changes to the 2002 Federal Information Security Management Act and that officials must ensure that agencies are not "held hostage" by one particular technology vendor.

Overview of Clouds

There is a nice overview of "cloud computing" with a library perspective in Library Journal:

Purls vs handles

Building on yesterday's post on Critical GPO systems and the FDLP cloud, I've done a little digging into GPO's proposed migration from Purls to the use of "handles." According to RFP 3650 "Handle system overview,"

The Handle System includes an open protocol, a namespace, and a reference implementation of the protocol. The protocol enables a distributed computer system to store names, or handles, of digital resources and resolve those handles into the information necessary to locate, access, and otherwise make use of the resources. These associated values can be changed as needed to reflect the current state of the identified resource without changing the handle. This allows the name of the item to persist over changes of location and other current state information. Each handle may have its own administrator(s) and administration can be done in a distributed environment (my emphasis). The Handle System supports secured handle resolution. Security services such as data confidentiality, data integrity, and non-repudiation are provided upon client request.

Purls and handles do roughly the same thing: they're link resolvers. But, as Larry Stone's 2000 article for MIT's Persistent Naming discovery project, "Competitive Evaluation of PURLs," points out, there are differences that make handles a better choice for long-term operation and persistence. Without getting too technical, handles are not connected to any protocol (e.g., HTTP) or domain (e.g., .gov) and can therefore work regardless of the network design or protocol used. This is extremely important for scalability and persistence over the long term. In addition, handles can do more than resolve to URLs. "The Handle System design allows for various other types of resolution objects, metadata, and extensible additions to each Handle object record."

In short, handles are more persistent, more scalable, and can do more. But most importantly in my mind, handle administration "can be done in a distributed environment." This makes handles perfect for the FDLP cloud because the work of resolving links can be done in a distributed environment. So I say, kudos to GPO for moving to the handle system.
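To make the persistence idea concrete, here is a minimal sketch of what "the values can change without changing the handle" looks like. It is a toy in-memory model, not the Handle System protocol or its reference implementation: the class names, the example handle "12345/example-doc", and the example.gov URLs are all hypothetical.

```python
# Toy illustration only: this is NOT the Handle System protocol or its API.
# HandleRecord, HandleRegistry, and the example handle are hypothetical names.
from dataclasses import dataclass, field


@dataclass
class HandleRecord:
    """A persistent name plus typed values that may change over time."""
    handle: str
    values: dict = field(default_factory=dict)  # e.g. {"URL": "...", "DESC": "..."}


class HandleRegistry:
    def __init__(self):
        self._records = {}

    def register(self, handle, **values):
        # Create the persistent name with its initial set of typed values.
        self._records[handle] = HandleRecord(handle, dict(values))

    def update(self, handle, **values):
        # The handle itself never changes; only its associated values do.
        self._records[handle].values.update(values)

    def resolve(self, handle, value_type="URL"):
        # A handle can carry more than a URL, so the caller names the type.
        return self._records[handle].values[value_type]


registry = HandleRegistry()
registry.register(
    "12345/example-doc",
    URL="http://old.example.gov/doc.pdf",
    DESC="Example report",
)
before = registry.resolve("12345/example-doc")

# The document moves to a new server; the name cited in catalogs stays the same.
registry.update("12345/example-doc", URL="http://new.example.gov/doc.pdf")
after = registry.resolve("12345/example-doc")
print(before, "->", after)
```

A catalog record that cites "12345/example-doc" keeps working after the move, and the same record can also answer for other value types such as a description.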

Oh, hold that applause for a moment. My search also turned up the following document from the Fall 2007 Depository Library Council meeting entitled "Handles Council Briefing Topic" (PDF). This briefing document basically describes what I've just said above and describes a gradual transition/migration from purls to handles with an anticipated timeline to "coincide with Release 1-C of FDsys in 2008." There's a March 6, 2008 report, "Report on the handles beta test," that calls the handles beta test "satisfactory." But no information is available after that report. So what happened?

I know the building of FDsys has been no easy task and that GPO staff have worked really hard to keep to their published release schedule; but I'd like to know why the handles migration didn't occur in 2008. If more testing is involved, I'm sure there are libraries that would be willing to be beta-beta testers for handles. Perhaps this is an opportune time to finally implement the migration to the handles system.

--that is all.

Critical GPO systems and the FDLP cloud

[Update: 10/13/09: I've revised my thinking on the cloud as the term is loaded and doesn't really mean what I'm describing. A friend from the San Diego Supercomputer Center said, "some greybeards are going back to the original metaphor: the grid" and suggested the term "shared digital libraries" which is good. But what I'm describing is more like a biological ecosystem, the FDLP ecosystem. jrj]

Last week's GPO purl server crash should be disconcerting to both the documents community and the public at large (in fact, although the hardware's been restored, resolution is ongoing as I write). I know GPO staff are just as worried about this and are doing everything they can to fix the purl server.


"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

But in the meantime, there are 1250+ library catalogs and innumerable links to government documents that are not working. The crash of a critical piece of GPO's infrastructure brings a couple of things to mind:

1) What worries me about this is that FDsys and its supposed upgrade in hardware/software/systems design is for all intents and purposes the same as GPOaccess. That is, FDsys is a monolith where the failure of one piece can cause the whole system to grind to a halt. As our readers know, we've been advocating for a long time for a distributed digital FDLP (a *true* "digital depository" system!). We're heartened by what we see of FDsys so far, but we need to be building a system with built-in redundancies.

I envision a collaborative and distributed system of digital content, collaborative cataloging/metadata creation, as well as technical infrastructure. With this kind of system in place, a failed purl server would cause only a momentary blip in service as a backup purl server kicks in, instead of an outage lasting several weeks or more. How many system degradations (WAIS) and failures (purl server) until we shift our thinking from "client-server" (with libraries decidedly on the "client" side of the equation) to "peer-to-peer" concepts and build systems with built-in redundancies that mirror what the FDLP has been for the last 150 years? How long before we build an FDLP cloud?

FDLP Cloud

(**made with IHMC Cmap tools**)
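The "momentary blip" described above comes down to a simple pattern: ask the primary resolver, and if it is down, ask a peer that holds the same data. The following is a minimal sketch of that pattern with no real network calls. The resolver names, the sample PURL, and the example.gov URL are hypothetical, and this does not describe how GPO's PURL server actually works.

```python
# Hypothetical sketch: resolver names and the sample PURL are made up.
class ResolverDown(Exception):
    """Raised by a resolver that cannot be reached."""


def make_resolver(table, up=True):
    """Build a stand-in resolver backed by a dict of PURL -> target URL."""
    def resolve(purl):
        if not up:
            raise ResolverDown("resolver unavailable")
        return table[purl]
    return resolve


def resolve_with_failover(purl, resolvers):
    """Try each (name, resolver) pair in order; return the first answer."""
    errors = []
    for name, resolver in resolvers:
        try:
            return name, resolver(purl)
        except (ResolverDown, KeyError) as exc:
            # Remember why this resolver failed, then move on to the next.
            errors.append((name, repr(exc)))
    raise LookupError(f"no resolver could resolve {purl}: {errors}")


table = {"purl.example/123": "http://agency.example.gov/report.pdf"}
resolvers = [
    ("primary", make_resolver(table, up=False)),  # simulates the crashed server
    ("library-mirror", make_resolver(table)),     # a peer holding the same data
]
served_by, url = resolve_with_failover("purl.example/123", resolvers)
print(served_by, url)
```

The important design point is that every peer holds a copy of the resolution table, so the failure of any one server costs the user one failed attempt rather than weeks of broken links.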

2) There was an interesting discussion of the purl server outage on the code4lib list including a good workaround from a technological standpoint (pasted email below). It points to the fact/reminder that what we do within the FDLP has an effect on others in the wider library community (not to mention the public at large!) and that "our" content and the systems built to serve that content are critical for the work of others whether we know it or not. It also points to the need for us to reach out to those communities in order to build systems of use to both end-users as well as those building other systems, mashups, repositories etc. So I would highly recommend that we be *more* proactive in connecting with other communities within the library community (LITA, CODE4LIB, WEB4LIB, ACRL, state associations etc) as well as outside the FDLP (govt transparency community, historians and other academic communities, journalists etc).


------------------ CODE4LIB POST (with added info by James re MARC view) --------------------------------

Thanks to everyone who helped me confirm that the GPO PURL server is down. An official announcement on the GPO Listserv said:

"The PURL Server is currently inaccessible. GPO is working with IT staff to restore service as soon as possible. We regret any inconvenience caused by the server problems. An updated listserv will be sent once service is restored."

While the server is down, here is one workaround (thanks to Patricia Duplantis):

  1. Copy the purl link listed in your library catalog
  2. Go to http://catalog.gpo.gov/
  3. Click "Advanced Search"
  4. Search for word in "URL/PURL", enter the PURL
  5. Click "Go"
  6. In MARC view, the original URL at the time of cataloging should appear in a 53x note.

This incident, however, illuminates a weakness in PURL systems: access is broken when the PURL server breaks, even though the documents are still online at their original URLs.

Maybe someone more familiar with PURL systems can tell me... is there any way to harvest data from a PURL server, so that a backup/mirror can be available?

Keith
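Keith's question about a backup can be approached, at least in sketch form, from the catalog side: the workaround above relies on catalog records carrying both the PURL and the original URL, so a fallback table could in principle be built from those records. The example below is only an illustration of that idea. The record layout is a simplified, hypothetical stand-in (real MARC records are structured differently), and it says nothing about whether a PURL server itself can be harvested.

```python
import json

# Hypothetical, simplified records: real MARC records are structured differently.
records = [
    {
        "title": "Example report",
        "purl": "http://purl.example/abc1",
        "original_url": "http://agency.example.gov/a.pdf",
    },
    {
        "title": "No original link recorded",
        "purl": "http://purl.example/abc2",
        "original_url": None,
    },
]


def build_mirror(records):
    """Keep only records that have both a PURL and an original URL."""
    return {
        r["purl"]: r["original_url"]
        for r in records
        if r.get("purl") and r.get("original_url")
    }


def lookup(purl, mirror):
    """Return the original URL for a PURL, or None if it was never harvested."""
    return mirror.get(purl)


# Serialize the table so any library could keep and share a copy.
mirror_json = json.dumps(build_mirror(records), indent=2, sort_keys=True)
restored = json.loads(mirror_json)
print(lookup("http://purl.example/abc1", restored))
```

A table like this only reflects the URL at the time of cataloging, so it is a stopgap for outages and not a substitute for a maintained resolver.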

--that is all.

LJ interviews LoC

Library Journal has an interesting set of interviews with Library of Congress staff Michelle Springer (on Flickr Commons, Twitter, and blogging), Sally McCallum (on Linked Data and SKOS [Simple Knowledge Organization System]), and Bill LeFurgy (on cloud storage and preservation).

I particularly like this, from LeFurgy:

Q.) What type of an impact do you think cloud computing could have on the library field?

A.) It has great potential. Cloud computing offers the prospect of a distributed preservation infrastructure, which is vital because no single institution can handle the job alone. Most cultural heritage organizations lack the resources and the capabilities to preserve large amounts of digital content—and the volume of data worthy of preservation is growing by the day. Libraries will need to work together as part of a collaborative network to achieve the necessary economy of scale. Services like cloud computing fit nicely within this collaborative concept.

GSA puts its USA.gov Web site in the cloud

NextGov reports that the General Services Administration will be outsourcing the hardware and "the programs that run the federal government's official Web portal (USA.gov) from government servers to those operated by a private company."

USA.gov uses Microsoft "Live search" for indexing and searching and the article does not mention any changes in that. Apparently, the shift is mostly about hardware.

Cloud Computing In Government

Cloud Computing In Government: From Google Apps To Nuclear Warfare, by John Foley, InformationWeek, Feb. 9, 2009.

Jackson says interest in cloud computing is high among government agencies, which see it as a way to cut costs and speed time to deployment. "There are requests for quotations and RFIs coming out, and we're responding to a lot of them," he says.
