GPO LOCKSS Report: Mistakes and Irrelevancies
As I mentioned a few days ago, the final analysis report of the GPO LOCKSS Pilot Project is now available on GPO Access at
The comments that follow are based on analysis by Jim A. Jacobs and Daniel Cornwall.
The most disappointing thing about the report isn’t so much that GPO rejects LOCKSS as a distribution mechanism, but its reasons for doing so. The 19-page report reads less like an evaluation of software than like a document defending a previously made decision on the basis of mere assertions, many of which have nothing to do with LOCKSS itself.
We believe the biggest problems with this report fall into two categories: 1) statements about LOCKSS that appear to be based on a lack of understanding of how LOCKSS works and 2) negative statements that have little connection to LOCKSS. The second category includes a problem so big that Jim will provide a separate posting on it: IP authentication. GPO repeatedly cites the non-scalability of IP authentication as a major stumbling block to using LOCKSS as a distribution channel. The report never explains why IP authentication is so important to GPO, and that will be the subject of Jim’s posting.
So, what else is wrong with this report? Let’s start with some statements about LOCKSS that, to me as a participant in LOCKSS as both publisher and library, simply don’t make sense.
Page 12 – LOCKSS may have format migration issues similar to CD-ROMs and other tangible electronic media in depository libraries.
I’m not positive I really know what this means, but if they’re talking about LOCKSS hardware, that’s no more of an issue than having computers in libraries at all. LOCKSS caches are just regular computers as the report itself says. Because content is duplicated on other LOCKSS caches, migrating hardware is as simple as buying a new computer, installing LOCKSS, picking your content and sitting back as content is reloaded from other caches. If GPO is referring to file format migration, LOCKSS researchers have been working on that issue since 2005 and have come up with a promising approach.
Page 12 – Explore options for making content available from a single site that would allow LOCKSS libraries and non-LOCKSS libraries to access content from the same source. This would eliminate duplication of effort required to make content available to both groups of libraries.
If you check out any of the nonrestricted materials available through LOCKSS, whether it is Alaska State Documents or BioMed Central Journals, all content available through LOCKSS is already available to non-LOCKSS users from the same interface. No duplication of effort is needed. Simply have the content available in some kind of human-intuitive archive units, and people and LOCKSS caches alike are happy. Unless GPO has plans for restricting public domain government information, this just isn’t an issue.
Page 12-13 – However, formatting the content to enable LOCKSS use could require more clicks than the current model for non-LOCKSS users to access the content.
All LOCKSS needs are issue dates organized by year or some other suitable Archival Unit. It seems to me that is how most journals are already laid out, so no more clicks would be involved. GPO could have bolstered its case by providing an example, but I just can’t think of one.
Page 11 – If LOCKSS were to become the only distribution method for e-journals distributed to the FDLP, all depositories would have to join the LOCKSS Alliance.
No. Only libraries that wish to build local digital collections and pursue preservation through multiple copies would need to join. Otherwise the model of access instead of custody will continue to operate. Making content available through LOCKSS is a matter of providing a publisher manifest page and writing a LOCKSS plugin. LOCKSS doesn’t restrict your content any more than you already do. No library would be forced into LOCKSS unless GPO chose to make it a requirement.
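To give a sense of how little a manifest page involves, here is a minimal sketch in Python that writes one for a single archival unit (one year of one title). The title, URLs, and function name are hypothetical, and the permission wording is the statement as I understand LOCKSS expects it; check the current LOCKSS documentation before relying on it. This is an illustration, not GPO’s or the LOCKSS team’s actual tooling.

```python
from html import escape

# Assumed wording of the LOCKSS permission statement; verify against
# current LOCKSS documentation before publishing a real manifest page.
PERMISSION_STATEMENT = (
    "LOCKSS system has permission to collect, preserve, and serve "
    "this Archival Unit"
)


def build_manifest_page(title, year, issue_links):
    """Return HTML for a manifest page covering one archival unit.

    issue_links is a list of (label, url) pairs, one per issue.
    The same page serves human readers and LOCKSS caches alike.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in issue_links
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)} {year} Manifest Page</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)} {year}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        f"  <p>{PERMISSION_STATEMENT}</p>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical title and URLs, for illustration only.
page = build_manifest_page(
    "Example Agency Bulletin",
    2008,
    [
        ("January 2008", "https://example.org/bulletin/2008/01.pdf"),
        ("February 2008", "https://example.org/bulletin/2008/02.pdf"),
    ],
)
print(page)
```

The page is an ordinary list of links by year, which is why no extra clicks or separate interface are needed for non-LOCKSS users; the plugin then tells LOCKSS caches where to find pages like this one.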
Finally, there is the unspoken issue that indicates a lack of understanding about LOCKSS. GPO consistently refers to LOCKSS as a system of distribution, but never mentions its preservation value. LOCKSS has proved itself since 1999 in the area of commercial journals. I would think that any evenhanded evaluation of LOCKSS would need to examine its capability as a preservation system, something it has done for eight years longer than FDSys has.
The second major problem area is that this purported evaluation of the LOCKSS technology is sprinkled with negative comments that have little to do with LOCKSS itself. In this category I’d put:
- Some libraries may want to “weed” publications from their caches in order to regain disk space at times. (page 11)
- It is unclear what percentage of FDLP libraries want to utilize LOCKSS for e-journal content. (page 11)
- It is not clear whether libraries that do want LOCKSS want it as an exclusive service to depositories, or whether they simply want to enable libraries to archive content locally. (page 11)
- Many agencies will complain if they believe GPO is taking business away from their Web sites by republishing content on GPO Web sites. (page 11)
- The depository library community has not been surveyed to determine whether there is wide enough support for LOCKSS to use it as the only e-journal delivery mechanism to justify requiring all libraries to receive e-journal content through LOCKSS. (page 12)
What all of the above statements have in common is that GPO doesn’t know the answers to the implied questions and hasn’t asked despite being formally interested in LOCKSS for more than three years, and that none of them has any bearing on what LOCKSS is capable of. They really sound like excuses rather than reasons.
Aside from these two major sets of problems, I believe that most of the problems that GPO attributes to LOCKSS apply equally to the Future Digital System (FDSys). Think about it. If GPO serves agency content through FDSys, won’t that take business away from agency web sites? Or when GPO states on page nine “technical sustainability and longevity of the LOCKSS platform as a long-term archiving solution needs to be assessed”, how is this different from the Future Digital System, except that LOCKSS has been around since 1999 and the FDSys is still in development?
Since I like to note when I agree with an opponent’s point, I’ll say that GPO does have a point about the time-intensive nature of manually harvesting journals. Their collection times are similar to what I see in collecting Alaska State Journals and annual reports. That’s part of the reason that Alaska’s LOCKSS program focuses on monographs distributed on our monthly shipping lists. This allows the program to distribute hundreds of titles in a single plugin in a way that adds at least minimal metadata to the LOCKSS system.
Rather than abandoning LOCKSS because manually harvesting journals is too hard, GPO or some enterprising library or group of libraries should explore the automated download and posting of materials found in GPO’s New Electronic Titles on a monthly basis. Even an experiment with a single item number or small agency might be useful.
Inasmuch as the Future Digital System could produce an output like the New Electronic Titles page, all GPO might have to do would be to include a LOCKSS permission statement on its “new titles” results page and let the LOCKSS community write the plugins. GPO could have its centralized system, and interested libraries, some depository and some not, could have assurance that publications would be preserved outside of a chronically underfunded federal agency.