Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a

[UPDATE 6/27: COL’s final report is now available online. We’ve added a link to it below along with the draft report]

We here at FGI are all for greater access to government information and have long supported and worked toward a fully digital FDLP. When discussing the future of the FDLP, we believe it is important to create policy based on thorough, fact-based analysis, to learn from FDLP history, and not to repeat the mistakes that led to benign neglect of documents collections, many of which were born out of trying to handle government documents collections on the cheap. For example, the lack of adequate cataloging, one of our biggest current problems, is a direct result of libraries not investing sufficiently in describing FDLP collections.

With this in mind, we have been closely tracking the work of the ALA Committee on Legislation’s (COL) FDLP Task Force, which was created at the request of COL to “provide their perspectives on options for the future of the FDLP.” The COL FDLP Task Force recently released a collated draft FDLP discussion document (as of 6/27, the final report is also available). This document will be the main item on the Task Force’s agenda next week at the ALA Annual Conference in Chicago. There is much of interest in the COL Discussion document, and much that is non-controversial. This includes:

  • Avoiding duplicative efforts by using the FDLP registry to coordinate digitization efforts.
  • GPO coordinating and facilitating digitization projects (see point 1).
  • Authentication of government publications.
  • Digital deposit. (FGI has long been in favor of digital deposit and we have worked very hard to move forward on digital distribution, including getting the LOCKSS-USDOCS effort off the ground.)
  • ALA accreditation including training about government information (we hope this will include training in services AND collections).

However, we take issue with COL’s issue #1a: “Should libraries be allowed to de-accession and destroy these collections for the greater good of broader on-line access?” The short answer to this question is an emphatic NO. But in order to unpack the issues further and add context and facts to the Task Force’s discussion next week, we’ve written a White Paper, “Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a” (PDF attached below).

    Download the white paper in the format of your choice!

  • PDF
  • mobi (for Kindle)
  • epub (for Kobo and other ereaders)

We look forward to a lively and interesting discussion in Chicago next week!!




Wait! Don’t Digitize and Discard!

A White Paper on ALA COL Discussion Issue #1a
June 2013

By James A. Jacobs and James R. Jacobs

“Many years ago GPO turned over its historical collection to the National Archives and almost immediately we began to regret the absence of a tangible collection.” (Russell, 2003)

The ALA Committee on Legislation (COL) “Discussion Document” (Draft 6-17-13) asks, as part of “Issue #1” concerning the digitization of FDLP collections, whether libraries should be allowed to “de-accession and destroy” collections for the “greater good of broader on-line access.”

While there is much in the Discussion Document that is generally agreed upon in the FDLP community, Issue #1 is extremely problematic as it poses a false choice that unjustifiably equates discarding paper with digitization and better access.

The following provides background details and context that the COL Discussion Document lacks. We as a community need to have all the facts in order to discuss and decide on the fate and future of FDLP collections.

A False Choice. 

The COL question implies that online access can only be achieved if libraries are allowed to de-accession and destroy paper collections. This is not true. There is no technical, legal, policy, or procedural requirement or need to destroy paper collections in order to digitize them and provide online access to them. It is not too extreme to say that it is misleading to phrase the question by implying that libraries can only digitize if they are willing to discard paper. If the greater good, online access, and supporting users are indeed the goals, no permission is needed to proceed.

Any library or group of libraries can digitize its FDLP print collections today and provide online access to those digitizations without discussing it with ALA and without asking for permission from GPO. This is “allowed” today.

FDLP libraries that wish to digitize and provide online access to their historical collections can and should move forward on that goal.

Why Link Digitization and Discarding?

Why do both the COL question and recommendation #1a link “destruction” of collections with “digitization” of those collections? It is troubling that COL connects these two very different activities without explanation or justification. This is, however, not a new idea. In fact, it is a rehash of old ideas that have been considered and discarded before (Housewright, Jacobs).

A variety of reports over the last few years have used three excuses to link these two actions.

  1. Costs. This argument appears in a number of variations: Paper collections are too costly to maintain; space is at a premium and libraries must remove books to repurpose space; providing digital access is cheaper than providing paper access. This is such a big topic that we examine it separately below, but, briefly: Those who say they cannot digitize unless they are allowed to discard paper collections should explicitly explain what if any cost savings digitization will bring about and how those savings will be spent. If the cost savings are not spelled out, this argument should be treated, at best, as a starting point for a long discussion of costs rather than an ending point for policy making.
  2. Technology. Some digitization processes — including “Google Books” — destroy the original. This is sometimes done because the original item is tightly bound and the binding must be cut off to accurately scan individual pages. In other cases, digitization projects find it cheaper to unbind books than to use non-destructive methods of scanning. Even when destructive scanning is used, there is no reason to create an expansive policy that allows all libraries to destroy all copies of a book even if one library destroys one copy of that book during digitization.
  3. Law. Because FDLP libraries must follow FDLP procedures that regulate what can be discarded and by whom, some argue that they won’t digitize unless they are free to discard. There is nothing in Title 44 that prohibits any library that wants to provide online access to its paper FDLP collection from digitizing all or part of that collection. So, this argument does not provide a justification; it just asserts that libraries want to discard and it relies on one of the above two arguments as an actual justification. This rhetoric is, nevertheless, sometimes used to imply that digitizing and discarding are inseparable. They are not, and this is a false justification.

In short, the question in the COL document offers a false choice between digitizing and discarding on the one hand, and digitizing and not discarding on the other. The question as posed misleads; any serious consideration of it must be based on a clearer understanding of the issues that are glossed over in the brief text that accompanies COL Issue #1a.

Access to Paper Copies Is Important.

The COL document apparently suggests that, once FDLP collections are digitized, libraries should be able to rely on the digital copies for access and thus be allowed to destroy their paper collections. This model ignores the need for paper access copies. A better approach to collection management and digitization would first study the advantages of and needs for access to paper copies rather than assume that they will be unnecessary or unwanted after digitization. That libraries should have paper copies for access is not a controversial idea. Even John Burger, the Executive Director of the Association of Southeastern Research Libraries, who promotes such digitize-and-discard projects, says libraries need to retain an adequate number of paper copies for direct user examination. But the COL document apparently ignores such ideas.

COL implies that the number of paper copies in the world can be reduced to a small number for preservation and the rest can safely be de-accessioned and destroyed. This model of treating paper copies as emergency preservation copies-of-last-resort in a vault is not the only available model, nor is it the best. An alternative model suggests that we should consider the need for users to have access copies of paper documents and that we should keep an adequate number of working, usable, loanable copies geographically near their users. In addition, FDLP libraries should strive to bibliographically connect digital copies with paper copies. “Every document its reader,” and “every format its use” to paraphrase Ranganathan’s 5 laws of library science.

These are not abstract or Luddite ideas; they come out of a common sense approach to collection management and digitization. There are at least two reasons for them. First, until and unless we can guarantee that our digital copies are one hundred percent accurate and complete, paper copies will continue to be needed by users. Second, some FDLP items will be difficult to deliver digitally and the paper copy will continue to be easier to use.

This may vary over time as the computing environments that are widely available to users, and preferred by them, change. But current digitization practices do not even match the capabilities and limitations of some of today’s popular technologies. For example, some items (such as large-format books with high text density, color plates, or maps; 1000-page documents with tabular data; or the Congressional Record!) are not easily usable on small-screen, colorless e-book devices.

Taken together, these ideas mean that some users will require and prefer the original paper format over the digital format. That means that libraries will need to provide access copies as well as preservation copies. Access copies are more subject to loss and damage than copies saved for preservation-only. That means we should also ensure we have enough copies to provide replacement access-copies for the long-term.

Before libraries consider destroying their paper collections, they should consider access as well as preservation as a reason to retain paper copies. This also raises another issue that we address below: How do we determine what “an adequate number” is?

Access Is Not Preservation.

The COL document provides a vague and confusing view of access and preservation. It says that “digitization can assist in preservation, but is not, itself, a preservation format” and it recommends having “enough” paper copies as determined (apparently) by a preservation plan. It therefore appears to be suggesting that libraries will rely on digital copies for access and a few paper copies as copies-of-last-resort for preservation. The document does not specifically advocate the preservation of digital copies or digital copies as a preservation format. Nevertheless, if libraries are to rely on digital copies for access, someone must assure that those digital copies are preserved or we will lose long-term access.

This creates two problems for the COL discussion point and recommendation. First, if COL is suggesting that libraries rely on paper copies as the only preservation format, then it should not recommend “allowing” the destruction of paper until we know how many paper copies we need to achieve the paper-as-preservation-format goal. It would only be logical to propose a policy of paper destruction if a “comprehensive preservation plan” eventually determines that the world needs fewer paper copies than we have. To propose such a policy without such a determination is both premature and reckless.

Second, creating digital copies solely for access is a different process than creating preservable digital copies, which takes more care and more expense. If COL does indeed intend these to be preservable long-term digital copies, it should be more explicit about that. Proposing digitizing for access when digitization for preservation is needed would be shortsighted and foolhardy and would confuse the cost issue.

Access is not preservation, and digital access (the stated purpose of digitization in the COL document) is not the same as digital preservation. Even the word digitization is a vague term that can mean many things (FADGI). Digitization does not magically preserve the original information. In fact, the information captured in many digitizations of books is often incomplete, damaged, or both. (One of many examples of this is shown in a recent FGI post about a Department of Commerce publication “Commercial Handbook of China” [http://freegovinfo.info/node/3960].)

Even the narrow function of “access” requires more of a commitment to standards than the COL document provides. This is necessary in order to ensure that the information in the original is not corrupted or lost during digitization. The OAIS digital preservation standard has a specific set of criteria for migrating information from one format to another. This involves identifying the “Transformational Information Properties” of the original Content Information (Consultative Committee for Space Data Systems 2012). The standard for Trusted Digital Repositories (TDR) requires that such repositories verify ingested content for completeness and accuracy (Consultative Committee for Space Data Systems 2011). Proposing the destruction of the original information packages (the books) without also proposing the application of such standards would almost certainly result in the permanent loss of information.

Proposing destruction of paper copies to achieve digital access is simply a bad idea unless the concepts outlined here are also addressed. Assuming that a few paper copies will be “enough” for preservation is questionable. Advocating a policy that encourages destruction of paper collections before addressing the issues of preservation and access is premature.

How many copies?

The COL document implies that it will be acceptable to discard our paper FDLP collections when we have a comprehensive preservation plan and “enough tangible copies.” While it is true that the FDLP community does not yet have a comprehensive preservation plan, there is no evidence that warrants COL’s apparent prediction that we will need fewer paper copies in the future.

Until we know how many paper copies are needed to ensure long-term preservation and access, it is unwise to propose policies that will have the effect of destroying paper collections. The existing studies that address this issue (e.g., Schonfeld, Schottlaender, Yano) do not provide adequate information to apply them to our FDLP paper collections. Those studies mostly focus on substituting digital surrogates for paper journal articles, which are a relatively homogeneous body of literature about which we can make generalizations. We do not have a study that examines the accuracy of digitizing and preserving a very heterogeneous body of literature such as government publications. These publications vary widely in age, format, size, and original paper quality; many have brittle and yellowed paper; and many contain information that is difficult to digitize accurately (e.g., tables of statistical information, charts, graphs, photographs, drawings, foldout maps). It will, in fact, be difficult to generalize about such a heterogeneous body of literature.

As noted above, if we choose to ignore the need for paper copies for access as well as for preservation (as the COL document apparently does), we may destroy copies that we later need for access. This would be unwise, to say the least.

A rational digitization process must consider how decisions we make today may limit or expand our ability to deliver content both today and in the future and consider the potential need to re-digitize in the future. Re-digitization may be desirable as technologies for digitization improve and as the technologies for digital delivery and digital use and re-use evolve. Future digitization technologies may require destructive digitization. To be able to meet future needs we will need to keep enough paper copy originals for more than one re-digitization.

In short, there are many reasons to keep paper copies and too many unresolved research questions to suggest that we know enough to destroy our paper collections.

Quality of Digitization.

Before libraries consider discarding and destroying paper collections they should address the issues surrounding the quality of digitizations. Although there are many standards for digital production, we need to also consider user-based standards to ensure that the production standards we choose will produce digital objects that meet the needs of users. There are many ways to digitize and they result in different products with different utility. Before we discuss destroying paper collections based on unspecified promises of unspecified digitization processes we should first consider two specific user-oriented issues related to the quality of any digitizations we might wish to rely on.

First, we must be sure that any page-image digitizations that we wish to rely on are accurate and complete (Jacobs and Jacobs). As noted above, the physical and content characteristics of government publications make them difficult to digitize adequately at a reasonable cost (GPO, 2004).

Second, we must be sure that text-extraction from digital page-images is accurate and complete and meets the use requirements and expectations of increasingly sophisticated users of digital content. Our experience so far with digitization provides ample evidence of widespread incomplete and inaccurate Optical Character Recognition (OCR). Currently, there is no standard for expressing the accuracy and completeness of OCR.

Allowing Destruction or Breaking a Commitment?

As noted above, COL question #1a uses the word “allow” and the recommendation only implies that destruction will occur. Neither the question nor the recommendation requires destruction. Those who support this proposition will undoubtedly argue that it will provide all libraries with more flexibility and will not require any library to discard anything. Indeed, some libraries have repeatedly expressed the desire for the flexibility to substitute digital copies for paper copies. There is, however, a serious problem with such an argument.

Every FDLP library has made a commitment to the government and to the American People to provide access to FDLP information (Hoduski). The commitment to access, combined with the administration of the program, has also successfully preserved this information. The FDLP program has a demonstrated history of successful long-term preservation and access.

Before suggesting that libraries renege on their existing, successful commitment, COL should present a reasonable substitute plan that assures long-term preservation and access. COL provides neither. We have no comprehensive preservation plan, so it is premature to suggest that we can assure long-term digital preservation. And COL provides no substitute plan for maintaining (much less improving) access. Although we agree that it is possible to enhance access (and service) to paper collections through digitization, and possible to provide long-term digital preservation, it would be irresponsible to destroy our ability to provide access to paper without a specific plan that creates a new commitment to levels of access and service.

Replacing an existing commitment with no commitment is simply unacceptable.

Service.

When libraries consider digitization, they should consider service for those collections as well as access to those collections. The COL Issue #1a does not mention service. This is a significant omission. Arguably, digital collections need as much attention to service — if not more — as paper collections do. The OAIS model provides a minimal service model, but even it specifies that digital collections need collection management, preservation planning, and services for discovery and delivery of content. A more complete and realistic model for service provision for a digital collection will include the provision of user-friendly interfaces, APIs, and discovery tools. A more library-oriented model will recognize the need for dedicated staff with collection experience and knowledge who can provide interactive services and respond to user feedback in order to develop new tools over time. Collections without services or with inadequate services should not be an option for libraries seeking to provide value to their user communities. This again brings us to the question of costs.

Cost.

As noted above, the argument for the destruction of paper collections often boils down to economics. Libraries will argue that they cannot afford to provide preservation and access to both digital and paper collections. Indeed, providing adequate services, access, and delivery of digital information can be expensive when done well. But it is disingenuous to claim that libraries have to digitize and discard because of costs if one does not also account for the full costs of providing the new service. It is misleading to claim cost savings without showing which costs will be saved and how much will be saved, and without specifying how the savings will be used. To propose an irreversible policy (destroying collections) before providing such details is irresponsible.

Although we have seen proposals that casually suggest that digitization won’t cost more than twelve cents per page, such suggestions are at best over-simplified and misleading. At worst, they are grossly inaccurate by orders of magnitude.

Although cost-tradeoffs can be complex and can vary from library to library, it is easy to grasp the essential issues involved. No digitization policy should be based on vague promises of low costs or cost savings. Any claims of cost savings that do not account for all the costs that a digitization project will incur are incomplete. Any proposal that does not describe the effects of the policy on collections and services is incomplete. Incomplete proposals should not be used as a basis for policy.

  • Purpose, Functionality, and Intended Use of Digitization.

    Before planning a digitization project, it is vital that the project specify the intended purposes of the digitizations. As noted above, the COL document does not do so. There are different costs associated with different uses. For example, a higher quality of digitization is needed if the digitizations are meant to replace rather than supplement paper copies. Accounting for intended use and user needs should also include the entire life-cycle of the digitizations. In planning for use, it is necessary to anticipate and account for changing user needs and expectations and changing computing environments. Projects that fail to accurately account for changing computer capabilities can cause problems or additional costs later in the life of the information (Marks).

  • The Costs of Digitization.

    The costs of digitizing have been demonstrated to be anywhere from twenty-two cents per page (University of Michigan) to more than eight dollars per page (Nichols). One study of digitizing statistical tables (of which there are many in government publications) demonstrated a cost as high as $3.55 per table. The reason for these variations is that there is no one, single, standard procedure that everyone uses and that has a known, fixed cost. Every digitization project must make many decisions and each decision affects the quality and utility of the resulting digital object. Projects must choose procedures and methods and hardware and software and how to chain them together. Even with all these choices made, there are choices of standards and choices of outputs. Every one of those decisions is associated with a cost. Some of these costs are not within the control of the digitizing project. For example, the cost of digitization varies with the quality, size, and nature of the original material being digitized. It is unlikely that a single cost-per-page estimate can accurately account for the digitization requirements of a large body of heterogeneous literature like the FDLP collections. No digitization project should be undertaken without a realistic specification of the costs and a tested specification of the quality of output based on the target collection to ensure that it matches the intended purposes of the digitizations.

  • The Cost of the Life-Cycle of information.

    The cost of digitization is only the first of many costs. Studies have shown that the cost of digitization is only about one third of the life cycle cost (University of Michigan, UNESCO). In addition to the creation of digital objects a project must account for the costs of keeping and providing access to the digitizations. Some of these costs may be more expensive for digital collections than for paper collections. One study demonstrated that digital books are 208 times more expensive to store than printed books (Chapman). The OAIS model for preservation specifies six such functional activities: Ingest, Storage, Management, Administration, Preservation Planning, and Access. In addition to these, a library should also include a Service Function. The “cost of digitization” should factor in all these functions, not just initial creation of digital objects.

  • Long-term Costs.

    In addition to accounting for the costs of the life-cycle of information (ingest, preservation and management, discovery and delivery), it is also necessary to account for the costs of the life span of the information. Projects that intend to provide long-term access must account for the ongoing functional costs for the life of the information. This may be longer than the life span of any individual library or archive.

  • Use of Cost Savings.

    If a policy can demonstrate cost savings, it should also specify how those savings will be used. If, for example, there is a demonstrable cost savings from the destruction of FDLP collections, will the savings be earmarked for digital FDLP collections and services, or will they be redirected to other collections and other services? It is also important to note that cost-avoidance may not produce any funds for replacing lost collections or services. Studies that examine and promote the cost-avoidance and cost savings accrued when libraries reduce their physical inventory recognize that cost-avoidance does not mean that cost-savings “would actually be available for redirection in support of other operations” (Malpas).

  • Non-monetary Costs: Measuring the Value of the Library.

    Most if not all FDLP libraries are non-profit organizations. They measure their worth in their value to their users. It is, therefore, essential that individual libraries consider how a policy will affect their value to their users. Some of the literature that examines and advocates “consolidation” of paper collections (i.e., discarding copies) and managing a centralized digitized collection explicitly promotes outsourcing of collections and services. Malpas, for example, assumes that the cost savings would come from outsourcing the management of digitized books. Libraries should determine if policies such as the COL proposal would shift the value that users get away from the individual library and toward the outsourced services. In the long run, this will reduce the role of the individual library to that of a business office that buys services from the outsourcing vendors.

    The value of an individual library is increased when it selects and controls its collection and provides services and collections to its designated communities. If a library reduces its monetary costs but in so doing loses its ability to select and control a collection designed for its user community, it pays a non-monetary cost and loses value to its users.

    Librarians that are considering digitizing and discarding their collections should examine what the role of their library will be in the resulting future. Will their library have control over what is in the new digitized library or will some other agency or company have the control over what is added and what is discarded? Will their library have control over the selection of information provided to their users, or will they have to present a large, monolithic collection made up of “everything” contributed by many libraries? Will they have control over discovery tools and user interface, or will those too be controlled by others? Will they have control over the digital objects delivered to their users and the usability of those objects, or will someone else decide what is an acceptable level of usability? Will they be able to integrate their new digitized collections with other digital collections, or will they create another walled-off silo of information? Will they have control over the APIs to their collections, or will someone else provide a generic API?

    In short, who will control what users get? Will they be substituting a monolithic one-size-fits-all library for a library designed for designated communities? As we think nationally, will it be good to provide the same collection and the same tools to undergraduates and graduate students and K-12? Will lawyers and physicians find the same collections and the same tools for access equally user-friendly and effective? Will those who want to find the current population of their city and those who want to analyze demographic data over time both be satisfied with a single collection with a single user-interface?

Conclusion.

The digitization of FDLP historical collections promises many things: better discoverability, enhanced usability, better access, and more. We believe strongly that the digitization of the Public’s historic collections should and will happen. We also believe that it is important to move into this new future by thoughtfully planning to deliver those promises. To propose that we must destroy our paper collections in order to digitize them without a clear plan to meet our commitments for long-term preservation, access, and service is unjustified. To insist this is the only path to digitization is misleading and irresponsible.

End notes

Burger, John. “ASERL response to LJ Op-Ed, ‘The Future of the FDLP: From Conversation to Confrontation.’” ASERL-Selectives Mailing List. (Dec. 16, 2011).

Chapman, Stephen. 2006. “Counting the Costs of Digital Preservation: Is Repository Storage Affordable?” Journal of Digital Information 4(2).

Consultative Committee for Space Data Systems. 2011. Audit and Certification of Trustworthy Digital Repositories. Recommended Practice, Issue 1. Magenta Book. Washington, D.C.: Consultative Committee for Space Data Systems.

Consultative Committee for Space Data Systems. 2012. Reference Model for an Open Archival Information System (OAIS). Magenta Book, issue 2. Washington, D.C.: Consultative Committee for Space Data Systems.

FADGI. Still Image Working Group. A Resource List for Standards Related to Digital Imaging of Print, Graphic, and Pictorial Materials. Federal Agencies Digitization Guidelines Initiative. January 28, 2010.

Hoduski, Bernadine Abbott. “Who Is Protecting the People’s Property?” SRRT Newsletter, Issue 178, (March 2012).

Housewright, Ross and Roger C. Schonfeld. Modeling a Sustainable Future for the United States Federal Depository Library Program’s Network of Libraries in the 21st Century: Final Report of Ithaka S+R to the Government Printing Office, Ithaka S+R, (May 16, 2011).

Jacobs, James R. “Public comments and response to Ithaka S+R Models draft report,” FreeGovInfo (March 1, 2011).

Jacobs, James A., and James R. Jacobs. 2013. “The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard.” D-Lib Magazine 19(3/4). (March 15, 2013).

Marks, Joseph. “National Archives’ first Wikipedian in residence to bring more holdings to the public” NextGov (07/11/2011).

Nichols, Stephen G, and Abby Smith. 2001. Appendix VI: “Comparative Costs for Book Treatments.” In The Evidence in Hand: Report of the Task Force on the Artifact in Library Collections. Washington, D.C.: Council on Library and Information Resources.

Russell, Judith. Remarks by Judy Russell, 142nd ARL Membership Meeting, Federal Relations Luncheon (May 15, 2003).

Schonfeld, Roger C., and Ross Housewright. 2009. What to Withdraw: Print Collections Management in the Wake of Digitization. Ithaka S+R.

Schottlaender, Brian E.C., Gary S. Lawrence, Cecily Johns, Claire Le Donne, and Laura Fosbender. 2004. “Collection Management Strategies In A Digital Environment: A Project Of The Collection Management Initiative Of The University Of California Libraries.” Final Report to the Andrew W. Mellon Foundation. University of California, Office of the President, Office of Systemwide Library Planning.

U.S. Government Printing Office. Report on the Meeting of Experts on Digital Preservation: Metadata Specifications, Washington, D.C.: U.S. Government Printing Office (14 June 2004).

University of Michigan Digital Library Services. “Assessing the costs of conversion : Making of America IV : The American voice 1850-1876.” (2001).

UNESCO. “Memory of the World: Documenting against collective amnesia.” In Focus. (2012)

Yano, Candace Arai, Z.J. Max Shen, and Stephen Chan. 2008. Optimizing the Number of Copies for Print Preservation of Research Journals. Berkeley, CA: University of California Berkeley, Industrial Engineering & Operations Research.

Authors

James A. Jacobs is Librarian Emeritus, University of California San Diego. He has more than 20 years of experience working with digital information, digital services, and digital library collections. He is a technical consultant and advisor to the Center for Research Libraries in the auditing and certification of digital repositories using the Trusted Repository Audit Checklist (TRAC) and related CRL criteria. He served as Data Services Librarian at the University of California San Diego from 1985 to 2006 and co-taught the ICPSR summer workshop, “Providing Social Science Data Services: Strategies for Design and Operation” from 1990 to 2012. He is a co-founder of Free Government Information.

James R. Jacobs is the Federal Government Information Librarian at Stanford University’s Cecil B. Green Library and program lead for the LOCKSS-USDOCS program. He is a member of the Government Documents Roundtable (GODORT) of the American Library Association and has served on the Depository Library Council to the Public Printer, including as DLC Chair from 2011 to 2012. He is a co-founder of Free Government Information and Radical Reference and serves on the board of Question Copyright, a 501(c)(3) organization that promotes better public understanding of the history and effects of copyright, and encourages the development of alternatives to information monopolies.

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


6 Comments

  1. “Until and unless we can guarantee that our digital copies are 100% accurate and complete….”

    Amen. We can only safely fully remove paper once we have transitioned to viewing the digital byte-stream as normative – not the paper.

    We can only do that, IMO, once we start treating legal corpora as ledgers with a complete, open audit trail for all transactions against the corpus.

    See What lawyers need to learn from accountants.

  2. For the most part, I agree with the points that were made in your white paper, particularly those salient arguments related to the importance of preservation and digitization. I would even agree with your comment that the question “Should libraries be allowed to de-accession and destroy these collections for the greater good of broader on-line access?” is a false choice.

    However, I’m not wholly sold on your arguments supporting your assertion that “libraries SHOULD NOT de-accession and destroy these collections for the greater good of broader on-line access”. Please take note of the subtle change, because this is really the statement that your white paper is addressing.

    The question that I wrestle with is how does the government information community move forward to create the type of digital environment that both of you have consistently talked about in your papers? We may not have liked the original quality of the Google Books wand, but for many of us in the library community the positive outcomes were:

    -emphasis placed on cataloging our documents collections
    -a trusted repository was created
    -some form of access was attained

    It’s unfortunate that the important issues of preservation, digital meta-data and tangible copies got lost in the Google mass digitization efforts. There were actually some pretty interesting discussions happening in 2004 from “experts in digital preservation”. However, I digress.

    So why am I supportive of destructive scanning? First, it is an exaggeration to think that many libraries are considering doing this wholesale. This is most likely being considered by larger institutions that have either the resources to do this type of scanning or have a relationship with Google. Second, although destructive scanning is not necessarily “preservation quality”, most would certainly agree that it is a better quality scan than “the wand”.

    There are many other “cost” benefits that your paper questions that I will not attempt to address, because the single most important reason why I support “destructive scanning” is that it brings to the forefront the need to address exactly the issues stated in your paper. In sum, it forces the community to seriously consider coming up with a plan that can be implemented and move beyond talking about what “should be” and talking about what “can be”.

    In keeping with the “half-full” metaphor, there are a number of pretty cool initiatives that I think provide some working models. The ASERL disposition database and their “Center of Excellence” have amazing possibilities for creating those tangible collections that you are discussing in your paper. The TRAIL project is an excellent model for creating an inventory and systematic approach for digitization. Finally, I’m optimistic about HathiTrust’s new initiative to create a registry for FDLP documents.

    Thanks for sharing your thoughts on preservation and reminding us of the tenuous nature of digital objects as we move forward. It’s my hope, however, that we can see “destructive scanning” as only a step in the process of creating digital content that is useful for our users.

    Stephen

  3. [Editor’s note: we posted the following response to Stephen Woods on Govdoc-L and are copying it here for the public record. jrj]

    Hi Stephen, (apologies in advance for the long response)

    Thank you for your thoughtful reply! As you know, Jim and I have been at the forefront of digital govt information efforts for some time and so come to this discussion with both passion and experience. And we agree that there’s much in the COL document that is non-controversial (avoiding duplicative efforts, authentication, digital deposit, better training and ALA accreditation etc). But we’ve been working on 1/2-full / 1/2-empty metaphors for too long and this leaves us thirsty for more 😉

    We apologize if we were not clear. What the white paper is intended to do is suggest that the COL question should be rejected and more thought should be put into any future such proposals.

    We did *not* mean to propose the alternative question or alternative policy (as you attribute to us: “libraries SHOULD NOT de-accession and destroy these collections for the greater good of broader on-line access”).

    We *do* believe that libraries should not de-accession and destroy their collections unless and until we have a better understanding of the issues involved and a more specific proposal with commitments on the table.

    Is it possible to demonstrate that de-accessioning and destruction of paper collections will automatically lead to a greater good? We invite those who promote such policies to spell out how this is so, to be specific about the commitments necessary to achieve that greater good, and to specify the total costs of digitization, the intended purposes and uses of digitizations and their associated Designated Communities, and how a specific plan with those costs will meet those goals. We object to a policy proposal that *guarantees* reneging on an existing commitment (to paper collections and their preservation and access) and replaces it with a vague, unspecified promise of “the greater good.” (insert bird-in-the-hand-two-in-the-bush metaphor here). And really, won’t the “greater good” be served by having BOTH digital AND paper formats to meet the most user needs possible? (I’m reminded of that silly “it’s not complicated” ATT commercial http://youtu.be/F0FL1AzCAJ8 :-)).

    We believe 1) that there is no reason to *inextricably* link destruction of paper and digitization, which is what the COL question does, and 2) that the question as worded, along with the proposal and accompanying text, did more to muddy than clarify the issues of access, preservation, service, and the advantages and disadvantages of paper and digital.

    We applaud the emphasis that HT, ASERL and others have placed on documents cataloging and look forward to the HT govt documents registry!

    It is certainly true that “access” is cool. We love better access! It is also true that HT is a trusted repository *for what it ingested*. Although HT (and UMich as one of the original google books partners) had some say in the quality of what google did, the digitization quality and purpose was driven by google with “access” being the driving force. Google has always been most interested in *words* not *books* so 100% completeness was never their goal.

    We would argue that when libraries look to reformatting paper to digital, we should think more than google did about what that means in terms of preservation and in terms of future uses by future users of the digital objects we create today. It might be more accurate to say “better” (or better access) should not be the only criterion used for digitization. Access is great, but what about preservation, discoverability, usability, long-term flexibility, accuracy, completeness? The stakes are raised much further on these questions when focusing on access only also results in destruction of the original. We would want to be sure that, if we have a short-term access goal, we do not meet that goal with an irreversible policy that makes it harder to reach a better goal later. Libraries should be thinking of the long term.

    Please note that it is the COL document, not our white paper, that links *discarding* at *all* libraries with digitization (even if digitization is done by a few).

    We did not say that many libraries are considering destructive digitization. Indeed, the latest report from GPO http://www.fdlp.gov/home/repository/doc_download/2286-fdlp-forecast-study-data-report-library-forecast-question-14
    says that 82% of FDLP libraries reported that they do not intend to digitize their collections.

    The COL document makes its extreme policy suggestion without saying why it is necessary to link digitizing at some libraries to discarding at all libraries. (We believe it is not necessary to make this link.)

    Allow us to make another subtle distinction. Focusing on destruction as the only necessary component to digitization (as the COL document does) is the wrong approach to ensuring quality. Although destructive scanning *can* be better than non-destructive scanning, this is only one factor that contributes to the quality. It is in fact easy to argue that there is bad destructive scanning and there is good non-destructive scanning. When COL focuses on this one aspect and neglects to mention all other factors, one should ask why COL gives such preeminent rank to this one factor. There are many other factors that contribute to quality scanning and many of those are costly. If quality is not an explicit part of any policy proposal, it makes the digitize-and-discard-will-save-money argument questionable on its face.

    We invite you to think about an alternative: If “access” is all COL wants to accomplish with digitization, then cheap, non-destructive scans can greatly enhance access today. Libraries can do this today without asking anyone’s permission. There are good justifications for engaging in such projects. In fact, many special-collections and archives use non-destructive scanning to create access-only digital copies. But, if something else is desired (and we believe there are good reasons to aim higher than simple access-to-images-of-books), then libraries that have taken on a commitment to preservation of and access to FDLP collections should put more thought into exactly what we want to do, how much it will cost, and how we can achieve our goals.

    BTW, I don’t think google used the scanning “wand” approach. See for example
    http://hardware.slashdot.org/story/09/05/15/1834246/how-googles-high-speed-book-scanner-de-warps-pages

    We are all for discussing these issues. We don’t think we need to threaten destruction of our paper collections in order to prompt discussion, though. We support digital access. We question moving forward on one or two issues (as we believe the COL document suggests) without considering the other issues.

    We believe a wise approach is to focus on what we want to accomplish first and then figure out how we can accomplish it. Even if this takes longer, in the long run, libraries and users will be better served.

    We believe the future of libraries will come from libraries leading with thoughtful, useful, ideas that will ensure both access and long-term preservation to the information our users need. We see two paths for libraries: In one, libraries act positively to increase the value of the library to their users. We believe that this will lead to increased funding. On the other path, libraries accept bad funding by looking for cheaper and cheaper solutions (or worse, selling off content to commercial services with perpetual access/licensing fees) to living within their decreasing budgets. This will lead to fewer collections, the destruction of collections, reduction of services, and so forth. This is a downward spiral to decreased funding.

    The existence of useful projects does not justify the destruction of paper collections without more thought about all the issues. To give just one example: both ASERL and TRAIL use a producer-focused model of collections instead of a user-focused model. Although this made sense in the paper and ink world, need we tie ourselves to that model in the digital world? What about building collections that focus on communities of interest, disciplines, types of use, and so forth? The communities that libraries serve in the digital world need not be geographically based. They can be international and focused on user-needs both in terms of the content of collections and on usability of digital objects by users.

    We also need more than bibliographic control. We need to know about the quality, accuracy, completeness, and functional usability of digital copies. Knowing only that we have digitization or a paper copy is not useful enough in the digital age — either to the user or the library.

    Thanks for taking the time to share your thoughts. We urge others to really think about the full scope of the issues and address them. We just don’t think this particular COL issue item did a good job of that.

    best,

    James and Jim

  4. I wanted to add one more thing about relying on HathiTrust for access to government documents, specifically with regard to the quality of the digital objects in HT. Please note the following “finding” from the CRL certification of HT:

    One explicit goal described in the HathiTrust mission statement is to “coordinate shared storage strategies among libraries, thus reducing long-term capital and operating costs of libraries associated with the storage and care of print collections.” The repository should put in place and clarify its plan for achieving that goal, as the cost reduction described is a relevant metric of the value of HathiTrust and its services. The new HathiTrust pricing model, to be introduced in 2013, will directly correlate the overlap between the repository corpus and the print holdings of the participating libraries. This will increase pressure for participating libraries to divest of print volumes available through the repository.

    The quality assurance measures for HathiTrust digital content do not yet support this goal. Inspection criteria and standards are in place for materials ingested from the Google Books project, but it is not clear what results when an object fails such inspection. It is also unclear to what level of quality review materials digitized by partner institutions or those made available through entities such as the Internet Archive are subjected. This will be material to library decisions on whether to retain, conserve, or dispose of corresponding physical copies of books represented in the repository.

    Currently, and despite significant efforts to identify and correct systemic problems in digitization, HathiTrust only attests to the integrity of the transferred file, and not to the completeness of the original digitization effort. This may impact institutions’ workflow for print archiving and divestiture.

    http://www.crl.edu/sites/default/files/attachments/pages/CRL%20HathiTrust%202011.pdf

    (Full disclosure: I participated in the CRL audit of HathiTrust.)

    • Thanks for pointing this out Mr J. “…only attests to the integrity of the transferred file, and not to the completeness of the original digitization effort.” This is exactly why all digitization efforts need to use the Digital Surrogate Seal of Approval so that the bits AND the completeness of a digital publication can be assured by users and librarians as they go about their work.
