Tag Archives: COL FDLP taskforce

COL’s FDLP Task Force Survey and FGI’s responses

[updated 4pm 1/7/14. I clarified a couple of statements. JRJ]

Last summer, the ALA Committee on Legislation (COL) Federal Depository Library Program (FDLP) Task Force released its FDLP report and recommendations. In response, FGI wrote a white paper, “Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a”. While agreeing with many of the report’s recommendations (avoiding duplicative efforts, digital deposit, etc.), we took issue with some points and tried to unpack and give context to the problems and assumptions put forth by the task force.

COL asked the FDLP task force to continue for an additional year and their first order of business is an FDLP survey to gather input and feedback to “outline a process for ALA to bring together diverse opinions and to guide the Committee in its future consideration of policies in relationship to the FDLP.” While the survey was sent out to ALA divisions and round tables for official feedback, I think it important that ALL librarians with an interest in government information submit answers to the FDLP survey. Please submit your survey responses by February 14, 2014.

Also, please consider attending the COL meeting at ALA Midwinter on Saturday, January 25 from 10:30-11:30am in the Convention Center, room 107B.

Lastly, in the interest of public discussion, I thought our readers would be interested in seeing the 20 survey questions beforehand and our survey responses. Below are all the questions (starting with #3 as questions 1-2 are demographic) in bold as well as our submitted answers.


A comprehensive preservation plan includes digital documents supplemented with preserved tangible collections with a yet-to-be-determined number of full print collections, in controlled environments and in geographically dispersed locations. — NAPA report “Rebooting the Government Printing Office: Keeping America Informed in the Digital Age”

Identification of Materials – in order to implement a preservation plan, it will be critical to outline a process for identification and processing on the national level. This collaboration will require the broad participation of libraries, commercial and not for profit organizations, agencies, and associations.

3. Do all tangible materials within the FDLP need to be preserved?

YES!

I worry that this question is ambiguous and will result in a variety of answers which could too easily lead to misleading interpretations. To try to get around the ambiguity, let me say that I believe that there is no FDLP information that should be discarded or abandoned.

The community might well want to identify specific copies or editions or versions or formats of any given specific information content that need no longer be preserved because the information content is being appropriately preserved. But, to do that, we need an accurate accounting of what information content exists, how many copies we have, the physical state of such materials, etc.

I would suggest that format (“tangible” or other) is not a useful criterion for selecting materials for preservation or discard. I would also suggest that information that exists only in paper copy should be preserved and, further, that every such work (edition?) should be preserved.

4. Realizing that not everything can be preserved immediately, what should be the process for determining the priority plan?

Is this question about “digitization” for preservation? Paper collections are being preserved now under long-time FDLP rules and procedures. If we’re talking about actual preservation, born digital should have priority.

I think this question may be confusing and conflating “digitization” with “preservation” and “historic” with “born-digital.”

Digitization of historic/paper publications, while providing better access, does not necessarily contribute to their preservation, and digitization for preservation does not necessarily guarantee better (or any) access. For example, many publications that are scanned (e.g., those going through the Google Books project) are disbound and destroyed without any guarantee that the digitization process has accurately or completely preserved the original content. Further, “digitization” encompasses a wide range of activities and digitizations may or may not meet quality standards for long-term preservation and use.

Also, many born-digital federal documents are arguably *more* in need of preservation action than paper documents, since paper documents are ostensibly already being preserved via the FDLP. The reason for this is that there is a single steward (such as a government agency) that has sole responsibility for preservation of and access to those files, and the files are therefore at risk of intentional or unintentional political, bureaucratic, or financial decisions that will lose, alter, or discard that information.

I believe that any preservation plan and prioritization policy needs to have in place a process and framework for preserving an adequate number of physical paper copies as well as a process and framework for collecting, describing and preserving digitized paper and born-digital publications.

As for prioritizing what to digitize (in a NON-destructive manner!), I would give higher priority to those publications that have NOT been widely distributed or have not received any or adequate cataloging.

I would suggest that we should develop a number of criteria for determining priorities, not just a single criterion. For example, we might want to identify different categories of paper materials for digitization: some (plain text in a uniform format on good paper with clear print) are easier (and cheaper) to digitize accurately; others (statistical publications, odd-format publications, color and illustrated publications, older publications with thin paper or bleed-through print, publications with non-uniform layout and fonts, etc.) are harder (and more expensive) to digitize accurately and completely. Setting priorities for digitization could take such a categorization into account along with other factors: condition of the paper copy; completeness and accuracy of existing metadata; known number of complete copies in paper and known number of complete paper copies needed for preservation of the content (as opposed to simple access to the content); known number of paper copies needed for access; and decisions about the intent of digitizations (access, preservation, image-only or image+text, or image+text+reformatting, e.g. xml, tei, epub…?).

Please read the following for more on these issues:

5. Who should be involved in preserving FDLP materials? GPO, FDLP libraries, commercial, and/or not for profit organizations?

GPO and FDLP libraries — with assistance from non-profits like the Internet Archive (which actually has official status as a library!) — should be the primary actors in any plan for preserving public domain government publications. Any digitization plan MUST include an agreement that digitizations will be freely and publicly available without subscription or fee and in DRM-free formats.

While commercial outfits may have experience in this area and may be consulted, I would strongly recommend against including commercial entities in any long-term preservation plan. Not only are commercial entities disinclined to do anything that does not support their bottom line, it goes against the spirit of the FDLP for public domain materials to be taken out of the public domain and only made available in subscription databases and/or commercial products. There is already ample evidence of private companies contracting with federal agencies to digitize content that is then privatized and taken out of the public domain to the detriment of the public who owns that information. See:

Preservation Methods – the processes for digitization and preservation are varied and some digitization is not necessarily preservation. There are a variety of projects that may contribute to a national preservation plan. The Task Force report affirms that there should be multiple locations and geographical distribution.

6. Is digitization a preservation standard or does it serve as a discovery/access resource or both?

“Digitization” is not a standard but a generic term that encompasses many different processes and procedures and can produce results of varying quality and suitability for different purposes. Even poor quality digitizations can be preserved, but that does not qualify them as “digital preservation” of the original information. “Digitization” is also only one step (the first step: creation) of a number of steps that would need to be taken (ingest, storage, data management, preservation and preservation planning, discovery/access/delivery, and service) in order to either provide access to or preservation of the digital objects created by any given digitization process. “Digitization” is, therefore, not a useful concept on its own to address preservation or discovery or access.

To better address this question one needs to specify how an item will be digitized and for what purposes it will be digitized and develop an evaluation of the process to be used to determine if the output of the digitization meets the requirements.

Typically, today, most digitization projects (particularly large-scale projects) aim to provide access (not preservation) to page-images of the original books. Those scans (e.g., Google Books) offer pretty good (but not great) discovery (try to find a specific volume of any digitized serial in GBP and you’ll see what I mean about “pretty good” discovery), but many of the scans are of poor quality, with inaccurate or no OCR, blurred images, missing pages, etc.

As we look to digitization as a process, we should evaluate what we want from that process and develop projects that match those goals. We should develop more projects that aim higher than simple access to digital images that are no better than (and, in some cases, not as good as) the original books. We should develop projects that would envision the possibilities of digital information, not just pictures of static information. This would include digitization that would enable reformatting the content for current and future devices and uses. All digital objects are born-digital objects. Perhaps most importantly, treating digitization of paper as no more than a surrogate for the original paper with no more functionality than the original is short-sighted at best and destructive at worst.

As noted earlier, there are various and varied levels of digitization. Even publications which are digitized to the highest current *digitization* standards should not necessarily be relied on as a copy of last resort. Without adequate attention to the unique qualities of individual documents, the digitization may not fit the needs of all users and some users will continue to need access to paper publications. For example, the images of large size and color documents and documents with maps, tabular data and inserts may be less usable than their paper originals.

7. Should there be different standards for copies – more rigorous for congressional materials and less for pamphlets for example?

NO. Decisions about the use of standards should not be made based on the format of the original. Such a choice would imply that all pamphlets are less important to everyone forever than any congressional bound volumes, for example. This would be making an unwarranted judgement on the quality of content based on format and an unfounded judgement on the value of the content to unspecified users of the future.

Digitizations of paper should be undertaken to address needs of user communities of the future as well as the present. Short-term cost savings should not drive library decisions if it impedes long-term access, preservation, or usability of the information content. It is reasonable to assume that any publication that is worth scanning should be scanned at the highest quality standards that will lessen the likelihood of its needing to be re-scanned as digitization technologies continue to get better and user needs evolve.

8. What is the role of Regional FDLP libraries in preservation centers?

Regional FDLP libraries should be seen as the first best option in any preservation plan. Geographic distribution of both paper and digitized/born-digital publications will continue to be a necessary part of any plan going forward and regionals are best equipped to offer those services since they’re already set up and working. ALL regional libraries should be required to participate in or designate one library in their region to participate in LOCKSS-USDOCS as part of their depository responsibilities. This would fall under the current FDLP shared housing agreement concept.

Trusted Partners – the FDLP has a partnership program and the Task Force report notes that partnerships could be a critical component of a national preservation plan.

9. What are the qualifications of a trusted partner?

A trusted partner, at least in terms of digital preservation, is one that is built on OAIS principles, has a succession plan in place, and is not driven primarily by the profit motive. Consortia and other library-centric organizations should be seen as trusted partners as long as preservation AND free public access are inherent parts of their missions.

10. Can commercial and not for profit entities be considered a trusted partner?

Commercial entities will probably disqualify themselves as trusted partners if adequate definitions of responsibility are in place; trusted partners should provide long-term, free public access and have a succession plan in place to describe what happens if they ever choose to break the partnership. So, in general, commercial entities will probably NOT be relied on as trusted partners as their missions are, by law, motivated first by profit rather than by public access, public service or information preservation. Non-profit entities can be trusted partners as long as preservation and public/free access are inherent parts of their missions.

11. What current initiatives exist that can contribute to partnerships? (LOCKSS and other initiatives).

FDLP libraries themselves, LOCKSS-USDOCS, Internet Archive, ASERL’s COE libraries, library consortia…

Registry and Identification – the FDLP has initiated a registry for digitization (http://registry.fdlp.gov) and this might be the basis for a preservation plan. The Task Force notes that cataloging tangible and online materials is still a critical component for any national efforts in discovering and accessing FDLP and other government information.

12. How should individual cataloging efforts be coordinated?

Via GPO and the Catalog of Government Publications (CGP).

13. How should commercial entities be incorporated with library efforts?

Commercial entities provide a useful and welcome *complement* to free public access entities — but they should never be seen as a substitute or replacement for free public preservation, access, and service.

Commercial entities should be encouraged to donate metadata toward the national registry and/or digitization projects. They should also be encouraged to deposit their content in collaborative archival services like LOCKSS-USDOCS for safekeeping.

Any national catalog (OCLC, CGP) which has links to publications in subscription services (e.g., ProQuest Congressional database) should also include links to freely available digital copies.

14. What additional cataloging/identification projects exist that might contribute to a national effort?

HathiTrust registry of US federal government publications, ASERL Collections of Excellence, Internet Archive digitization efforts. (Be aware that IA has a complete set of historic Congressional publications (Serial Set, Congressional Record, hearings, etc.) garnered from the N&O list a few years ago. They are just waiting for funding to digitize.)

Broadening Expertise – in a distributed, electronic world of information, FDLP libraries are able to assist non-FDLP libraries and FDLP resources are more integrated with commercial information resources. The Task Force considers this an opportunity and challenge that will impact librarians and library workers regardless of type of library.

15. What are the professional development needs for librarians and library workers who may utilize FDLP information?

This growing idea that “all librarians are now documents librarians” really bothers me. It makes for a good bumper sticker, but are “all librarians engineering librarians”? Government information is a very specific area within LIS, one that happens to touch on many subjects and disciplines, with specific and iteratively-built skill sets and knowledge base. Having a government information librarian on staff is critical to a library’s success in serving its community. Just as we shouldn’t expect all librarians to have in-depth knowledge of every subject and discipline, and shouldn’t expect every librarian to be a cataloger, we shouldn’t expect all librarians to have in-depth knowledge of the workings of government and its information resources. At the same time, I have heard that a disturbingly large number of LIS programs are deprecating if not completely doing away with their government information curricula.

With that in mind, the documents community should first survey LIS programs to see what’s being taught, what the requirements are, what percentage of students take government information courses, and whether or not LIS programs are *using* government information in their classes (no copyright!) for digitization, digital and physical preservation, indexing/discovery, text-mining, etc.

The documents community — as well as ALA as the accrediting organization! — then needs to create a model curriculum and require that ALL MSLIS programs have courses on government information to provide all librarians with basic familiarity about the FDLP program itself, FDSys and the FDLP core collection as well as basic knowledge of government information resources and collections at all levels of government at a minimum.

Within the documents community, there need to be continuing education opportunities, open to ALL librarians, that let librarians expand their govt information expertise and broaden their technological skill sets so that they’ll have at least a basic understanding of digitization, digital collection development, and other technologies to help them do their work in serving their communities. There need to be more “accidental government information librarian” webinars, but also in-person workshops similar to ICPSR’s 5-day workshops on data services.

Library administrations also need to be more supportive of the need for govt information librarians to travel to conferences (GODORT, DLC, etc) as that is where we learn from our colleagues and move the entire field forward.

16. How can expertise be spread to all librarians beyond FDLP designated librarians?

Every FDLP library ought to:

–Reach out and make contact with other FDLP and non-FDLP libraries in their area.
–Arrange viewings of GPO trainings for libraries in their city. There have been some very good ones.
–Consider site visits or virtual office hours for library staff at their institutions and around their cities.

Depositories ought to blog their reference questions. GIO chat service (http://govtinfo.org) should do that as well. This “seeds the cloud” and allows librarians and the public to more easily find government information resources.

Documents librarians should put in proposals to their state conferences.

Local GODORT chapters ought to consider emulating North Carolina’s Accidental Docs Librarian webinar series.

Government information librarians should have ongoing workshops within their own libraries.

17. How can core competencies related to government information be developed for all librarians?

See 15 and 16.

I believe GODORT is already working on core competencies within the GODORT Education Committee. The 21st Century Government Information initiative on WebJunction might be another place to look for core competencies.

The American Library Association has a vested interest in the development of skills, services, and advancement of the FDLP program. ALA’s role is to assist and support librarians and library workers who work with government information. ALA’s expertise contributes to national discussions and government policies and ALA can provide assistance in bringing together a variety of partners to advance a common goal.

18. How can ALA assist in the development of an FDLP preservation plan?

Adopt the Digital Surrogate Seal of Approval (DSSOA) and encourage individual libraries to do the same for their digitization projects. ALA can facilitate a national discussion and inventory of government documents. ALA can also advocate for a “government information librarian in every library” as a way to further the goals of a preservation plan as well as ongoing collection development and public service to library communities of all shapes and sizes.

19. How can ALA work with other association and entities to advance an FDLP preservation plan?

Lobby Congress for appropriate levels of funding and against long-term privatization of digital access. “No cost” should apply to taxpayers, NOT agencies. Reach out to other organizations on the need for an FDLP preservation plan. Argue strongly for the continuing need for both local collections AND government information librarians in every library. Also advocate inclusion of historic government publications in consortial shared storage projects like the Western Regional Storage Trust (WEST).

20. What future actions should ALA pursue to advance an FDLP national preservation plan?

See #19. Sponsor (or help produce sponsorable) projects to investigate the digitization challenges of a heterogeneous collection such as FDLP’s and investigate the preservation, access, and usability requirements for the long term for such collections.

Accept that digitization is probably not the best approach for PRESERVING tangible documents. Consider microfilming or geographically dispersed high density storage facilities of last resort. ALA should prioritize preservation measures for born digital materials which seem to be decaying quickly through link rot.

Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a

[UPDATE 6/27: COL’s final report is now available online. We’ve added a link to it below along with the draft report]

We here at FGI are all for greater access to government information and have long supported and worked toward a fully digital FDLP. When discussing the future of the FDLP, we believe it is important to create policy based on thorough fact-based analysis, to learn from FDLP history and not repeat mistakes which in the past led to benign neglect of documents collections, many of which were born out of trying to handle government documents collections on the cheap. For example, lack of adequate cataloging, one of our biggest problems today, is a direct result of libraries not investing sufficiently in describing FDLP collections.

With this in mind, we have been closely tracking the work of the ALA Committee on Legislation’s (COL) FDLP Task Force, which was created at the request of COL to “provide their perspectives on options for the future of the FDLP.” The COL FDLP Task Force recently released a collated draft FDLP discussion document (6/27 the FINAL report is also now available). This document will be the main item on the Task Force’s agenda next week at the ALA Annual Conference in Chicago. There is much of interest in the COL Discussion document, and much that is non-controversial. This includes:

  • Avoiding duplicative efforts by using the FDLP registry to coordinate digitization efforts.
  • GPO coordinating and facilitating digitization projects (see point 1).
  • Authentication of government publications.
  • Digital deposit. (FGI has long been in favor of digital deposit and we have worked very hard to move forward on digital distribution, including getting the LOCKSS-USDOCS effort off the ground).
  • ALA accreditation including training about govt information (we hope this will include training in services AND collections).

However, we take issue with COL’s issue #1a: “Should libraries be allowed to de-accession and destroy these collections for the greater good of broader on-line access?” The short answer to this question is an emphatic NO. But in order to unpack the issues further and add context and facts to the Task Force’s discussion next week, we’ve written a White Paper, “Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a” (PDF attached below).

    download the whitepaper in the format of your choice!

  • PDF
  • mobi (for Kindle)
  • epub (for Kobo and other ereaders)

We look forward to a lively and interesting discussion in Chicago next week!!




Wait! Don’t Digitize and Discard!

A White Paper on ALA COL Discussion Issue #1a
June 2013

By James A. Jacobs and James R. Jacobs

“Many years ago GPO turned over its historical collection to the National Archives and almost immediately we began to regret the absence of a tangible collection.” (Russell, 2003)

The ALA Committee on Legislation (COL) “Discussion Document” (Draft 6-17-13) asks, as part of “Issue #1” concerning the digitization of FDLP collections, if libraries should be allowed to “de-accession and destroy” collections for the “greater good of broader on-line access.”

While there is much in the Discussion Document that is generally agreed upon in the FDLP community, Issue #1 is extremely problematic as it poses a false choice that unjustifiably equates discarding paper with digitization and better access.

The following provides background details and context that the COL Discussion Document lacks. We as a community need to have all the facts in order to discuss and decide on the fate and future of FDLP collections.

A False Choice. 

The COL question implies that online access can only be achieved if libraries are allowed to de-accession and destroy paper collections. This is not true. There is no technical, legal, policy, or procedural requirement or need to destroy paper collections in order to digitize them and provide online access to them. It is not too extreme to say that it is misleading to phrase the question by implying that libraries can only digitize if they are willing to discard paper. If the greater good, online access, and supporting users are indeed the goals, no permission is needed to proceed.

Any library or group of libraries can digitize its FDLP print collections today and provide online access to those digitizations without discussing it with ALA and without asking for permission from GPO. This is “allowed” today.

FDLP libraries that wish to digitize and provide online access to their historical collections can and should move forward on that goal.

Why Link Digitization and Discarding?

Why do both the COL question and recommendation #1a link “destruction” of collections with “digitization” of those collections? It is troubling that COL connects these two very different activities without explanation or justification. This is, however, not a new idea. In fact, this is a rehash of old ideas that have been considered and discarded before (Housewright, Jacobs).

A variety of reports over the last few years have used three excuses to link these two actions.

  1. Costs. This argument appears in a number of variations: Paper collections are too costly to maintain; space is at a premium and libraries must remove books to repurpose space; providing digital access is cheaper than providing paper access. This is such a big topic that we examine it separately below, but, briefly: Those who say they cannot digitize unless they are allowed to discard paper collections should explicitly explain what if any cost savings digitization will bring about and how those savings will be spent. If the cost savings are not spelled out, this argument should be treated, at best, as a starting point for a long discussion of costs rather than an ending point for policy making.
  2. Technology. Some digitization processes — including “Google Books” — destroy the original. This is sometimes done because the original item is tightly bound and the binding must be cut off to accurately scan individual pages. In other cases, digitization projects find it cheaper to unbind books than to use non-destructive methods of scanning. Even when destructive scanning is used, there is no reason to create an expansive policy that allows all libraries to destroy all copies of a book even if one library destroys one copy of that book during digitization.
  3. Law. Because FDLP libraries must follow FDLP procedures that regulate what can be discarded and by whom, some argue that they won’t digitize unless they are free to discard. There is nothing in Title 44 that prohibits any library that wants to provide online access to its paper FDLP collection from digitizing all or part of that collection. So, this argument does not provide a justification; it just asserts that libraries want to discard and it relies on one of the above two arguments as an actual justification. This rhetoric is, nevertheless, sometimes used to imply that digitizing and discarding are inseparable. They are not, and this is a false justification.

In short, the question in the COL document offers a false choice between digitizing and discarding on the one hand, and digitizing and not discarding on the other. The question as posed misleads; any serious consideration of it must be based on a clearer understanding of the issues that are glossed over in the brief text that accompanies COL Issue #1a.

Access to Paper Copies Is Important.

The COL document apparently suggests that, once FDLP collections are digitized, libraries should be able to rely on the digital copies for access and thus be allowed to destroy their paper collections. This model ignores the need for paper access copies. A better approach to collection management and digitization would first study the advantages of and needs for access to paper copies rather than assume that they will be unnecessary or unwanted after digitization. That libraries should have paper copies for access is not a controversial idea. Even John Burger, the Executive Director of the Association of Southeastern Research Libraries, who promotes such digitize-and-discard projects, says libraries need to retain an adequate number of paper copies for direct user examination. But the COL document apparently ignores such ideas.

COL implies that the number of paper copies in the world can be reduced to a small number for preservation and the rest can safely be de-accessioned and destroyed. This model of treating paper copies as emergency preservation copies-of-last-resort in a vault is not the only available model, nor is it the best. An alternative model suggests that we should consider the need for users to have access copies of paper documents and that we should keep an adequate number of working, usable, loanable copies geographically near their users. In addition, FDLP libraries should strive to bibliographically connect digital copies with paper copies. “Every document its reader,” and “every format its use” to paraphrase Ranganathan’s 5 laws of library science.

These are not abstract or Luddite ideas; they come out of a common sense approach to collection management and digitization. There are at least two reasons for them. First, until and unless we can guarantee that our digital copies are one hundred percent accurate and complete, paper copies will continue to be needed by users. Second, some FDLP items will be difficult to deliver digitally and the paper copy will continue to be easier to use.

This may vary over time as the widely available user-computing environments, and the ones users prefer, change. But current digitization practices do not even match the capabilities and limitations of some of today’s popular technologies. For example, some items, such as large-format books with high text density, color plates, maps, etc., 1000-page documents with tabular data, or the Congressional Record(!), are not easily usable on small-screen, colorless e-book devices.

Taken together, these ideas mean that some users will require and prefer the original paper format over the digital format. That means that libraries will need to provide access copies as well as preservation copies. Access copies are more subject to loss and damage than copies saved for preservation-only. That means we should also ensure we have enough copies to provide replacement access-copies for the long-term.

Before libraries consider destroying their paper collections, they should consider access as well as preservation as a reason to retain paper copies. This also raises another issue that we address below: How do we determine what “an adequate number” is?

Access Is Not Preservation.

The COL document provides a vague and confusing view of access and preservation. It says that “digitization can assist in preservation, but is not, itself, a preservation format” and it recommends having “enough” paper copies as determined (apparently) by a preservation plan. It therefore appears to be suggesting that libraries will rely on digital copies for access and a few paper copies as copies-of-last-resort for preservation. The document does not specifically advocate the preservation of digital copies or digital copies as a preservation format. Nevertheless, if libraries are to rely on digital copies for access, someone must assure that those digital copies are preserved or we will lose long-term access.

This creates two problems for the COL discussion point and recommendation. First, if COL is suggesting that libraries rely on paper copies as the only preservation format, then it should not recommend “allowing” the destruction of paper until we know how many paper copies we need to achieve the paper-as-preservation-format goal. It would only be logical to propose a policy of paper destruction if a “comprehensive preservation plan” eventually determines that the world needs fewer paper copies than we have. To propose such a policy without such a determination is both premature and reckless.

Second, creating digital copies solely for access is a different process than creating preservable digital copies, which takes more care and more expense. If COL does indeed intend these to be preservable long-term digital copies, it should be more explicit about that. Proposing digitizing for access when digitization for preservation is needed would be shortsighted and foolhardy and would confuse the cost issue.

Access is not preservation, and digital access (the stated purpose of digitization in the COL document) is not the same as digital preservation. Even the word digitization is a vague term that can mean many things (FADGI). Digitization does not magically preserve the original information. In fact, the information captured in many digitizations of books is often either incomplete or damaged or both. (One of many examples of this is shown in a recent FGI post about a Department of Commerce publication “Commercial Handbook of China” [http://freegovinfo.info/node/3960].)

Even the narrow function of “access” requires more of a commitment to standards than the COL document provides. This is necessary in order to ensure that the information in the original is not corrupted or lost during digitization. The OAIS digital preservation standard has a specific set of criteria for migrating information from one format to another. This involves identifying the “Transformational Information Properties” of the original Content Information (Consultative Committee for Space Data Systems 2012). The standard for Trusted Digital Repositories (TDR) requires that such repositories verify ingested content for completeness and accuracy (Consultative Committee for Space Data Systems 2011). Proposing the destruction of the original information packages (the books) without also proposing the application of such standards would almost certainly result in the permanent loss of information.

Proposing destruction of paper copies to achieve digital access is simply a bad idea unless the concepts outlined here are also addressed. Assuming that a few paper copies will be “enough” for preservation is questionable. Advocating a policy that encourages destruction of paper collections before addressing the issues of preservation and access is premature.

How many copies?

The COL document implies that it will be acceptable to discard our paper FDLP collections when we have a comprehensive preservation plan and “enough tangible copies.” While it is true that the FDLP community does not yet have a comprehensive preservation plan, there is no evidence that warrants COL’s apparent prediction that we will need fewer paper copies in the future.

Until we know how many paper copies are needed to ensure long-term preservation and access, it is unwise to propose policies that will have the effect of destroying paper collections. The existing studies that address this issue (e.g., Schonfeld, Schottlaender, Yano) do not provide adequate information to apply them to our FDLP paper collections. Those studies mostly focus on substituting digital surrogates for paper journal articles, which are a relatively homogeneous body of literature about which we can make generalizations. We do not have a study that examines the accuracy of digitizing and preserving a very heterogeneous body of literature such as government publications. These publications vary widely in age, format, size, and original paper quality; many have brittle and yellowed paper; and many contain information that is difficult to digitize accurately (e.g., tables of statistical information, charts, graphs, photographs, drawings, foldout maps). It will, in fact, be difficult to generalize about such a heterogeneous body of literature.

As noted above, if we choose to ignore the need for paper copies for access as well as for preservation (as the COL document apparently does), we may destroy copies that we later need for access. This would be unwise, to say the least.

A rational digitization process must consider how decisions we make today may limit or expand our ability to deliver content both today and in the future and consider the potential need to re-digitize in the future. Re-digitization may be desirable as technologies for digitization improve and as the technologies for digital delivery and digital use and re-use evolve. Future digitization technologies may require destructive digitization. To be able to meet future needs we will need to keep enough paper copy originals for more than one re-digitization.

In short, there are many reasons to keep paper copies and too many unresolved research questions to suggest that we know enough to destroy our paper collections.

Quality of Digitization.

Before libraries consider discarding and destroying paper collections they should address the issues surrounding the quality of digitizations. Although there are many standards for digital production, we need to also consider user-based standards to ensure that the production standards we choose will produce digital objects that meet the needs of users. There are many ways to digitize and they result in different products with different utility. Before we discuss destroying paper collections based on unspecified promises of unspecified digitization processes we should first consider two specific user-oriented issues related to the quality of any digitizations we might wish to rely on.

First, we must be sure that any page-image digitizations that we wish to rely on are accurate and complete (Jacobs and Jacobs). As noted above, the physical and content characteristics of government publications make them difficult to digitize adequately at a reasonable cost (GPO, 2004).

Second, we must be sure that text-extraction from digital page-images is accurate and complete and meets the use requirements and expectations of increasingly sophisticated users of digital content. Our experience so far with digitization provides ample evidence of widespread incomplete and inaccurate Optical Character Recognition (OCR). Currently, there is no standard for expressing the accuracy and completeness of OCR.
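To make the OCR point concrete, here is a minimal sketch of one way accuracy is often expressed: a character error rate, computed as the edit distance between a hand-verified transcription and the OCR output, divided by the length of the verified text. This is only an illustration, not a standard. The function names and the sample strings are hypothetical, and a real assessment would also have to address completeness (missing pages, dropped table cells), which a single number like this does not capture.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions needed to turn reference into hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the verified reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: a line from a statistical table and an
    # invented OCR reading of it with three misread characters.
    verified = "Population 1,204,731"
    ocr_text = "Populat1on 1,2O4,73l"
    print(edit_distance(verified, ocr_text))
    print(character_error_rate(verified, ocr_text))
```

Note that an error rate that sounds small for running prose can still make every figure in a statistical table unreliable, since one misread digit changes the number.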

Allowing Destruction or Breaking a Commitment?

As noted above, COL question #1a uses the word “allow” and the recommendation only implies that destruction will occur. Neither the question nor the recommendation requires destruction. Those who support this proposition will undoubtedly argue that it will provide all libraries with more flexibility and will not require any library to discard anything. Indeed, some libraries have repeatedly expressed the desire for the flexibility to substitute digital copies for paper copies. There is, however, a serious problem with such an argument.

Every FDLP library has made a commitment to the government and to the American People to provide access to FDLP information (Hoduski). The commitment to access, combined with the administration of the program, has also successfully preserved this information. The FDLP program has a demonstrated history of successful long-term preservation and access.

Before suggesting that libraries renege on their existing, successful commitment, proponents should present a reasonable substitute plan that assures long-term preservation and access. COL provides neither. We have no comprehensive preservation plan, so it is premature to suggest that we can assure long-term digital preservation. And COL provides no substitute plan for maintaining (much less improving) access. Although we agree that it is possible to enhance access (and service) to paper collections through digitization, and possible to provide long-term digital preservation, it would be irresponsible to destroy our ability to provide access to paper without a specific plan that creates a new commitment to levels of access and service.

Replacing an existing commitment with no commitment is simply unacceptable.

Service.

When libraries consider digitization, they should consider service for those collections as well as access to those collections. The COL Issue #1a does not mention service. This is a significant omission. Arguably, digital collections need as much attention to service — if not more — as paper collections do. The OAIS model provides a minimal service model, but even it specifies that digital collections need collection management, preservation planning, and services for discovery and delivery of content. A more complete and realistic model for service provision for a digital collection will include the provision of user-friendly interfaces, APIs, and discovery tools. A more library-oriented model will recognize the need for dedicated staff with collection experience and knowledge who can provide interactive services and respond to user feedback in order to develop new tools over time. Collections without services or with inadequate services should not be an option for libraries seeking to provide value to their user communities. This again brings us to the question of costs.

Cost.

As noted above, the argument for the destruction of paper collections often boils down to economics. Libraries will argue that they cannot afford to provide preservation and access to both digital and paper collections. Indeed, providing adequate services, access, and delivery of digital information can be expensive when done well. But it is disingenuous to claim that libraries have to digitize and discard because of costs if one does not also account for the full costs of providing the new service. It is misleading to claim cost savings without showing which costs will be saved and how much will be saved, and without specifying how the savings will be used. To propose an irreversible policy (destroying collections) before providing such details is irresponsible.

Although we have seen proposals that casually suggest that digitization won’t cost more than twelve cents per page, such suggestions are at best over-simplified and misleading. At worst, they are grossly inaccurate by orders of magnitude.

Although cost-tradeoffs can be complex and can vary from library to library, it is easy to grasp the essential issues involved. No digitization policy should be based on vague promises of low costs or cost savings. Any claims of cost savings that do not account for all the costs that a digitization project will incur are incomplete. Any proposal that does not describe the effects of the policy on collections and services is incomplete. Incomplete proposals should not be used as a basis for policy.

  • Purpose, Functionality, and Intended Use of Digitization.

    Before planning a digitization project, it is vital that the project specify the intended purposes of the digitizations. As noted above, the COL document does not do so. There are different costs associated with different uses. For example, a higher quality of digitization is needed if the digitizations are meant to replace rather than supplement paper copies. Accounting for intended use and user needs should also include the entire life-cycle of the digitizations. In planning for use, it is necessary to anticipate and account for changing user needs and expectations and changing computing environments. Projects that fail to accurately account for changing computer capabilities can cause problems or additional costs later in the life of the information (Marks).

  • The Costs of Digitization.

    The costs of digitizing have been demonstrated to be anywhere from twenty-two cents per page (University of Michigan) to more than eight dollars per page (Nichols). One study of digitizing statistical tables (of which there are many in government publications) demonstrated a cost as high as $3.55 per table. The reason for these variations is that there is no one, single, standard procedure that everyone uses and that has a known, fixed cost. Every digitization project must make many decisions and each decision affects the quality and utility of the resulting digital object. Projects must choose procedures and methods and hardware and software and how to chain them together. Even with all these choices made, there are choices of standards and choices of outputs. Every one of those decisions is associated with a cost. Some of these costs are not within the control of the digitizing project. For example, the cost of digitization varies with the quality, size, and nature of the original material being digitized. It is unlikely that a single cost-per-page estimate can accurately account for the digitization requirements of a large body of heterogeneous literature like the FDLP collections. No digitization project should be undertaken without a realistic specification of the costs and a tested specification of the quality of output based on the target collection to ensure that it matches the intended purposes of the digitizations.

  • The Cost of the Life-Cycle of information.

    The cost of digitization is only the first of many costs. Studies have shown that the cost of digitization is only about one third of the life cycle cost (University of Michigan, UNESCO). In addition to the creation of digital objects a project must account for the costs of keeping and providing access to the digitizations. Some of these costs may be more expensive for digital collections than for paper collections. One study demonstrated that digital books are 208 times more expensive to store than printed books (Chapman). The OAIS model for preservation specifies six such functional activities: Ingest, Storage, Management, Administration, Preservation Planning, and Access. In addition to these, a library should also include a Service Function. The “cost of digitization” should factor in all these functions, not just initial creation of digital objects.

  • Long-term Costs.

    In addition to accounting for the costs of the life-cycle of information (ingest, preservation and management, discovery and delivery), it is also necessary to account for the costs of the life span of the information. Projects that intend to provide long-term access must account for the ongoing functional costs for the life of the information. This may be longer than the life span of any individual library or archive.

  • Use of Cost Savings.

    If a policy can demonstrate cost savings, it should also specify how those savings will be used. If, for example, there is a demonstrable cost savings from the destruction of FDLP collections, will the savings be earmarked for digital FDLP collections and services, or will they be redirected to other collections and other services? It is also important to note that cost-avoidance may not produce any funds for replacing lost collections or services. Studies that examine and promote the cost-avoidance and cost savings accrued when libraries reduce their physical inventory recognize that cost-avoidance does not mean that cost-savings “would actually be available for redirection in support of other operations” (Malpas).

  • Non-monetary Costs: Measuring the Value of the Library.

    Most if not all FDLP libraries are non-profit organizations. They measure their worth in their value to their users. It is, therefore, essential that individual libraries consider how a policy will affect their value to their users. Some of the literature that examines and advocates “consolidation” of paper collections (i.e., discarding copies) and managing a centralized digitized collection explicitly promotes outsourcing of collections and services. Malpas, for example, assumes that the cost savings would come from outsourcing the management of digitized books. Libraries should determine if policies such as the COL proposal would shift the value that users get away from the individual library and toward the outsourced services. In the long run, such a shift would reduce the role of the individual library to that of a business office that buys services from the outsourcing vendors.

    The value of an individual library is increased when it selects and controls its collection and provides services and collections to its designated communities. If a library reduces its monetary costs but in so doing loses its ability to select and control a collection designed for its user community, it pays a non-monetary cost and loses value to its users.

    Librarians who are considering digitizing and discarding their collections should examine what the role of their library will be in the resulting future. Will their library have control over what is in the new digitized library or will some other agency or company have the control over what is added and what is discarded? Will their library have control over the selection of information provided to their users, or will they have to present a large, monolithic collection made up of “everything” contributed by many libraries? Will they have control over discovery tools and user interface, or will those too be controlled by others? Will they have control over the digital objects delivered to their users and the usability of those objects, or will someone else decide what is an acceptable level of usability? Will they be able to integrate their new digitized collections with other digital collections, or will they create another walled-off silo of information? Will they have control over the APIs to their collections, or will someone else provide a generic API?

    In short, who will control what users get? Will they be substituting a monolithic one-size-fits-all library for a library designed for designated communities? As we think nationally, will it be good to provide the same collection and the same tools to undergraduates and graduate students and K-12? Will lawyers and physicians find the same collections and the same tools for access equally user-friendly and effective? Will those who want to find the current population of their city and those who want to analyze demographic data over time both be satisfied with a single collection with a single user-interface?
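The monetary figures cited in the cost items above can be put side by side with a little arithmetic. The sketch below is only illustrative. The per-page costs (twelve cents, twenty-two cents, eight dollars) and the one-third share of life-cycle cost come from the text above; the collection size of one million pages is an assumption chosen for round numbers, the function names are hypothetical, and the eight-dollar figure is a floor, since the cited cost is "more than" that.

```python
# Per-page digitization costs cited above, in dollars.
PER_PAGE_COSTS = {
    "casual claim": 0.12,
    "University of Michigan": 0.22,
    "Nichols (at least)": 8.00,
}

# Digitization is cited above as about one third of the life-cycle cost.
DIGITIZATION_SHARE_OF_LIFE_CYCLE = 1 / 3


def digitization_cost(pages: int, cost_per_page: float) -> float:
    """Cost of creating the digital objects only."""
    return pages * cost_per_page


def life_cycle_cost(digitization_total: float,
                    share: float = DIGITIZATION_SHARE_OF_LIFE_CYCLE) -> float:
    """Scale a digitization cost up to an estimated life-cycle cost."""
    return digitization_total / share


def cost_table(pages: int) -> dict:
    """Map each cited source to (digitization cost, life-cycle cost)."""
    rows = {}
    for label, per_page in PER_PAGE_COSTS.items():
        creation = digitization_cost(pages, per_page)
        rows[label] = (creation, life_cycle_cost(creation))
    return rows


if __name__ == "__main__":
    assumed_pages = 1_000_000  # assumption, not a figure from any study
    for label, (creation, total) in cost_table(assumed_pages).items():
        print(f"{label}: digitize ${creation:,.0f}, life cycle ${total:,.0f}")
```

Under these assumptions, digitization alone ranges from about $120,000 to at least $8 million for the same million pages, and the life-cycle estimate is roughly three times each figure. The spread, not any single number, is the point: a proposal that quotes only the lowest per-page figure leaves most of the cost unaccounted for.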

Conclusion.

The digitization of the FDLP historical collections promises many things: better discoverability, enhanced usability, better access, and more. We believe strongly that the digitization of the Public’s historic collections should and will happen. We also believe that it is important to move into this new future by thoughtfully planning to deliver those promises. To propose that we must destroy our paper collections in order to digitize them, without a clear plan to meet our commitments for long-term preservation, access, and service, is unjustified. To insist this is the only path to digitization is misleading and irresponsible.

End notes

Burger, John. “ASERL response to LJ Op-Ed, ‘The Future of the FDLP: From Conversation to Confrontation.’” ASERL-Selectives Mailing List. (Dec. 16, 2011).

Chapman, Stephen. 2006. “Counting the Costs of Digital Preservation: Is Repository Storage Affordable?” Journal of Digital Information 4(2).

Consultative Committee for Space Data Systems. 2011. Audit and Certification of Trustworthy Digital Repositories. Recommended Practice, Issue 1. Magenta Book. Washington, D.C.: Consultative Committee for Space Data Systems.

Consultative Committee for Space Data Systems. 2012. Reference Model for an Open Archival Information System (OAIS). Magenta Book, issue 2. Washington, D.C.: Consultative Committee for Space Data Systems.

FADGI. Still Image Working Group. A Resource List for Standards Related to Digital Imaging of Print, Graphic, and Pictorial Materials. Federal Agencies Digitization Guidelines Initiative. January 28, 2010.

Hoduski, Bernadine Abbott. “Who Is Protecting the People’s Property?” SRRT Newsletter, Issue 178, (March 2012).

Housewright, Ross and Roger C. Schonfeld. Modeling a Sustainable Future for the United States Federal Depository Library Program’s Network of Libraries in the 21st Century: Final Report of Ithaka S+R to the Government Printing Office, Ithaka S+R, (May 16, 2011).

Jacobs, James R. “Public comments and response to Ithaka S+R Models draft report,” FreeGovInfo (March 1, 2011).

Jacobs, James A., and James R. Jacobs. 2013. “The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard.” D-Lib Magazine 19(3/4). (March 15, 2013).

Marks, Joseph. “National Archives’ first Wikipedian in residence to bring more holdings to the public” NextGov (07/11/2011).

Nichols, Stephen G., and Abby Smith. 2001. Appendix VI: “Comparative Costs for Book Treatments.” In The Evidence in Hand: Report of the Task Force on the Artifact in Library Collections. Washington, D.C.: Council on Library and Information Resources.

Russell, Judith. Remarks by Judy Russell, 142nd ARL Membership Meeting, Federal Relations Luncheon (May 15, 2003).

Schonfeld, Roger C., and Ross Housewright. 2009. What to Withdraw: Print Collections Management in the Wake of Digitization. Ithaka S+R.

Schottlaender, Brian E.C., Gary S. Lawrence, Cecily Johns, Claire Le Donne, and Laura Fosbender. 2004. “Collection Management Strategies In A Digital Environment.” A Project Of The Collection Management Initiative Of The University Of California Libraries, Final Report to the Andrew W. Mellon Foundation. University of California, Office of the President, Office of Systemwide Library Planning.

U.S. Government Printing Office. Report on the Meeting of Experts on Digital Preservation: Metadata Specifications, Washington, D.C.: U.S. Government Printing Office (14 June 2004).

University of Michigan Digital Library Services. “Assessing the costs of conversion : Making of America IV : The American voice 1850-1876.” (2001).

UNESCO. “Memory of the World: Documenting against collective amnesia.” In Focus. (2012)

Yano, Candace Arai, Z.J. Max Shen, and Stephen Chan. 2008. Optimizing the Number of Copies for Print Preservation of Research Journals. Berkeley, CA: University of California Berkeley, Industrial Engineering & Operations Research.

Authors

James A. Jacobs is Librarian Emeritus, University of California San Diego. He has more than 20 years experience working with digital information, digital services, and digital library collections. He is a technical consultant and advisor to the Center for Research Libraries in the auditing and certification of digital repositories using the Trusted Repository Audit Checklist (TRAC) and related CRL criteria. He served as Data Services Librarian at the University of California San Diego from 1985 to 2006 and co-taught the ICPSR summer workshop, “Providing Social Science Data Services: Strategies for Design and Operation” from 1990 to 2012. He is a co-founder of Free Government Information.

James R. Jacobs is the Federal Government Information Librarian at Stanford University’s Cecil B. Green Library and program lead for the LOCKSS-USDOCS program. He is a member of the Government Documents Roundtable (GODORT) of the American Library Association and has served on the Depository Library Council to the Public Printer, including as DLC Chair from 2011 – 2012. He is co-founder of Free Government Information and Radical Reference and serves on the board of Question Copyright, a 501(c)(3) organization that promotes better public understanding of the history and effects of copyright, and encourages the development of alternatives to information monopolies.
