future of federal depository library program

Lunchtime listen: Jacobs and Jacobs interviewed on the Library Café

Jim and I had a great time last week talking with Thomas Hill about FGI, the FDLP, and the future of government information. Tom is a librarian at Vassar College and hosts the Library Café, a "weekly program of table talk with scholars, artists, publishers and librarians about books, ideas, and the formation and circulation of knowledge." Thanks Tom for the opportunity to talk about the future of the FDLP and government information and for allowing us to upload a copy of the audio file to the Internet Archive.


GPO Response to NAPA Report's Recommendation to Charge for FDsys access

Acting Public Printer Davita Vance-Cooks has responded to the letter by the group CASSANDRA about the recent report Rebooting the Government Printing Office: Keeping America Informed in the Digital Age by the National Academy of Public Administration (NAPA).

The report recommends that GPO should consider "cost recovery" for access to FDsys (See NAPA releases report on GPO).

The Response from Vance-Cooks says that GPO has "no intention of charging public users a fee to access content available through FDsys. GPO remains committed to no-fee access to FDsys for the public as part of our mission of Keeping America Informed."

This is, of course, good news, but we have to temper our enthusiasm with the realization that GPO's ability to meet its intentions will inevitably be dictated by Congress and its budget.

The complete response is attached below:



Link to pdf copy at Internet Archive.

Please sign our petition for open access to ALL govt information (or as close to ALL as we can get)

[UPDATE 4/2/13: We've had some questions about the meaning of "ALL." Please read the comment thread for clarification. We don't mean "records" (which fall under FOIA) and we don't mean classified information. We mean public domain documents, publications, reports, data, statistics and the like. JRJ]

A convergence of several things -- the White House's new policy on Open Access to federally funded scientific information, the NAPA Report on the GPO, the CASSANDRA Letter to the Public Printer, and Sunshine Week among them -- has led us to create a petition on the White House's We the People petition site. If you believe in free permanent public access to authentic government information, we hope you'll sign the petition and forward it to all your friends and social networks to help us reach our goal of 100,000 signatures by April 11, 2013! Thanks in advance!!


WE PETITION THE OBAMA ADMINISTRATION TO:

Require free online permanent public access to ALL federal government information and publications.

1. Assure that GPO has the funds to continue to maintain and develop the Federal Digital System (FDsys).

2. Raise ALL Congressional, Executive & Judicial branch information, publications & data to the level of federally funded scientific information & publish ALL government information as "Open Access."

3. Mandate the free permanent public access to other Federal information currently maintained in fee-based databases - including the Public Access to Court Electronic Records (PACER), the National Technical Reports Library (NTRL), & USA Trade Online.

4. Establish an interagency, govt-wide strategy to manage the entire lifecycle of digital government information w/ FDLP Libraries - publication, access, usability, bulk download, long-term preservation, standards & metadata.

Background:

The National Academy of Public Administration (NAPA) completed an operational review of the Government Printing Office (GPO) mandated under the 2012 Consolidated Appropriations Act (Public Law 112-74). The NAPA report, “Rebooting the Government Printing Office: Keeping America Informed in the Digital Age,” acknowledged the obligation Congress has to establish an interagency government-wide strategy to manage the lifecycle of digital government information. The report also acknowledged the vital role GPO plays in providing free permanent public access to authentic government information in tangible formats through its Federal Depository Library Program (FDLP) and to authentic government information in electronic formats via GPO’s Federal Digital System (FDsys).

However, Recommendation 4 states: “GPO and Congress should explore alternative funding models for the Federal Digital System in order to ensure a stable and sufficient funding source.” Among the models recommended are “…reimbursement for services; fees for end users; dedicated appropriations; and/or an automatic charge to agencies, depending on size, to encourage agencies to take advantage of GPO’s existing infrastructure and cover the cost of the services being provided by GPO.”

Just as the Obama Administration supports the public’s right to “free access over the Internet to scientific journal articles arising from taxpayer-funded research,” the Administration must support the creation of “stable and sufficient funding” to ensure free permanent public access to authentic government information arising from the work of taxpayer-funded Executive, Congressional, and Judicial Branch agencies.

Notes:

  1. NAPA report, “Rebooting the Government Printing Office: Keeping America Informed in the Digital Age.”
  2. CASSANDRA Letter to US Public Printer in response to the NAPA Report.
  3. Expanding Public Access to the Results of Federally Funded Research. John P. Holdren, Director of the White House Office of Science and Technology Policy (OSTP).
  4. White House response to "We The People" petition "Increasing Public Access to the Results of Scientific Research"
  5. Government Accountability Office (GAO), Information Management: National Technical Information Service's Dissemination of Technical Reports Needs Congressional Attention. GAO-13-99, November 19, 2012. Context on the GAO report from FGI.
  6. GPO's Federal Digital System (FDsys): http://fdsys.gov
  7. PACER: http://www.pacer.gov
  8. National Technical Reports Library (NTRL): http://ntrl.ntis.gov
  9. USAtrade: https://www.usatradeonline.gov
  10. Federal Depository Library Program (FDLP). http://fdlp.gov

CASSANDRA writes letter to Public Printer regarding the NAPA report

Last month the National Academy of Public Administration (NAPA) released a report entitled "Rebooting the Government Printing Office: Keeping America Informed in the Digital Age." FGI responded with an analysis of the report, and we were particularly disturbed by recommendation #4, which said that GPO should consider "cost recovery" for access to FDsys.

A group of long-time government information librarians writing under the moniker CASSANDRA (Concerned Government Information Professionals) has co-written a letter to Public Printer Davita Vance-Cooks offering their strong support for NAPA's conclusion that "free access to government information is both an important tenet of a democracy and a critical responsibility" while calling into question the same recommendation #4.

With CASSANDRA's permission (FYI, both Jim Jacobs and James Jacobs are signatories to this letter), we've posted the letter here for public knowledge and so that others may also write letters to the Public Printer and cite this letter in support of free permanent public access to authentic government information now and in the long-term.



NAPA releases report on GPO

The National Academy of Public Administration has released its report on the Government Printing Office.

  • Rebooting The Government Printing Office: Keeping America Informed in the Digital Age, A Report by a Panel of the National Academy Of Public Administration for the U.S. Congress, Congressional Research Service, and the Government Printing Office. National Academy Of Public Administration, Washington, DC (January 2013).

    Congress mandated that the National Academy of Public Administration (the Academy) conduct a broad operational review of GPO. The Academy formed a five-member Panel of Fellows to conduct a ten-month study of the agency’s current role, its operations, and its future direction.

The report contains 27 findings and 15 recommendations. Depository libraries will be particularly interested in three findings:

  • III-3: Preservation of the Legacy (Tangible) Government Collection
  • III-4: Preservation of the Digital Government Collection
  • III-5: Government Information Dissemination and Access

The report repeats many of the tropes about digital government information that have become familiar over the years. Some of these bear repeating and others are more questionable.

Perhaps the most troubling suggestion in the report is that GPO should consider "cost recovery" for access to FDsys:

Now may be the time for GPO to revisit charging the public for access to FDsys content. The Academy convened a forum of experts on printing and publishing where this topic was discussed extensively. Participants noted that technologies for online payments have progressed to the point that they cost very little to administer. Also, the public is becoming accustomed to paying fees for government services that used to be free (such as admittance to National Parks). Rather than charge a publication price, GPO could explore charging a small user fee to recoup the cost of providing access to government information on FDsys, or allowing users to view documents for free, and charging for document downloads. Forum participants also discussed the possibility of GPO exploring opportunities for repackaging files and content in different ways and making them available for sale to the public.

This model (as the report notes) was tried before with GPO Access and failed. We would argue that it failed not because the "technologies of online payments" were inadequate at the time, but because attempting to charge fees for information that was also available without fees was a fundamentally flawed approach. (We have written about this issue many times. See for example: Government Information in the Digital Age: The Once and Future Federal Depository Library Program and Privatization of GPO, Defunding of FDsys, and the Future of the FDLP.)

There is much more in the report and it deserves careful scrutiny.

The future of libraries and the value of libraries is content -- including free content

Today I was re-reading an article from a few years ago and was struck by how prescient it was in providing a formula for the success of libraries. Here are some of its main points:

  • Find a niche with growth potential (serve a community).
  • Organize information to make it useful
  • The internet is a distribution channel -- not a product (add value!)
  • Turn words into math (sophisticated mathematical formulas can find patterns in content and make it more discoverable)
  • Separate the signal from the noise (Type the word "jaguar" into Google's search engine and you'll get 64 million results. Fix this!)
  • Computers can't do everything (human indexers and editors make the difference that algorithms cannot)
  • Print's not dead, it just needs online help

There you go!

Oh, I left out two things: One of the main rules was "Treat content like patented material" and the article was not about libraries but about Westlaw, which has a successful "business model" of doing what libraries don't do anymore because "it is all on the web" and "someone else is doing that" etc.: selecting, acquiring, organizing, and preserving information and providing discovery of, access to, and service for that information. And they built their business model on using free information. Most libraries do not have to make money, which means they could do this for less cost and deliver the results to more people for free. But libraries would have to make the case that this is a better, more equitable, more democratic model than relying on the private sector. And they'd have to have leaders with the vision to build a 21st century library. Westlaw and others did it in the 20th century. Google could never have built Google Books without all the work libraries provided by building collections. What libraries have this vision today for future generations?

Read all about it:

Live blogging Fall '12 Depository Library Conference. Hashtag is #dlcf12

The 2012 Depository Library Council meeting and conference is upon us. It's sure to be 4 days packed with educational sessions and discussions about the FDLP Forecast Study and the future of the FDLP. Here's the conference schedule.

As in past years, there will be live blogging of the conference for those not able to make it to Washington DC. This year I'm asking as many people as possible to use the twitter hashtag #DLCf12 or #DLC12f (some other conference is using the #dlc12 tag unfortunately). You'll be able to follow along below or directly on the twitter site.

I'm also told that there will be virtual attendance for the four daily sessions on the FDLP Forecast Study (pre-registration is required):

  • Monday, October 15, 2012 at 2:00 pm Eastern: Methodology, Study Phases, and State Forecast
  • Tuesday, October 16, 2012 at 2:00 pm Eastern: State Forecast and State Focused Action Plans
  • Wednesday, October 17, 2012 at 2:00 pm Eastern: Library Forecasts
  • Thursday, October 18, 2012 at 2:00 pm Eastern: Summary Discussion and Future Roles


FDLP CRS Report: Useful with Reservations #FDLP

We have had a chance to review the new Congressional Research Service (CRS) Report Federal Depository Library Program: Issues for Congress (Petersen), available from the Federation of American Scientists, Project On Government Secrecy web site.

While we believe it serves as a useful overview of the Federal Depository Library Program (FDLP), the report has a few significant problems. Members of Congress should consider the following before using this report as a basis for modifying the FDLP:

Report appears to take Ithaka S+R report at face value

Pages 6-11 of the CRS report concern the findings of the Government Printing Office (GPO) commissioned Ithaka S+R FDLP Report (Housewright) and GPO's ultimate rejection of the report. We are concerned that CRS has taken Ithaka's conclusions at face value and has not considered the many criticisms of the Ithaka report. Some of these criticisms included:

  • The report made broad statements about users without sufficient consultation with actual end users.
  • The report focused on the value of the program to libraries and not to users.
  • The report apparently ignored corrections from law librarians and others so that errors in draft documents carried over to final documents.
  • The report excluded serious discussion of digital deposit and local digital collections of federal information.
  • The report failed to account for risks of implementing its recommendations.

We wrote extensively during the Ithaka S+R report period. We were not alone. A complete set of comments that Ithaka S+R received on its project web site is available from GPO. Yet the CRS report authors do not appear to have considered the public comments that questioned a number of Ithaka S+R's findings.

Another curiosity is CRS's omission of GPO's reasoning for rejecting the Ithaka S+R report. The authors simply note that "GPO did not provide a detailed, publicly available explication of its decision." It seems to us that it would have been appropriate and useful for CRS to have contacted Superintendent of Documents (SuDoc) Mary Alice Baish and interviewed her about GPO's rejection of the Ithaka S+R report. In doing so, CRS could have expanded the existing public record with more details from GPO as to why the report was unacceptable and given the report an additional depth of understanding. Given the GPO rejection of the Ithaka S+R report and the amount of criticism of the report from the library community, CRS's reliance on the report results in a description of the FDLP that is both limited and slanted.

Threats to access to digital government information

Pages 13-14 of the CRS report address "Access to Digital Government Information." The section concludes with the following:

The use of the FDLP Electronic Collection may raise the following concerns in the context of digital information:

  • Where do FDLP Electronic Collection data reside?
  • Are current data management protocols sufficient to ensure no loss of data availability, and assured access?
  • Are those protocols similar in GPO, other federal agencies, and non governmental partners that provide content?
  • What backup, and information distribution and assurance policies, are in place?

Although these are legitimate questions, CRS left out bigger problems within which these questions are merely details of implementation. These bigger problems stem from the GPO-centric model of the FDLP in which GPO has usurped from libraries the roles of both preservation and access. By replacing libraries, GPO has endangered the long-term future of information preservation and free public access to that information in many ways. Three of the most important are uncurated access, what we call "silent withdrawals," and the very real potential for inadequate funding of GPO, along with the complementary danger of free access being replaced by fee-based access.

Uncurated access

Access to government information has been a key tenet of the FDLP for 200 years. CRS averred that fact when they stated, "emergence of digital delivery of government information outside the FDLP program may offer increased access to government information to those who might not be able to visit depository libraries." But the key point missed by CRS is the idea of uncurated access. By only discussing access, but not preservation, CRS ignores the processes carried out by depository institutions to *preserve* govt information. We have said many times on FGI that access today does not equal access in the long-term. Libraries have begun to put processes in place to assure long-term digital access (University of North Texas Digital Library, LOCKSS-USDOCS, Archive-it collections, End-of-term crawls etc). Librarians can and should continue their curatorial responsibilities in the digital realm. We can't expect GPO and other government agencies -- especially in this budget crisis climate -- to have the long-term vision necessary to assure long-term preservation. Curation and content control will be key issues going forward. These issues were merely glossed over by the report.

Silent Withdrawals

One of the many strengths of a distributed depository system is the way its very structure protects information from intentional or unintentional loss, censorship, or erasure. Without this protection, information can too easily be withdrawn "silently" -- that is, without public announcement or review. That FDLP works is evident when one compares information in the depository system to information not in the depository system. The number of documents that have been sent to depository libraries and later withdrawn is relatively small and the reasons for the recalls are usually not controversial.

Contrast this to information that has been withdrawn from the web, reclassified by agencies, and documents that have had open access restricted by agencies after their release:

The reason for the success of the depository system is that it has checks and balances and procedures that must be followed when an agency wishes to withdraw a publication ("ID 72" GPO 2005). In the world of physical deposit of print documents, withdrawal of a previously deposited document requires the compliance of tens or even hundreds of libraries that actually have physical possession and control of copies. While depository librarians have a legal obligation to comply with withdrawal and destroy orders, there have been cases where this step triggered complaints about unreasonable withdrawal requests. Such questioning has led agencies to withdraw requests that seemed based on embarrassment or paranoia rather than error or true security needs.

One noteworthy example of this comes from 2001 when the CIA put pressure on the Department of State to destroy already-printed volumes of the Foreign Relations of the United States, 1964-1968, V. 16, Cyprus, Greece, and Turkey. But those volumes were in the possession of GPO and slated for deposit with FDLP libraries (Aftergood). The volumes were not destroyed and were distributed (S 1.1:964-68/v.16).

Another example comes from 2004 when the Justice Department demanded that depositories destroy copies of five publications that dealt with, among other things, how citizens can retrieve items confiscated by the government. The American Library Association objected, the Justice Department rescinded its order, and GPO allowed libraries to keep copies and also replaced copies already destroyed. (Lee)

GPO's policy does have good procedures to prevent "silent withdrawals" even of information that is not physically deposited with libraries. But when GPO does not deposit digital copies with libraries, depositories are cut out of the procedures and an important safeguard is missing. Withdrawal decisions and their execution stay wholly within the federal government -- making it easier for the government to remove items from public access. The "LOCKSS-USDocs" private LOCKSS network project is beginning to replace this safeguard, but more work is needed to ensure digital deposit with more libraries in order to guard against silent withdrawals.

Budget Problems

The current GPO-centric model of digital access described, and apparently unquestioned, by CRS has a single point of failure. If Congress decides it is no longer worthwhile to adequately fund information dissemination in general or GPO in particular, users and libraries will lose access to material unique to GPO's servers. Even the maintenance of so-called "persistent" URLs (PURLs) could be endangered by something as simple as inadequate funding.

Digital information requires long-term, consistent funding. Neither digital information preservation nor access can be accomplished passively: both require constant attention and renewal and resources. Even budget cutbacks can cause loss of information or loss of access to information. The single-point-of-failure GPO-centric model of preservation and access is a system in which even inadequate funding means loss of information.

Reduced funding can also lead to privatization of government information access. This can occur if the fee-based private sector takes over the delivery of services that GPO drops because of inadequate funding. It can also occur if Congress mandates that GPO use a fee-for-service model. In both cases, free access will be lost and people and libraries may be unable to afford adequate access. (Jacobs)

An April 10, 2012 Federal Times article demonstrates that GPO is already feeling a lot of pain:

At risk of needing a congressional bailout 18 months ago, the Government Printing Office slashed its workforce, cut employee benefits, rented out excess office space and took other steps to stabilize its finances.

To make ends meet, GPO is also focusing on money-making activities like making secure credentials for the FBI. At its heart, the FDLP is a cost center. It has no opportunity to make GPO a profit. This is right and proper, but will continue to make the FDLP a tempting target in future budget reductions. (Jacobs)

Summary

Any discussion of disruptions in user access needs to acknowledge the above facts. As long as digital storage is centralized in GPO, free and permanent access is only a Congressional Act away from being disabled or terminated. The report does ask a key question: "what solutions might create a more robust FDLP that is better equipped to meet the demands of providing government information to American citizens." We at FGI and many allies in the FDLP community have been working on that question (see Letter to Deputy CTO Noveck: "Open Government Publications," Rethinking the Cloud, and Achieving a collaborative FDLP future to contextualize the issues involved).

The report written by Petersen, Manning and Bailey provides a useful historic overview of the FDLP. We feel that it somewhat mischaracterizes recent efforts at building consensus. Most seriously, the report leaves out major barriers to free, permanent public access to government information that MUST be addressed in any meaningful reform effort.

References:

Digital #FDLP: Louisiana GODORT keynote address

[UPDATE 3/25/12: for those of you on ipads/iphones (which don't play well with flash), I've attached the PDF version below. JRJ]

I just got back from Shreveport, LA (and boy are my arms tired :-)). I was honored to be invited as the keynote speaker at the Spring 2012 meeting of the Louisiana Government Documents Round Table (LA GODORT) during the 2012 LA Library Association annual conference. We had a great conversation about the future of government information and the FDLP -- and I reminded everyone to submit their FDLP library forecast survey!! I showed a few case studies describing ways that librarians could build digital govt documents collections (including Everyday Electronic Materials (EEMs), LOCKSS-USDOCS, and Archive-it). Thanks again to Miriam Childs, Stephanie Braunstein and the rest of the LA GODORTers for a wonderful time!

Achieving a collaborative FDLP future (#FDLP)

Note: This post is a longer response to recent comments left on the LJ article written by Jim A. Jacobs and Melody Kelly.

Recently, Library Journal published a short article that Melody Kelly and I wrote (The Future of the FDLP: From Conversation to Confrontation). In such a short opinion piece, we did not have the space to document our arguments and expand on our reasoning. Since the article was posted, it has received a few comments. So we are using this post to expand on our LJ piece a bit and re-post here our replies to those comments.

As we said in LJ, the discussions about shared regional depositories have morphed from a conversation into a confrontation. This was our main point. As Daniel Cornwall said here recently, just as GPO needs to work with FDLP, so all FDLP libraries should be working with GPO, not working against it. We are concerned that the tactics adopted by ARL and many of the association's prominent members are more confrontational than cooperative and that such tactics may harm GPO at a time when it needs our support.

Some of the issues involved are complex and confusing and we imagine that many non-FDLP librarians -- and even some in the FDLP community -- find it difficult to follow the sometimes arcane, legal details of the arguments on both sides. This is particularly true when attempts are made outside the current GPO rules to redefine the scope or procedures of an existing Regional.

Because the FDLP is based upon federal law in Title 44 of the US Code, any changes in the role of the regional depository libraries would require legislative revision and will therefore be slow. As the FDLP community knows from past experience, legislative revisions to Title 44 must be carefully orchestrated, may take several Congressional sessions, will require the support of the entire FDLP community working with GPO and Congress, and are probably unlikely. But there is much that can be done, and done now -- by individual libraries and groups of libraries and by GPO -- that will move us all forward toward our shared goals of permanent no-fee access to government information regardless of format.

Moving forward together.

In comments to the LJ article, Bill Sudduth and John Burger said that the article has inaccuracies, innuendos, and fabrications. Mr. Sudduth says that we raise "narrow issues" and straw-man arguments and implies that we have irrelevant standards of "satisfaction." Strong words! But we believe that both Mr. Sudduth and Mr. Burger badly misinterpret our concerns. Here is our reply to their comments:

The comments of Mr. Burger and Mr. Sudduth imply that we object to the goals of the ASERL proposal, but we do not. We share their goals. We have expressed concerns about whether or not the ASERL plan will accomplish its goals. They have not addressed what we see as legitimate concerns.

As government information librarians who have for years been urging GPO and FDLP libraries to move to a digital FDLP, we support the goals of ASERL to improve bibliographic control of our collections and to provide digital access to older printed materials and we share the frustration of not being able to move forward more quickly. But as digital librarians who have created and managed government-information digital-library projects and who have a combined professional experience of over 40 years of designing, building, supervising, and evaluating digital library and digital preservation projects of all kinds, we also question some of the specific means ASERL is proposing to reach those goals.

The good news is that ASERL can accomplish the majority of its goals (create better inventories of their collections, increase and enhance cataloging, digitize documents to provide "additional access points") without GPO approval. They can also build their Centers of Excellence under existing FDLP procedures for Shared Housing Agreements.

The only thing they cannot do without GPO's (or, indeed, the Joint Committee on Printing's) consent is weed the existing print collections in the regional depositories. Although Mr. Burger says that ASERL does not "advocate" "wholesale weeding" or replacing tangible copies with digital surrogates, we believe the plan will permit just that. Indeed, the ASERL Implementation Plan explicitly allows for "the region" to have "at least two complete cataloged sets of print publications." To us, this explicitly permits a reduction from twelve copies to two. Although the Plan allows for the possibility of retaining more than two, it does not require more than two nor does it justify how only two copies might be adequate.

We are further led to this conclusion because for years we have read how ARL libraries advocate weeding their collections, minimizing the number of paper copies, and using digital surrogates to replace paper. (See for example Burger; Schonfeld Documents for a Digital Democracy; ARL; Russell 2003; and Russell 2004.) Now that ASERL has a concrete proposal that could apparently do just that, we are concerned that the plan lacks the necessary safeguards that would ensure it can meet the needs of users for both paper and authenticated digital copies.

We suggest that ASERL continue to move forward now on its project goals that do not require GPO's approval, and simultaneously work with GPO and the entire FDLP community to resolve the legitimate, long-term issues regarding digitization and national collection preservation. This will benefit not just ASERL but all FDLP libraries and all users of government information, both now and in the future.

--Posted by Jim Jacobs on December 18, 2011 07:45:40PM

Concerns about weeding.

As noted above, we are concerned that the ASERL proposal in particular will result in weeding from regional depositories. One of the issues we raised in LJ was that we do not yet have adequate information to justify reducing the number of paper copies in the FDLP system. We believe that we need to be cautious about weeding and should determine how many paper copies are needed to ensure both preservation and access. There are studies that address this issue (e.g., Schonfeld What to Withdraw, Schottlaender, Yano) but we believe there are two reasons that they do not provide adequate information to apply them to our FDLP paper collections.

First, these studies mostly focus on substituting digital surrogates for paper journal articles. While scholarly journals are a relatively homogeneous body of literature about which we can make generalizations, government publications present a very heterogeneous body of literature about which it is difficult to generalize. By focusing on journals, they do not address the particular qualities of government publications and our ability to produce even adequate digital copies of them. These qualities include:

  • We lack adequate bibliographic control and granularity of descriptive information for much of our collections making it more difficult to control, preserve, and provide access to digitized collections;
  • Government publications come in a wide variety of sizes and shapes and bindings (and non-bindings) and types and have many serials and multi-volume sets and looseleaf updates making them difficult to digitize adequately at a reasonable cost (GPO, 2004);
  • Many publications are old and have brittle and yellowed paper which will be more difficult to accurately digitize;
  • Many publications have tables of statistical information which are difficult or expensive to digitize accurately. (See below for more information about this);
  • Many publications also have charts, graphs, photographs, drawings, models, and other types of images and too little is known about how to digitize them accurately.

Second, these studies rely on having a perfect digital copy as a preservation copy. For reasons we explain below ("concerns about digitization"), we believe it is premature to assume we can create such digital copies for government publications.

When we consider weeding our valuable collections we need to consider access as well as preservation. It is not just about keeping an emergency copy-of-last-resort in a vault or about Title 44 legal requirements. It is about keeping an adequate number of working, usable, loanable copies geographically near their users. We do not think this is a controversial position. Even Mr. Burger says libraries need to retain an adequate number of paper copies for direct user examination. So the issue is, How do we determine what "an adequate number" is?

To summarize, we believe we should be cautious about weeding and discarding paper copies. We do not believe this is a contentious issue, but we do believe that any given project that involves any weeding of Regionals (which were, after all, established to guarantee preservation of and access to these materials) should have adequate accountability and be approved by GPO. We believe that we should move cautiously so that we avoid actions that we will regret later. We are reminded of what Judith Russell, the chair of the ASERL Deans FDLP Steering Committee, said when she was Superintendent of Documents at GPO:

"Many years ago GPO turned over its historical collection to the National Archives and almost immediately we began to regret the absence of a tangible collection. We have decided to re-establish a comprehensive collection of tangible and electronic documents as a collection of last resort for the program, and the new organization will dedicate staff resources to that effort." (Russell, 2003)

Concerns about digitization.

Another concern we expressed in the LJ article was that we do not yet have enough research to guarantee we can make accurate, usable digital copies of government publications.

We are particularly concerned with the accuracy of Optical Character Recognition (OCR) of the many statistical tables in government publications. The key study of this issue was done at Yale (Green, Linden) and demonstrates the difficulty and expense of accurately digitizing statistical tables. Shafait has studied the difficulty of just locating tables during scanning. Bicknese noted the difficulty of OCR scanning of tables and so excluded tables from a sample in testing the accuracy of Making of America OCR scans. In trying to determine OCR accuracy, Blando found that many tables were illegible and generally useless. In a 2004 study, Joseph examined quality of images and figures in journals scanned by Elsevier after removing paper copy journals to a remote location. He found that 73.6% of the issues had at least one image with unacceptable quality. In a test of OCR accuracy for searching only, the Harvard LDI team excluded 19th century materials because they assumed that failure rates would be higher for them because of the lower contrast of older printed pages. Schonfeld, in a study of "What to Withdraw," notes that too little is known about the share of the intellectual content of charts, graphs, data tables, photographs, drawings, models, and other types of images that is captured by existing scanning and format standards. And Tanner looked at the OCR accuracy of "groups of numbers" in scanned online newspapers, and found the accuracy only 64.1% for 19th century newspapers and only 59.3% for 17th and 18th century newspapers.
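To make the idea of accuracy for "groups of numbers" concrete, here is a minimal sketch of how one might score an OCR transcription of a table row against a hand-keyed copy. It is not the method used by Tanner or any of the studies cited above. The function names, the regular expression, and the sample row are our own assumptions for illustration only.

```python
import re
from difflib import SequenceMatcher


def number_groups(text):
    """Return the runs of digits (allowing embedded commas and periods) in text."""
    return re.findall(r"\d[\d,.]*\d|\d", text)


def number_group_accuracy(truth_text, ocr_text):
    """Share of hand-keyed number groups that OCR reproduced exactly.

    The two lists are aligned with difflib so that one dropped or split
    group does not throw every later group out of step.
    Returns None when the hand-keyed text contains no numbers.
    """
    truth = number_groups(truth_text)
    ocr = number_groups(ocr_text)
    if not truth:
        return None
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


# Invented sample row: the OCR read the zero in "1,204" as a capital letter O.
hand_keyed = "Wheat 1,204 3,518 77.4"
ocr_output = "Wheat 1,2O4 3,518 77.4"
score = number_group_accuracy(hand_keyed, ocr_output)
```

A measure like this only counts exact matches. It says nothing about whether a wrong number is off by a little or a lot, which is one reason a single accuracy percentage can understate the problem for statistical tables.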

(It is important to understand that digital images of statistical tables -- if legible to the eye -- will probably serve the needs of many users and are welcome as an additional access point to government information. Our concern here is not about adding an access point that is at least as good as, but no better than, the paper copies. That can be done now without the approval of GPO. As noted above, we do have concerns about that, too, because even that low standard often results in illegible images (GPO 2004, Joseph) which means that users will need to have paper copies to consult and libraries will need paper copies to re-digitize to fix errors. But we are expressing a separate concern here.)

Our concern here is with the accuracy of the actual numbers if a user wanted to (for example) cut and paste those numbers. A related concern is that the tables contain a wealth of metadata about the numbers themselves, and this information is rarely captured and associated with the numbers in a usable way. Without accurate information about what the numbers represent, searching for statistical information can be greatly hindered.
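As a rough illustration of what it could mean to keep a number together with the information that describes it, consider the sketch below. The `TableCell` structure, its field names, and the sample values are all hypothetical; no standard or cited study prescribes this layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableCell:
    """One number from a printed table, kept with the labels that explain it."""
    value: float
    row_label: str      # e.g. the stub entry at the left of the row
    column_label: str   # e.g. the column heading
    unit: str           # e.g. taken from the table headnote
    footnote: str = ""  # any note attached to this cell


def find_cells(cells, row_label=None, column_label=None):
    """Return the cells matching the given labels; None means 'any'."""
    return [
        c for c in cells
        if (row_label is None or c.row_label == row_label)
        and (column_label is None or c.column_label == column_label)
    ]


# Invented values, for illustration only.
cells = [
    TableCell(1204.0, "Wheat", "1950", "thousand bushels"),
    TableCell(3518.0, "Corn", "1950", "thousand bushels", footnote="Preliminary"),
    TableCell(1310.0, "Wheat", "1951", "thousand bushels"),
]
wheat = find_cells(cells, row_label="Wheat")
```

A page image or a plain OCR text layer gives a user only the bare strings "1,204" and "1,310". A structure along these lines is what would let a user ask for "wheat, by year" and get back numbers with their units and notes attached.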

In short, if all we want is to make simple images of our documents more available, then almost any legible scan will suffice. But, if we want to do more than create surrogates for paper, if we want to turn paper into usable and re-usable, machine-actionable digital objects, if we want to go beyond replicating paper in the digital world, we can only do so if we have a reliable way of creating accurate, functional digital objects -- and we do not yet have a single, reliable, affordable way of doing that (Green). That means that any scanning we do today will have to be re-done sometime in the future with new technologies that are capable of better, more accurate conversions of documents to usable digital objects. (This is not a hypothetical concern; we have already seen that the National Archives digitized documents as recently as the 1990s using then-current technology that makes them difficult to read online. [Marks]) And that means that we need to account for digitizing at least twice (and possibly multiple times) and we have to make sure we have enough usable print copies around for those re-digitizations. Saving enough copies to ensure we can do this is essential because several digitization methods, particularly high-quality digitization, effectively destroy the documents and older, more brittle documents are at a greater risk of destruction with any digitization.

In summary, we believe that digitization projects should account for preserving an adequate number of paper copies for re-digitization with better technologies in the future and for direct user examination when digitization is flawed or inaccurate. And we believe that, as we work toward a fully digital FDLP, we must aim for fully functional digital objects and not be satisfied with simple digital surrogates of paper.

Concerns about authentication.

Being able to authenticate digital documents is very important and not something that can be ignored until later or done retrospectively. It should be relatively easy for GPO to work with the FDLP community to create a Superintendent of Documents Policy Statement ("SOD") that would provide authentication credentials in the form of Preservation Description Information (Consultative Committee for Space Data Systems) when an FDLP library digitizes a legally deposited paper document (GPO 2004). Such a SOD could specify chain of custody and provenance through a documentable workflow, consistent metadata standards, and a digital signature. It could also specify that participating libraries would sign on as "digital partners" with the GPO and be recognized as such. Standards such as these would provide documented assurance to users of the authenticity of digitized documents and would provide the participating FDL host institution a framework for a continuing commitment to funding and staffing to ensure these digital collections are maintained as technologies evolve.
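To show roughly what such credentials might look like in practice, here is a minimal sketch of a record built around the four Preservation Description Information categories in the 2002 OAIS reference model (reference, context, provenance, fixity), plus a signature over the record. Everything specific here is an assumption: the field names are ours, no SOD defines this format, and the HMAC with a shared key is only a stand-in for the public-key digital signature a real policy would presumably require.

```python
import hashlib
import hmac
import json


def build_pdi(content, reference, provenance, context):
    """Assemble a PDI-style record for one digitized file (hypothetical layout)."""
    return {
        "reference": reference,    # identifiers for the document
        "provenance": provenance,  # chain of custody and workflow steps
        "context": context,        # relationship to other holdings
        "fixity": {
            "algorithm": "sha256",
            "digest": hashlib.sha256(content).hexdigest(),
        },
    }


def sign_pdi(pdi, key):
    """Sign a canonical JSON form of the record (HMAC stand-in for a signature)."""
    payload = json.dumps(pdi, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(content, pdi, signature, key):
    """Check that the record is unaltered and that the file matches its digest."""
    if not hmac.compare_digest(sign_pdi(pdi, key), signature):
        return False
    return hashlib.sha256(content).hexdigest() == pdi["fixity"]["digest"]


# Invented example values throughout.
scan = b"bytes of the digitized document"
key = b"demo key for illustration only"
pdi = build_pdi(
    scan,
    reference={"title": "Example report"},
    provenance=["received on deposit", "scanned by participating library"],
    context={"collection": "example depository collection"},
)
signature = sign_pdi(pdi, key)
```

With a record like this, a later user (or a later re-digitization project) could call `verify` to confirm that neither the file nor its documented chain of custody has been altered since the library signed it.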

Moving forward now.

As noted above, there are many things that libraries can do today.

ASERL, for example, can enhance bibliographic access and improve its inventory of its existing collections without asking permission from GPO;

Libraries can scan documents to provide an additional access point today and many are doing so. (See the Digitization Projects Registry at the FDLP web site.);

Existing procedures allow libraries to house documents in shared facilities using "Selective Housing Agreements" (GPO, 2011);

GPO can work with the FDLP community to add and modify Superintendent of Documents Policy Statements to account for digitization, authentication, and digital deposit; and

The FDLP community can use existing studies that examine replacing print with digital surrogates as a starting point and contribute to this knowledge base by systematically researching the issues specific to our older, paper collections.

We can do all this without turning GPO into an adversary. To repeat what Daniel Cornwall said here on this same issue:


We at FGI yield to no one in our desire for a fully functional digital FDLP. We have been advocating that for seven years and are just as anxious -- if not more so -- as anyone to move forward to the next phase of the FDLP. Over the years, we have criticized GPO and its policies -- and will continue to do so when we believe such criticism is warranted. But our intention has always been to challenge GPO to do more and to do better, and to work with FDLP libraries, not work against them or arrogate responsibilities from them. Just as we want GPO to work with FDLP libraries, we want FDLP libraries to work with, not in opposition to, GPO. We at FGI believe that the future of the FDLP will be most secure if GPO and FDLP libraries work together to a common end.

We all in the FDLP community share a common set of goals and beliefs about the value of government information. We can do this together.

-- Melody Specht Kelly – Documents Librarian University of North Texas Libraries 1974 – 2001; Associate Dean of Libraries, 2001 – 2009. Adjunct Professor College of Information, 1984 – present: “Government Information Services.”

-- James A. Jacobs. Librarian Emeritus University of California San Diego 2006- present; Center For Research Libraries, Technical consultant for digital library certification and long-lived repositories 2008- present; Instructor ICPSR Summer Program "Providing Social Science Data Services: Strategies for Design and Operation," 1990- present; Data Services Librarian, University of California San Diego 1985-2006.

---

endnotes

Association of Research Libraries, "Future Directions for the Federal Depository Library Program." (December 2008).

Bicknese, Douglas A., Measuring the Accuracy of the OCR in the Making of America, Ann Arbor, Mich.: University of Michigan, School of Information (1998).

Blando, Luis R., Junichi Kanai, Thomas A. Nartker, and Juan Gonzalez, "Prediction of OCR Accuracy."

Burger, John, Paul M. Gherman, and Flo Wilson, ASERL's Virtual Storage/Preservation Concept, ACRL Twelfth National Conference, Minneapolis, MN (April 2005).

Consultative Committee for Space Data Systems, Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book. CCSDS Secretariat (January 2002).

Green, Ann, Sandra K. Peterson, and Julie Linden. Supporting Economic Development Research: A Collaborative Project to Create Access to Statistical Sources Not Born Digital, A Report to the Andrew W. Mellon Foundation. New Haven, CT: Yale University (2005).

Joseph, Lura E., "Image and Figure Quality: A Study of Elsevier’s Earth and Planetary Sciences Electronic Journal Back File Package" Library Collections, Acquisitions, and Technical Services 30 (September 2006), 162-168.

LDI Project Team. Harvard University Library, Measuring Search Retrieval Accuracy of Uncorrected OCR: Findings from the Harvard-Radcliffe Online Historical Reference Shelf Digitization Project. (August 2001).

Linden, Julie, and Ann Green, "Don’t Leave the Data in the Dark", D-Lib Magazine, 12 (2006).

Marks, Joseph. "National Archives' first Wikipedian in residence to bring more holdings to the public", NextGov (07/11/2011).

Russell, Judith. Remarks by Judy Russell, 142nd ARL Membership Meeting, Federal Relations Luncheon (May 15, 2003).

Russell, Judith C. Preservation And Authentication Of Government Information: Are We Ready For The 21st Century?, IS&T Archiving Conference, San Antonio, Texas, (April 23, 2004) The Society for Imaging Science and Technology.

Schonfeld, Roger C., and Ross Housewright, Documents for a Digital Democracy, Ithaka S+R (December 17, 2009).

Schonfeld, Roger C., and Ross Housewright, What to Withdraw: Print Collections Management in the Wake of Digitization, Ithaka S+R, (September 29, 2009).

Schottlaender, Brian E.C., Gary S. Lawrence, Cecily Johns, Claire Le Donne, and Laura Fosbender, Collection Management Strategies In A Digital Environment, A Project Of The Collection Management Initiative Of The University Of California Libraries, Final Report to the Andrew W. Mellon Foundation. University of California, Office of the President, Office of Systemwide Library Planning (January 2004).

Shafait, Faisal, and Ray Smith. "Table Detection in Heterogeneous Documents", 9th IAPR Workshop on Document Analysis Systems, DAS'10. Boston, MA, USA, (June 2010).

Tanner, Simon, Trevor Muñoz, and Pich Hemy Ros, "Measuring Mass Text Digitization Quality and Usefulness", D-Lib Magazine, 15 (2009).

U.S. Government Printing Office, Office of the Superintendent of Documents. Legal Requirements & Program Regulations of the Federal Depository Library Program. Washington, D.C. U.S. Government Printing Office (June 2011).

U.S. Government Printing Office. Report on the Meeting of Experts on Digital Preservation: Metadata Specifications, Washington, D.C.: U.S. Government Printing Office (14 June 2004).

Yano, Candace Arai, Z.J. Max Shen, and Stephen Chan, Optimizing the Number of Copies for Print Preservation of Research Journals, Berkeley, CA: University of California Berkeley, Industrial Engineering & Operations Research (October 2008).

Update: Jim Jacobs made some small changes to the above item for clarity on January 2, 2012.
