It is that time of year when we tend to look back at what has happened over the last 12 months and look forward to what we anticipate in the coming year. The urge to look back and see what has happened comes, usually, not so much from nostalgia as from a desire to evaluate recent activities: Have we accomplished something, or stood still, or regressed? And the urge to look forward is an opportunity to understand the challenges that are coming at us and imagine the opportunities to do better in the coming year.
It is a little late for a traditional end-of-year/beginning-of-year post, so, rather than looking at specific things that GPO, FDLP, and FDLP libraries have accomplished in the recent past or might accomplish in the near future, we’d like to twist the ritual a bit and take a slightly different tack. Today we will quickly examine how FDLP too often simply compares today to yesterday — and why this is a bad thing.
Looking Backward, Looking Forward
The way we see it, instead of evaluating recent activities in terms of their long-term future value (we would call this Looking Forward), FDLP too often tends to simply compare today to yesterday (Looking Backward). Looking Backward does have one small, short-term advantage: it makes any improvement (even a tiny improvement) look good. But Looking Backward has a much bigger disadvantage: when we Look Backward, we ignore the opportunity to imagine what we could do better, and we ignore current technology trends and the opportunities for real change they provide. Together, Looking Backward and failing to Look Forward result in the FDLP standing still while the world around us changes.
We are not saying that GPO and FDLP never Look Forward, but we are saying that creating and evaluating policies (even short-term policies) by Looking Backward can endanger the preservation of government information and the long-term, free, public access to that information. And that adversely affects the future of the FDLP.
This is particularly frustrating because a simple change in how we look at the future could make a huge difference in how we develop basic policies and procedures today. In shaping long-term change, this is, we believe, at least as important as, if not more important than, abstract, long-term plans like GPO’s National Plan for the Future of the FDLP. If, instead of choosing to Look Backward, FDLP chose more often to Look Forward, we would have a very different view of our progress (or lack of progress) toward a truly digital FDLP.
This may sound abstract, so let’s look at two specific examples of how Looking Backward has created policies and procedures that are counter-productive and even harmful to the long-term FDLP. We’ll call them “Depositing objects not information” and “Digitizing Backwards.”
Example 1: Depositing objects not information
The Superintendent of Documents policy for Dissemination and Distribution (SOD 301) explicitly limits what GPO will deposit into depository libraries to so-called “tangible” products. This policy can be judged a success only by Looking Backward and evaluating it in terms of how well it adheres to old methods — continuing to send physical objects to depository libraries. By Looking Backward and focusing on methods, depositing floppy disks or DVDs (a tiny improvement) seems like progress.
But the policy is a failure if we evaluate its outcomes. To evaluate the policy by Looking Forward we would ask if the outcomes of the policy match the long-term goals of the FDLP — not if the methods have remained unchanged. We would ask if the policy ensures the long-term preservation of digital government information and ensures that it can be accessible and usable in the future (in fact, this question should be asked of every policy decision!). Sadly, we know from experience that this policy has resulted in undesirable, counter-productive outcomes. It has complicated, inhibited, and in some cases prevented preservation and long-term access.1
Example 2: Digitizing Backwards
Although we at FGI have long supported digitization,2 there is one aspect of digitization (particularly mass-digitization) that we do find troubling and it provides another example of Looking Backward. We believe that it is only by Looking Backward that many of our mass digitization projects seem even marginally acceptable. When we compare having any digital access to having no digital access, we are comparing the present to the past; we are Looking Backward. Viewed that way, even lousy, incomplete, inaccurate digitizations — and the equally incomplete and inaccurate metadata describing them — seem like an improvement. You can tell that someone is Looking Backward when they say they are going to digitize on-the-cheap and the results will be “good enough.”3
Digitizing Backwards is digitizing by comparing the digital objects we create to their paper originals. Instead of this, we should Look Forward when we digitize and create digital objects that stand up to current and future expectations of our user communities. Digitizing Backwards has produced digital objects that already fall short of users’ expectations in many ways. Here are two examples.
Screen size vs. Page size. Most mass digitization projects produce eye-readable page-images as their primary product. These are usually served up as either one-page-at-a-time images on websites, or as downloadable PDFs composed of images of the original book’s pages. These are little more than digital photocopies. One major problem with these is that they do not fit well on most computing devices that we use today. Few book pages can be reproduced at full size at an adequate, readable resolution on current displays. In general, book pages are larger and have higher resolution for reading than most of the devices we use today. While the goal of producing page-images is to reproduce the original book’s presentation, that goal fails on our current devices. Trying to read these page-images of most books on the screens most of us use most of the time is more like reading a newspaper on a microfilm reader than comfortably reading a book. It is the book-equivalent to TV’s evil pan and scan of wide-screen movies. The 2015 Horizon Report characterizes this kind of PDF delivery of content as 1990s technology and notes that the trend is to smaller screens.4 The trend for the future is not fixed-size page-images, but flexible design and responsive layout.5 Some of the larger tablets can handle most small format books, but the pages of most government documents are larger than most of our current screens. If we Look Backwards, producing page images seems to be a good idea. But when we Look Forward we realize that we are producing digital objects without regard to the real computing environments in which they are actually used by most people.6
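To make the screen-size problem concrete, here is a rough arithmetic sketch. Every number in it is our own illustrative assumption rather than a figure from any digitization project: a US Letter page (8.5 by 11 inches), a page image rendered at 150 pixels per inch of the original page, and a common 1920 by 1080 pixel display. The function names are hypothetical.

```python
def pixels_needed(width_in, height_in, ppi):
    """Pixel dimensions of a page image rendered at `ppi` pixels per inch of the original page."""
    return round(width_in * ppi), round(height_in * ppi)


def fits(page_px, screen_px):
    """True only if the whole page image fits on the screen without scrolling or shrinking."""
    return page_px[0] <= screen_px[0] and page_px[1] <= screen_px[1]


# Assumed example values, chosen only for illustration.
PAGE_IN = (8.5, 11.0)      # US Letter page, in inches
PPI = 150                  # assumed rendering density
SCREEN_PX = (1920, 1080)   # assumed display, in pixels

page_px = pixels_needed(*PAGE_IN, PPI)
print(page_px)                    # (1275, 1650)
print(fits(page_px, SCREEN_PX))   # False: the page is taller than the screen
```

Under these assumptions the page image is 1,650 pixels tall, so the reader must either scroll or shrink the page to see it whole on a 1,080-pixel-tall display. Smaller screens and larger pages only widen the gap.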
Digital text for machines, but not for humans. When a digitization project produces digital text using Optical Character Recognition (OCR) software (not all bother to do so), the digital text is little more than a “bag of words.” Although this does open up lots of wonderful new kinds of uses of books, most of these uses are aimed at computers, not humans (e.g., full-text indexing, large scale computational analysis of texts, “distant reading”7). These are good and important features and, if we only Look Backward, having some digital text (even if it is inaccurate and incomplete and lacks the original layout that imparts meaning) is a vast improvement over having no digital text. But if we Look Forward, we realize that the digital text we create this way is often unreadable by humans because it is unsuitable for displaying as even plain text, much less as an e-book. Some digital libraries understand that their digital text is not good enough to be used by humans directly and simply do not provide a way to see that text. Others provide the text no matter how reader-unfriendly it is. Neither approach anticipates what users already want: e-books, digital text that can be copied and pasted accurately, tables of statistics that can be found(!) and loaded into spreadsheets accurately and easily, and so forth.8
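A small sketch may help show what “bag of words” means in practice. The sample table and its scrambled counterpart below are invented for illustration, and the `bag_of_words` helper is our own hypothetical function, not part of any digitization toolkit. The point is only that a word-count view of a text cannot tell a usable table from a jumble of the same tokens.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to token counts, discarding order, line breaks, and layout."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Invented sample data: a tiny two-column table, and the same tokens
# in the kind of scrambled order that layout-blind text extraction can produce.
table_text = "Year Population\n1990 100\n2000 200"
scrambled_text = "Population 200 Year 1990 100 2000"

# The two texts are indistinguishable as bags of words...
print(bag_of_words(table_text) == bag_of_words(scrambled_text))  # True

# ...but only the first can be read row by row or loaded into a spreadsheet.
rows = [line.split() for line in table_text.splitlines()]
print(rows)  # [['Year', 'Population'], ['1990', '100'], ['2000', '200']]
```

Full-text indexing works equally well on either version, which is why text of this kind can serve machines while failing human readers.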
If GPO chose to Look Forward when it designed policies and procedures, it would focus on what is important and relevant today to achieve its long-term goal: “To provide for no-fee ready and permanent public access to Federal Government information, now and for future generations.”9 This would mean giving up on plans and policies that merely preserve old methods that are irrelevant (and even counter-productive) in the 21st century. It would mean creating new plans and policies that strive to reach GPO’s long-term goal and that deliver desirable outcomes.
Using the same two examples from above, what kind of policies might FDLP have if we Looked Forward instead of Backward?
- Digital Deposit, not “tangible” deposit. First, GPO would update its current SOD 301 policy that results in prohibiting the deposit of preservable digital information. Instead of focusing on whether or not GPO is transmitting bits on a physical medium, the policy would base the decision on the long-term usability and preservability of the digital information.10 GPO could use the Internet (e.g., FTP, HTTP, etc.) to transfer those bits to FDLP libraries. GPO would identify and distribute preservable digital government information to those FDLP libraries that were willing and able to accept it, preserve it, and make it available to their communities. After repeatedly doing no more than asking about digital deposit in the FDLP biennial survey, it would be a major step Forward for GPO to provide a digital-deposit alternative.11 This would also transform the current (Backward Looking) concept of an “Electronic Depository” (i.e., a library into which nothing is actually deposited when it “opts for digital-only publications”12) into an actual digital library with full control over preservation and access of its own collection. When GPO joined LOCKSS and started assisting the LOCKSS-USDOCS project in preserving content harvested from fdsys.gov, it showed that it can deposit digital documents now, without changing Title 44 or getting JCP approval; to do so, it only has to choose to Look Forward.
- Set a higher digitization standard. One of the criteria that GPO set in its Regional Discard Policy is that a publication is only eligible for discard if there is a copy in FDsys “in a format that meets the standards of the Superintendent of Documents.” The exact meaning of this is unclear and GPO has not clarified how it will implement this language. The wording of the policy does not specify any shared, community, technical standard, but relies instead on unspecified Superintendent of Documents “standards.” The policy is (apparently) designed to be narrow and Backward-Looking — focusing only on the suitability of files as “digital surrogates” for the original paper copies rather than on their future as fully functional digital-objects. In spite of all this, GPO could use its implementation of the Regional Discard Policy as an opportunity to Look Forward in defining its own standards for digital objects. First, it could (at minimum) pledge to conform to The Digital-Surrogate Seal of Approval (DSSOA). This would protect the FDLP from relying on inaccurate and incomplete digitizations as surrogates for discarded books. (One study of the HathiTrust, for example, discovered that 35% of volumes examined were unusable as surrogates.13) Second, it could work with the community to develop standards for fully-functional, machine-actionable, flexible, re-usable digital objects. It could go even further by specifying standards for how these digital objects will be preserved in a shared preservation ecosystem.14
Some will say that digital deposit is unnecessary because GPO will take care of preservation and access. Unfortunately, this is Wishful Thinking (a good companion to Looking Backward). GPO’s legislative mandate is limited to providing “an electronic storage facility” and “a system of online access” without any explicit mention of historical documents or long-term preservation [44 USC §4101]; Title 44 also specifically authorizes GPO to charge fees for access [44 USC §4102], which it has done before and still does by selling e-books through its bookstore. (Many of these e-books, which are within scope of the FDLP, are only available via subscription services like Apple books, Kindle, and Overdrive, and the epub/mobi files are NOT available via fdsys.gov.) It is both unsafe and unwise to rely on such a vulnerable mandate to preserve and provide long-term free public access to all the information that our communities need in the current political and economic climate. FDLP is designed as a cooperative system of shared responsibility. GPO still needs the help of FDLP libraries. We can do more collectively than GPO can do on its own.
Some will say that we should not worry about the current quality of digitizations because we will re-digitize when the technology warrants it. We would, indeed, be less concerned about the digitization problems if libraries had large digitization budgets and if they recognized incomplete and inaccurate digitizations as simply quick-and-dirty, first-generation products designed as a stop-gap until complete, accurate, re-usable digitization could be completed. But, as we all know, libraries do not have large digitization budgets, and we do not know of any project that has ever promoted itself as an incomplete, inaccurate, first-generation, temporary solution. To complicate the problem, most FDLP policy discussions in the last several years have focused on discarding paper collections after digitization and have justified such policies on a lack of physical or financial resources (or both) — a position and a justification that will, over time, make better digitizations steadily less likely. It cannot logically be argued that we lack the resources to digitize properly now but will somehow have them later.
Looking Forward is also about more than digitization. It is also about born-digital government information — an issue that is much, much bigger than digitizing the entire FDLP Historic Collections.15 Looking Forward is a different way of approaching the challenges we face. It puts the information (long-term preservation) and users (long-term, free, public access and usability of the information) first, and puts past methods and policies in their place as context within which we will make Forward Looking decisions.
Some will undoubtedly complain that our suggestions are too idealistic or too impractical, or even unnecessary because GPO will choose to do the right thing if we just don’t ask too many questions.16 Are we being idealistic? Well, yes, but we believe that the FDLP needs idealism in order to design a 21st century library ecosystem. We believe that it is actually more pragmatic than idealistic to envision a future in which libraries are relevant to users. Looking Backwards pretty much guarantees that libraries will be out-of-date soon, even if it seems (when we compare today to yesterday) that we are making progress. We believe that it is unrealistic to accept shrinking library budgets by downsizing and trying to justify our cuts by Looking Backward. We believe that we will not get increased budgets if we do not Look Forward and present a true vision of a 21st century FDLP. Looking Forward is the only way we can provide the collections and services our communities expect from us.
- FDLP libraries have found that it has been difficult and, in some cases, even impossible to provide long-term access to many of the “tangible” digital products deposited through the FDLP. Of course, it is the producing agencies, not GPO, that are responsible for creating these digital products in a way that makes them difficult to preserve and use over time. But it is GPO that used this policy to prevent the deposit of digital objects that are preservable, usable, and re-usable for the long term. Examples of the difficulty created by these products and some of the big projects to deal with them include: “CD-ROM analysis project” by Hernandez and Byrnes, Spring 2004 Depository Library Council Meeting, St. Louis, MO, April 21, 2004; Creating Virtual CD-ROM Collections by Woods and Brown, International Journal of Digital Curation, 4(2), 184–198; Government Information in Legacy Formats by Gano and Linden, D-Lib Magazine, 13(7/8); Preserving Long-Term Access To United States Government Documents In Legacy Digital Formats by Kam A. Woods; Regional depository CD/DVD database at University of Kentucky; Virtual CD-ROM / Floppy Disk Library at Indiana University; and Virtualizing the CIC Floppy Disk Project: an Experiment in Digital Preservation Using Emulation by Geoffrey Brown.
- As regular readers of FGI will know, we have always encouraged digitization as a key to moving into the 21st century digital FDLP, though we have opposed policies of discarding the FDLP Historic Collections without taking due care to ensure long-term preservation and access to the content of those collections. Our position has always been that digitizing is good and that it can be done today without new policies that encourage discarding paper copies or that endanger long-term preservation and access.
- The cost situation is worse than we have space to describe here. “Digitization” is only one step of many that must be taken to provide access, and each of those steps comes with a cost. A claim that a library “must” digitize because it does not have enough money to keep its paper collections is a red flag that the library will Look Backwards to justify its minimal investment in digitization. See the “Costs” section of Wait! Don’t Digitize and Discard!
- The 2015 Horizon Report says that academic and research libraries recognize that new “reading habits…favor small screens and formats” and that, although “PDFs have been a common way to access digital content since the 1990s, they seem cumbersome when compared to the EPUB 3 format, or e-books, which are a viable option for reading on devices smaller than tablets.” (Horizon Report, 2015 Library Edition, page 18).
- See, for example: The joys and pains of type on screen by Raquel Calonge, Medium (Oct 30, 2015); A Library in the Palm of Your Hand: Mobile Services in Top 100 University Libraries by Yan Quan Liu, Information Technology and Libraries Vol 34, No 2 (2015); and Meeting Researchers Where They Start: Streamlining Access to Scholarly Resources, by Roger C. Schonfeld, Ithaka S+R Issue Brief (March 26, 2015).
- This is a complex issue and we do not mean to trivialize it. Books are designed differently and are used by different people for different purposes. Our requirements for reading a novel are not the same as our requirements for consulting a volume of the Census of Population and Housing. But that is part of our point: Mass-digitization projects typically use a one-size-fits-all model — and that model is the wrong one for many, many government publications. To be clear, page-image digitizations are useful, but our communities need more than this. (See the section on Looking Forward for details.)
- See, for example: Gooding, P. (2013). Mass digitization and the garbage dump: The conflicting needs of quantitative and qualitative methods. Literary and Linguistic Computing, 28(3), 425–431.
- To be clear, we are discussing here the creation of digital objects that will be useful as digital objects, not just as surrogates for paper documents. We have written elsewhere about the requirements of using digitizations as surrogates for paper documents (The Digital-Surrogate Seal of Approval: a Consumer-oriented Standard) — and much of that paper deals with the importance of surrogates and with the need for completeness and accuracy. But here we are focusing on the need to create digital objects that are not just digital photocopies of paper documents, but fully-functional, machine-actionable, flexible, re-usable digital objects. The problem of lack of human-usable digital text is a corollary to the problem of relying only on page-images for human use. When page-images are not suitable for reading, we should provide an alternative — and mass-digitization projects rarely create such an alternative. But this issue goes further, because, when we create digital text that is suitable for reading by humans, a by-product is digital objects that are even better than “bag of words” texts for computational analysis. In this case, Looking Forward to anticipate human readers of digital text aligns nicely with Looking Forward to anticipate more advanced, more accurate machine use of digital text. In short, our goal when we create digital objects should not be to emulate the past or even to solve some specific short-term problem — that would be Looking Backward. The goal should be to create digital objects that can be used equally well with today’s and tomorrow’s technologies — for humans and machines. We should create digital documents that can be used equally well on laptops, tablets, phones, and whatever tomorrow brings — documents that people can use, not just read. Note that Looking Forward is not predicting the future. It would be irresponsible to think that we can accurately know what future technologies will be. 
Looking Forward in this context is about creating digital objects that are open and flexible so that they can be easily accommodated by or reformatted for each new generation of technology. This contrasts with creating objects that are locked into a current file-format, software, or other technology. See also: Loose, Falling Characters and Sentences: The Persistence of the OCR Problem in Digital Repository E-Books by Diana Kichuk, Portal: Libraries and the Academy v.15, no.1 (2015): 59–91 [subscription required; see also our article about Kichuk’s research]; Trevor Owens’ article All Digital Objects Are Born Digital Objects in The Signal (May 15, 2012); and Endnote 1 above.
- National Plan for the Future of the FDLP.
- Such a policy could even have a positive effect on those few “tangible” electronic products still in production. The Government Publishing Office could use its influence to persuade agencies that need to distribute information on “tangible” media to create (and could even help them produce) tangible products that contain preservable, long-term-usable information.
- Although GPO has not asked questions related to digital deposit consistently in its biennial surveys of FDLP libraries, when it has asked the question, the results have shown substantial (if far from universal) interest in digital deposit — even in the absence of specific GPO proposals or policies. In 2006, more than a quarter of the depository community was willing to try storing and serving electronic materials locally. In the 2007 biennial survey, 37.78% said that they would want to receive files by digital deposit if GPO offered them. In the 2011 survey, 87 FDLP libraries said they are already housing at least some digital depository publications on library servers. (The 2013 survey dropped all questions relating to digital deposit but one question was added again in 2015, for which the results are not yet available.) In short, libraries understand and want digital deposit!
- “All or Mostly Online Federal Depository Libraries” are defined in “FDLP Requirements and Guidance”. While these depositories provide valuable access to digital FDLP materials through their catalogs, they only point to documents, they do not actually control any content, and therefore are “depository” libraries in name only.
- Conway, Paul. 2013. Preserving Imperfection: Assessing the Incidence of Digital Imaging Error in HathiTrust.
- What would that shared preservation ecosystem look like? We’ve been writing about that for some time. See, for example: Critical GPO systems and the FDLP cloud.
- Born-Digital U.S. Federal Government Information: Preservation and Access. 2014. Prepared by James A. Jacobs for Leviathan: Libraries and Government Information in the Era of Big Data.
- DLC responds to open letter regarding the new Regional discard policy
James A. Jacobs is Librarian Emeritus, University of California San Diego. He has more than 20 years of experience working with digital information, digital services, and digital library collections. He is a technical consultant and advisor to the Center for Research Libraries in the auditing and certification of digital repositories using the Trusted Repository Audit Checklist (TRAC) and related CRL criteria. He served as Data Services Librarian at the University of California San Diego from 1985 to 2006 and co-taught the ICPSR summer workshop, “Providing Social Science Data Services: Strategies for Design and Operation” from 1990 to 2012. He is a co-founder of Free Government Information.
James R. Jacobs is the Federal Government Information Librarian at Stanford University’s Cecil B. Green Library and program lead for the LOCKSS-USDOCS program. He is a member of the Government Documents Roundtable (GODORT) of the American Library Association and has served on the Depository Library Council to the Public Printer, including as DLC Chair from 2011 to 2012. He is a co-founder of Free Government Information and Radical Reference and serves on the board of Question Copyright, a 501(c)(3) organization that promotes better public understanding of the history and effects of copyright, and encourages the development of alternatives to information monopolies.