Tag Archives: Regional discard policy
In a recent thread on the Govdoc-l mailing list about a Congressional Publications Hub (or “pub hub”; more of the thread here), one commenter said that the American Memory project’s digital surrogates of the pre-Congressional Record publications "probably aren’t salvageable" because the TIFFs were captured at 300 ppi and then converted to 1-bit bitonal black and white, leaving most of the text too faded or pixelated to be accurately interpreted by optical character recognition (OCR) software. He concluded that this was "Kind of a shame."
It is indeed a "shame" that many of the American Memory Project’s "digital surrogates" probably are not salvageable. But the real shame is that we keep making the same mistakes with the same bad assumptions today that we did 10-15 years ago in regard to digitization projects.
The mistake we keep making is thinking that we’ve learned our lesson and are doing things correctly today, that our next digitizations will serve future users better than our last digitizations serve current users. We are making a series of bad assumptions.
- We assume, because today’s digitization technologies are so much better than yesterday’s technologies, that today’s digitizations will not become tomorrow’s obsolete, unsalvageable rejects.
- We assume, because we have good guidelines for digitization (such as those from the Federal Agencies Digital Guidelines Initiative, or FADGI), that the digitizations we make today will be the "best" simply because they conform to the guidelines.
- We assume, because we have experience of making "bad" digitizations, that we will not make those mistakes any more and will only make "good" digitizations.
Why are these assumptions wrong?
- Yes, digitization technologies have improved a lot, but that does not mean that they will stop improving. We will, inevitably, have new digitization techniques tomorrow that we do not have today. That means that, in the future, when we look back at the digitizations we are doing today, we will once again marvel at the primitive technologies and wish we had better digitizations.
- Yes, we have good guidelines for digitization, but we overlook the fact that they are just guidelines, not guarantees of perfection or even of future usability. Those guidelines offer a range of options for different starting points (e.g., different kinds of originals: color vs. B&W, images vs. text, old paper vs. new paper), different end-purposes (e.g., page-images and OCR require different specs), and different users and uses (e.g., searching vs. reading, reading vs. computational analysis). There is no "best" digitization format. There is only a guideline for matching a given corpus with a given purpose and, particularly in mass-digitization projects, the given corpus is not uniform and the end-point purpose is either unspecified or vague. And, too often, mass-digitization projects are compelled to choose a less-than-ideal, one-size-does-not-fit-all compromise standard in order to meet the demands of budget constraints rather than the ideals of the "best" digitization.
- Yes, we have experience of past "bad" digitizations, so we could, theoretically, avoid making the same mistakes. But we overlook the fact that use-cases change over time, users become more sophisticated, and user technologies advance and improve. We try to avoid making past mistakes, but, in doing so, we make new mistakes. Mass digitization projects seldom "look forward" to future uses. They too often "look backward" to old models of use (page-images and flawed OCR) because those are improvements over the past, not advances for the future. But those decisions are only "improvements" when we compare them to print or, more accurately, when we compare physical access to and distribution of print with digital access and distribution over the Internet. When we compare those choices to future needs, they look like bad choices: page-images that are useless on higher-definition displays or smaller, hand-held devices; OCR that is inaccurate and misleading; digital text that loses the meaning imparted by the layout and structure of the original presentation; digital text that lacks markup for repurposing; and digital objects that lack the fine-grained markup and metadata necessary for accurate and precise search results at a level finer than volume or page. (There are good examples of digitization projects that make the right decisions, but these are mostly small, specialized projects; mass digitization projects rarely if ever make the right decisions.) Worse, we compound previous mistakes when we digitize microfilm copies of paper originals, thus carrying over limitations from the last-generation technology.
So, yes, it is a shame that we have bad digitizations now. But not just in the sense of regrettable or unfortunate. More in the sense of humiliating and shameful. The real "shame" is that FDLP libraries are accepting the GPO Regional Discard policy that will result in fewer paper copies. That means fewer copies to consult when bad digitizations are inadequate, incomplete, or unusable as "surrogates"; and fewer copies to use for re-digitization when the bad digitizations fail to meet evolving requirements of users.
We could, of course, rely on the private sector (which understands the value of acquiring and building digital collections) for future access. We do this to save the expense of digitizing well and of acquiring and building our own public domain digital collections. But by doing so, we do not save money in the long run; we merely lock our libraries into the perpetual tradeoff of paying every year for subscription access or losing access.
There’s something that has been sticking in my craw for quite some time. That something is the term “flexibility” that has been used as a bludgeon by regional FDLP libraries to push the Government Publishing Office (GPO) to create its potentially disastrous regional discard policy. Over the last 5 years at least, some FDLP librarians — primarily those in Regional libraries — have argued that, because of dire space issues at their libraries, they need “flexibility” to manage their collections. In other words, they want to discard documents to gain floor space. Regionals have argued that Title 44 of the US Code, the underlying law of the FDLP, does not give them this “flexibility.”
It’s always bothered me that this demand for “flexibility” has come from a few regionals but the policy change will affect the whole FDLP. When GPO asked regionals what they wanted to do, more than half of the 47 current Regionals said they wanted to retain their current tangible collections and sixty percent said they wanted to continue building their tangible collections. When, in the same survey, GPO asked which of 60 specific titles Regionals might want to discard, only two titles were selected by more than a third of regionals.
So, if a few Regionals want to get rid of a few titles, why do we need a policy that turns the FDLP commitment to preservation upside down and encourages rather than prohibits discarding at all 47 Regionals?
It seems to me that there are three problems with the argument that Regionals need “flexibility”:
- The FDLP system already has flexibility. There are two kinds of depository libraries. Regional libraries are required to receive all publications that GPO distributes, in order to ensure long-term preservation of and access to the entire FDLP corpus and to support the work of Selective libraries in their state or area. Selective libraries tailor their collections to match their size and the needs of their local communities, and may withdraw documents they’ve selected after 5 years’ retention. It is the very rules that the Discard Policy circumvents (Title 44 and the Federal Depository Library Handbook) that create and support the flexibility of the system as a whole. The retention requirement of regionals is the very reason that all selective FDLP libraries can discard and manage their collections “flexibly.”
- Flexibility is built into the FDLP. Indeed, the words “flexibility” and “flexible” are mentioned more than a dozen times in the FDLP Handbook. This new (mis)use of the term to mean only one thing — discarding paper copies by Regionals — is a red herring that implies that flexibility is needed (it is not) and does not exist (it does). If a few regionals need “flexibility” perhaps they should just become selectives.
- Giving Regionals the “flexibility” to discard parts of their collections actually reduces the flexibility of the system as a whole because it puts new burdens on the Selectives — thus reducing their flexibility.
Is the current Regional/Selective FDLP system perfect? No, there’s lots more work to be done by all FDLP libraries to assure preservation of the historic national collection and better support the program, and more that GPO could do to support cataloging and curation of the national collection. But I really wonder if the FDLP even needs this new designation of “preservation stewards” brought about by the introduction of the Federal Information Preservation Network (FIPNet) and the Regional Discard Policy. We already have 47 of them in the form of Regional libraries! If a few regionals choose to become selectives, the FDLP would still have all those other Regionals (maybe as many as 40?). And we would also have those few former regionals, which would probably maintain most if not all of their historic collections. This would be much better for preservation and better for users than these temporary preservation stewards.