Tag Archives: digitization
In a press release today, the National Archives and Records Administration (NARA) announced a “new model for the preservation and accessibility of Presidential records.” The Obama Foundation has announced its commitment to fund the digitization of all of the unclassified Presidential records created during the administration of President Barack Obama, and NARA says that, “Instead of constructing a building to house the textual and artifact records, existing NARA facilities will house the original materials.”
In this new model, NARA will administer neither a museum nor a traditional “Presidential Library,” and will instead focus its resources and personnel on preserving and making accessible the Presidential records of the 44th President of the United States in digital format to the greatest extent possible.
…Once the records are digitized, NARA will store and preserve the original paper records, as well as the artifacts, in an existing facility that meets NARA’s high standards for archival storage.
… In addition to the paper records, NARA received more than 250 terabytes of electronic records, including approximately 300 million emails, from the Obama White House. Together, these “born digital” and the digitized materials will represent the largest digital archive of Presidential records.
Gary Price at InfoDocket has the complete text of the NARA announcement.
In a recent thread on the Govdoc-l mailing list about a Congressional Publications Hub (or “pub hub”; more of the thread here), one commenter said that the American Memory project’s digital surrogates of the pre-Congressional Record publications "probably aren’t salvageable" because the TIFFs were captured at 300 ppi resolution and then converted to 2-bit bitonal black and white, and because most of the text is too faded or pixelated to be accurately interpreted by optical character recognition (OCR) software. He concluded that this was "Kind of a shame."
It is indeed a "shame" that many of the American Memory Project’s "digital surrogates" probably are not salvageable. But the real shame is that we keep making the same mistakes with the same bad assumptions today that we did 10-15 years ago in regard to digitization projects.
The mistake we keep making is thinking that we’ve learned our lesson and are doing things correctly today, that our next digitizations will serve future users better than our last digitizations serve current users. We are making a series of bad assumptions.
- We assume, because today’s digitization technologies are so much better than yesterday’s technologies, that today’s digitizations will not become tomorrow’s obsolete, unsalvageable rejects.
- We assume, because we have good guidelines (like Federal Agencies Digital Guidelines Initiative (FADGI)) for digitization, that the digitizations we make today will be the "best" by conforming to the guidelines.
- We assume, because we have experience of making "bad" digitizations, that we will not make those mistakes any more and will only make "good" digitizations.
Why are these assumptions wrong?
- Yes, digitization technologies have improved a lot, but that does not mean that they will stop improving. We will, inevitably, have new digitization techniques tomorrow that we do not have today. That means that, in the future, when we look back at the digitizations we are doing today, we will once again marvel at the primitive technologies and wish we had better digitizations.
- Yes, we have good guidelines for digitization, but we overlook the fact that they are just guidelines, not guarantees of perfection or even of future usability. Those guidelines offer a range of options for different starting points (e.g., different kinds of originals: color vs. B&W, images vs. text, old paper vs. new paper, etc.), different end-purposes (e.g., page-images and OCR require different specs), and different users and uses (e.g., searching vs. reading, reading vs. computational analysis). There is no "best" digitization format. There is only a guideline for matching a given corpus with a given purpose and, particularly in mass-digitization projects, the given corpus is not uniform and the end-point purpose is either unspecified or vague. And, too often, mass-digitization projects are compelled to choose a less-than-ideal, one-size-does-not-fit-all compromise standard in order to meet the demands of budget constraints rather than the ideals of the "best" digitization.
- Yes, we have experience of past "bad" digitizations, so we could, theoretically, avoid making the same mistakes. But we overlook the fact that use-cases change over time, users become more sophisticated, and user technologies advance and improve. We try to avoid making past mistakes, but, in doing so, we make new mistakes. Mass digitization projects seldom "look forward" to future uses. They too often "look backward" to old models of use (page-images and flawed OCR) because those are improvements over the past, not advances for the future. But those decisions are only "improvements" when we compare them to print or, more accurately, when we compare physical access to and distribution of print with digital access to and distribution over the Internet. When we compare those choices to future needs, they look like bad choices: page-images that are useless on higher-definition displays or smaller, hand-held devices; OCR that is inaccurate and misleading; digital text that loses the meaning imparted by the layout and structure of the original presentation; digital text that lacks markup for repurposing; and digital objects that lack the fine-grained markup and metadata necessary for accurate and precise search results finer than volume or page level. (There are good examples of digitization projects that make the right decisions, but these are mostly small, specialized projects; mass digitization projects rarely if ever make the right decisions.) Worse, we compound previous mistakes when we digitize microfilm copies of paper originals, thus carrying over limitations from the last-generation technology.
So, yes, it is a shame that we have bad digitizations now. But not just in the sense of regrettable or unfortunate. More in the sense of humiliating and shameful. The real "shame" is that FDLP libraries are accepting the GPO Regional Discard policy that will result in fewer paper copies. That means fewer copies to consult when bad digitizations are inadequate, incomplete, or unusable as "surrogates"; and fewer copies to use for re-digitization when the bad digitizations fail to meet evolving requirements of users.
We could, of course, rely on the private sector (which understands the value of acquiring and building digital collections) for future access. We do this to save the expense of digitizing well and acquiring and building our own public domain digital collections. But by doing so, we do not save money in the long-run; we merely lock our libraries into the perpetual tradeoff of paying every year for subscription access or losing access.
Last month, the Government Publishing Office (GPO) released the National Plan for Access to U.S. Government Information: A Framework for a User-centric Service Approach to Permanent Public Access. The National Plan is the culmination of four years of study and planning activities conducted by GPO’s Library Services & Content Management (LSCM) in response to a range of factors that include directives from the Joint Committee on Printing (JCP) and the National Academy of Public Administration; seismic changes in government publishing and user information access practices; and the shifting mission of large academic research libraries.
For those interested in the background to the National Plan, I summarized some of the available information a few months ago. While a detailed development process is not included in the final document, GPO repeatedly solicited quantitative and qualitative data from depository libraries, most notably in its 2012 FDLP Forecast Study, as well as through the Biennial Survey process. GPO has already shared much of the information found in the National Plan in presentations to the community over the past year. As of this writing there is no public comment or feedback process; however, several of the sessions on the preliminary schedule for next month’s Depository Library Council virtual meeting pertain to the implementation of the National Plan, including presentations on public libraries, regional models, and the regional discard pilot project.
I recognize that there can be some hesitance in the depository librarian community in discussing a document like this in detail. After all, criticisms of the National Plan are functionally critiques of LSCM’s strategic direction, and by extension can be (mis)interpreted as criticisms of GPO and its leadership. In preemptive response, I agree with the FGI team: respectful, timely discourse makes our community stronger. I believe wholeheartedly that we all want a similar future: one in which government information is available for all to use and reuse, whenever and wherever it is needed. The vision and mission for the National Plan reflect this desire, as do the words and actions of the GPO staff who put the plan into practice. LSCM has been and continues to be uniquely positioned to coordinate and accomplish this work, and they have made commendable progress on many initiatives that will contribute to public access to government information for generations to come.
Like all FGI occasional contributors, I’m speaking only for myself, not my place of work, my library consortium, or the FGI team. But with that disclaimer out of the way, I think this document is an opportunity for depository librarians and others who care about future access to government information to identify where voices from the community can and should speak up to ensure that planned activities and initiatives are in alignment with the aspirational goals of sustaining permanent no-fee public access to government information. Our responsibility as a community is to make sure that the promise of access is one that will be fully met through collaborative work with each other and engagement with GPO.
Structure and Format
GPO should be commended for producing a document that we can read, discuss, and share with others who care about government information. This is GPO’s plan for action and activities undertaken by LSCM: the National Plan contextualizes current priorities and initiatives, and provides a roadmap for where to expect LSCM’s focus to be going forward. It is also described as a ‘flexible framework,’ which suggests that the exact work to be conducted is yet to be determined, although several projects are underway and some are in the planning stages.
The core of the National Plan is the section on “Desired Outcomes and Actions,” which is based on a list of “Drivers of Change” that include the results of the 2012 FDLP Forecast Study, recommendations from the 2013 NAPA report on GPO commissioned by Congress, and a short but wide-ranging list of external influences. Each outcome is mapped to one of the “Principles of Government Information” adopted by GPO in 1996. Additional assumptions are also articulated that reflect the list of external influences.
The National Plan also presents three strategic priorities: lifecycle management of government information within LSCM to ensure permanent public access to digital government information; development of a sustainable structure for the FDLP; and the delivery of services that support depository libraries in providing accurate government information to the public in a timely fashion. While the strategic priorities relate to the “Drivers of Change,” they are not explicitly mapped to the vision and mission of the National Plan.
The language used throughout the National Plan is that of access rather than preservation. It is clear that enabling permanent public access to information is not the same as preserving information products, though the two go hand in hand. In general, the National Plan references concepts already in common usage in the community without further explanation. For example, there are no assumptions explicitly defining key terms like ‘access’ and ‘sustainability,’ but the concepts are used throughout the document.
To a certain extent, the National Plan is difficult to unpack and discuss because it is deeply non-specific. This lack of specificity has a particularly strong effect on action items pertaining to preservation. Of the six action items, three simply reference new programs (FIPNet, an LSCM Preservation Program, and a project to inventory “copies of record”), one pertains to access rather than preservation (working with partnerships to digitize the historical tangible collection), one relates to the development of guidelines, and one is to increase the profile of government information preservation at the national level. So although the reciprocal relationships between preservation and access are addressed in some ways, outcomes that reflect the government’s obligation to preserve its information are not fully articulated or supported.
Actions categorized as pertaining to right of access, dissemination of information, and authenticity are more specific, but the mapping of outcomes to principles is unclear. If this were to be the only public documentation guiding LSCM’s activities, then the community would have little insight into what GPO is trying to accomplish and why. As more detailed strategies and implementation plans are developed — I hope in consultation with the community at large — and disseminated, it should be possible to more confidently identify the extent to which a given action item will contribute to any given desired outcomes that can be mapped to shared goals and expectations.
The National Plan continues to frame depository libraries as supporters of public access rather than participants in the long-term management of government information, reflecting a broad and ongoing shift of framing libraries as service providers rather than collectors and organizers. Because the Regional discard policy has been approved and is currently in the implementation phase, we know that publications with authenticated digital versions in FDsys (and its successor, govinfo.gov) are eligible for Regional depository libraries to withdraw and discard under the oversight of the Superintendent of Documents. Other action items in the National Plan will lead to the ingest of more content into FDsys from depository libraries and third parties, and the authentication of this digital content, which makes more collections digitally accessible but also eligible for discard in print, a shift that could have a substantially negative effect on long-term access. An additional action item investigates the possibility that Regionals could decline to select certain materials in print/microformat altogether, and another identifies the development of requirements to facilitate pushing or depositing digital content to libraries.
While increased access to authenticated digital surrogates is a laudable measure for public access, taken as a whole the actions identified in the National Plan are framed by a continued shift of the responsibility for collection-building and preservation away from FDLP libraries, without introducing a clearly defined and workable alternative for the long-term preservation of print collections, and without adding the expectation of a meaningful role in digital preservation for these same institutions. (FIPNet is intended to fill this role, but as of this writing, this program is still mostly undefined.) The only action item directly addressing print collections in depository libraries is the development of collection care training for depository staff, and it is categorized as an action related to authenticity and integrity rather than preservation.
In general, changes to the FDLP are incorporated in the National Plan under the principle of disseminating government information, with a specified outcome of forming a sustainable network structure and governance process for the efficient management of depository collections and services. Depository libraries are only a small segment out of many potential public access channels, albeit a segment best poised to serve both marginalized and specialized users, and the National Plan identifies the need for LSCM to play a greater part in lifecycle management of information dissemination products within the federal government. However, under the National Plan, the alternatives for preservation outside of the depository library system are, at present, unclear.
Because the document is describing the role LSCM will adopt and the work it will accomplish, rather than a revised strategy for the FDLP as a program, the National Plan is not GPO’s definitive statement on the future of the FDLP. Based on this document, however, it seems reasonable to predict that GPO’s articulation of its vision for the future FDLP will reflect the priorities established in this document. With that understanding, presenting the National Plan as a document is in itself a significant step in the right direction because it gives the government information community a shared frame of reference in discussing GPO’s priorities and evaluating its accomplishments, and provides us with the opportunity to determine how our libraries and organizations, as well as the community as a whole, can respond to and engage with GPO initiatives as they move forward.
James A. Jacobs. “NAPA Releases Report on GPO.” http://freegovinfo.info/node/3862. Updated February 6, 2013.
James A. Jacobs and James R. Jacobs. “What You Need to Know About the New Discard Policy.” http://freegovinfo.info/node/10525. Updated November 30, 2015.
James R. Jacobs. “DLC Responds to Open Letter Regarding the New Regional Discard Policy.” http://freegovinfo.info/node/10736. Updated January 18, 2016.
Library Services & Content Management. “FDLP Forecast Study.” http://www.fdlp.gov/377-projects-active/1686-fdlp-forecast-study. Updated August 12, 2015.
—. “Federal Information Preservation Network.” http://www.fdlp.gov/project-list/federal-information-preservation-network. Updated April 13, 2015.
—. “Federal Information Preservation Network (FIPNet) – Answering Your Questions.” http://www.fdlp.gov/all-newsletters/featured-articles/2349-federal-information-preservation-network-fipnet-answering-your-questions. Updated December 21, 2015.
—. “JCP Approves Regional Discard Policy.” http://www.fdlp.gov/news-and-events/2403-jcp-approves-regional-discard-policy. Updated October 22, 2015.
National Academy of Public Administration. Rebooting the Government Printing Office: Keeping America Informed in the Digital Age. https://www.gpo.gov/pdfs/about/GPO_NAPA_Report_FINAL.pdf. January 2013.
Office of the Superintendent of Documents. National Plan for Access to U.S. Government Information: A Framework for a User-Centric Service Approach to Permanent Public Access. http://www.fdlp.gov/file-repository/about-the-fdlp/gpo-projects/national-plan-for-access-to-u-s-government-information/2700-national-plan-for-access-to-u-s-government-information-a-framework-for-a-user-centric-service-approach-to-permanent-public-access. February 2016.
Shari Laster. “Information Sharing and the National Plan.” http://freegovinfo.info/node/10569. Updated November 12, 2015.
—. “One Year Later…What’s Happening with Regionals and Discards?” http://freegovinfo.info/node/10285. Updated September 8, 2015.
GPO’s new Regional Discard Policy and GPO’s recent presentations about it are full of hopeful words and good intentions. We applaud GPO for having good intentions and high hopes, but we question whether the Policy can meet those expectations.
- Introduction of the Policy and its Implementation
- One definite Goal. Some questionable objectives
- Preservation of and Access to Paper Copies
- Next steps
Here is what you need to know about the Discard Policy. GPO’s caveats and assurances about the new policy aside, there will no longer be any Regional Depositories for documents more than seven years old. The Policy removes the requirement that there be access paper copies of all documents in the FDLP. It weakens the FDL Program by continuing the shift of responsibility away from FDLP members and toward GPO. It does not increase flexibility (as advocates of the policy claim); it shifts flexibility away from Selectives and gives it to Regionals. It puts new burdens on Selective Depositories. It establishes a new model for the preservation of paper copies of documents that is undocumented, unproven, and risky. It ignores long-term implications in favor of short-term benefits to a few large libraries. It makes GPO’s “guarantee” of long-term, free access to government information nothing more than a hollow promise.
We believe that the Policy actually weakens the FDLP and damages both access and preservation. We believe that the Policy provides no guarantee of meeting those expectations, and will make it more difficult to do so. Below, we explain why.
At the Fall Depository Library Council (DLC) meeting, GPO gave a general outline of how it will proceed to allow Regional Depository Libraries to start discarding paper copy documents.1 GPO has, so far, provided the following information about the Policy itself and how GPO intends to implement the policy:
- Government Publications Authorized for Discard by Regional Depository Libraries [draft policy, 07/09/2014]
- Vance-Cooks, Davita. [letter (July 10, 2015) from GPO to Gregg Harper, Chairman, Joint Committee on Printing (JCP), requesting approval of policy to give regional Federal depository libraries the option to withdraw tangible depository materials] and Harper, Gregg. [letter (August 5, 2015) to GPO]. Both documents are in one PDF file here.
- Council Session on Discards audio recording and presentation slides (10/20/2015)
For additional background, links, and commentary, see: Information sharing and the National Plan by Shari Laster.
The new policy has only one stated goal: to allow regional depository libraries the option to discard paper copies of government documents.2 To be clear, this is not a substitution of one format for another, like microfiche for paper. Regionals will not be required to uphold their Title 44 obligations to “retain at least one copy of all Government publications either in printed or microfacsimile form.” (44 U.S.C. §1912).
In addition to this specific goal, GPO has expressed a variety of objectives, which it apparently hopes the new policy will help accomplish. But GPO has been both inconsistent and vague in its expression of these objectives and how it will actually implement the policy.3 Six Regional Depositories will participate in a test of the policy in early 2016; presumably, this will produce more implementation details.
Some of GPO’s objectives (such as giving Regionals “the ability to expand their capability to serve the increasing number of remote users” [Vance-Cooks]) can be accomplished without the new policy.
Most of the objectives relate to giving Regionals the “flexibility” to discard paper copies of documents. GPO does not claim that this will have any positive effect for users. On the contrary, GPO acknowledges that regionals that are already relocating tangible collections to offsite storage are impairing the goals of the FDLP.4 GPO implies that Regionals will use resources that will be freed by discarding documents “to focus on the needs” of users of government information [Vance-Cooks]. But GPO does not specify what the resources are, or explain how it expects freed space to be reallocated to services or collections for users of government information, or require any such reallocation. Furthermore, some Regionals have admitted that any savings brought on by this policy will not go toward public service of government information, but will go toward their library’s central operating budget. Since the Policy does nothing to further such objectives, we should not read them as objectives of the Policy but as wishes of GPO.