Search Results for: node/"Google Books"

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Creating Gov Doc “Libraries” in Google Books

Digitized government documents in Google Books have been written about quite a lot here at FGI (check out the earlier posts), and I’d like to revisit the topic with a different focus.

I was searching for Civil War era government documents for a History Professor, and I realized that we did not own one of the documents he sought. Before suggesting that he interlibrary loan a copy of this document, I decided to search online for a full-text digitized version. Alas, it did not exist in the digital realm, but I did find some other digitized gov docs pertaining to his research needs in Google Books. We were both elated, he because I had found what he needed, and I because so many documents I found digitized on Google Books were the same documents we had lost to mold and water damage from Hurricane Rita!

Out of curiosity, I did a Google Book search for other types of government publications and found these gems:

Trial of the Conspirators, for the Assassination of President Lincoln

Illustrations of the Gross Morbid Anatomy of the Brain in the Insane (isn’t that a Cypress Hill song? Nevermind…) by the Government Hospital for the Insane.

How it Feels to be the Husband of a Suffragette
(not published by the Government Printing Office, but it is a book housed in the National American Woman Suffrage Association Collection in the Library of Congress).

Official Records of the Union and Confederate Navies in the War of the Rebellion

Most of these documents were scanned at large research universities or depositories, but the quality is not always decent and can sometimes border on the illegible. I was quite amused when I discovered a staff person’s hand digitized on this document’s cover:

However, there are bigger snafus than a digitized librarian’s hand. For example, despite government documents being in the public domain, Google Books treats most post-1922 (i.e. post-copyright law) government documents as copyrighted material by only allowing a limited view! For more details, please read James Jacobs’ post on this issue.

Despite all these issues (which have yet to be resolved), I decided to take advantage of the access to full-text, pre-1922 government documents and create a McNeese Gov Docs “Library” account in Google Books for my depository. The account also allows you to subscribe to updates of its holdings via an RSS feed. I put a link to the library account and the RSS feed on my depository’s homepage and our “Gov Guides” wiki. I’ll add more of these interesting and old documents as I come across them, especially those pertaining to Louisiana or documents that were lost to Hurricane Rita.

Here are some tips for finding gov docs in Google Books: Use Advanced Search, and in the Publisher field, type in Govt OR GPO OR “Government Printing Office”. You can also search by agency (e.g., “Department of the Interior”) by typing the name of the agency in the Author field.
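If you run these searches often, the tip above can be sketched as a small script that builds the query for you. This is only a rough sketch under a few assumptions: that the Publisher and Author fields of Advanced Search correspond to the `inpublisher:` and `inauthor:` search operators, and that Google Books results can be reached through the `tbm=bks` search URL. The function names are made up for illustration.

```python
from urllib.parse import urlencode

# Publisher variants suggested in the search tip.
PUBLISHER_TERMS = ["Govt", "GPO", "Government Printing Office"]


def quote_if_phrase(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(agency=None, keywords=""):
    """Build a query string limited to government publishers.

    Assumption: the Publisher and Author fields map to the
    inpublisher: and inauthor: operators.
    """
    publisher_part = " OR ".join(
        f"inpublisher:{quote_if_phrase(term)}" for term in PUBLISHER_TERMS
    )
    parts = [keywords.strip(), publisher_part]
    if agency:
        parts.append(f"inauthor:{quote_if_phrase(agency)}")
    # Drop empty pieces (e.g. when no keywords were given).
    return " ".join(part for part in parts if part)


def build_url(query):
    """Assumption: book search results are served from /search with tbm=bks."""
    return "https://www.google.com/search?" + urlencode({"tbm": "bks", "q": query})


if __name__ == "__main__":
    query = build_query(agency="Department of the Interior", keywords="Louisiana")
    print(query)
    print(build_url(query))
```

How the search engine groups the OR terms together with the agency restriction is not something this sketch controls, so check the results by eye before relying on them.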

Have fun exploring and building your own digital collections, but please let me know if you find some really cool gov docs, ok?

New Feature: Google Books/Fed Docs

FGI is pleased to announce a new occasional series that will examine how Google Books treats US Federal Documents. These posts will have titles that begin with "Google Books/Fed Docs".

We’re very pleased to have a guest researcher putting up these postings. Please give a warm FGI welcome to Julia Tryon, Government Documents Librarian of the Phillips Memorial Library at Providence College. Julia has started to gather statistics and other information about the tens of thousands of government documents that have been scanned by the Google Books projects. We posted her first govdoc-l message on this subject.

Julia has agreed to start blogging on this subject for FGI. Unlike our "Blogger of the Month" series, Julia will post whenever she finds something interesting at the intersection of US Federal Documents and Google Books until she feels that she’s exhausted the subject.

Take it away, Julia!

Do libraries need more shelving? Isn’t everything digital?

We here at FGI have been making the argument against the destruction of physical collections in connection with digitization efforts for a long time (see e.g., Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a and What You Need to Know about the New Discard Policy). So it’s nice to hear the same argument from Jeff MacKie-Mason, the recently hired University Librarian and Chief Digital Scholarship Officer at UC Berkeley, on his blog madLibbing: Muddling Along in the Information Age. MacKie-Mason clearly and succinctly points out the reasons that libraries still need physical collections: many digitized works are still in copyright and their digital surrogates are therefore not shareable online, print copies are easier to read with higher comprehension rates, there is “little or no confidence that we can guarantee long-term digital preservation” (emphasis his!), and current digital surrogates from large digitization projects are less than complete (we’ve pointed this out repeatedly e.g., in “‘An alarmingly casual indifference to accuracy and authenticity.’ What we know about digital surrogates.”). So we hope the next time your library weeds a government document under the assumption that it’s online, you’ll check the digital surrogate for completeness and at least start the discussion with your administrators about the need for a local digital archive to assure the preservation of the digital surrogate of the document that you’re about to weed. It could mean the difference between access and frustration for your user community.

One huge misconception we face is that digitizing our collections means we don’t need the print anymore. For example, we are participants in the Google Books / HathiTrust project, and most of our 11 million regular volumes have been digitized.  Why not burn our print copies?

  1. For starters, about half of the collection is still in copyright. The HathiTrust collection can be searched, full-text, to find the existence of books, but we are not allowed to let people use the digital copy (with limited exceptions, e.g., for the blind, who can listen to a text-to-voice conversion). Decades before this need for our print copies goes away.
  2. Second, we are here not to build collections for their own sake, but to serve our faculty and students. And many of them vastly prefer doing their work from print copies. Those who read long monographs find it easier and their comprehension higher. Those who need to study large images or maps, in high resolution, or who want to see side-by-side page comparisons, need the print. And for many rare and historical documents, the materiality of the original document itself is of enormous importance for scholarship, from the marginal annotations to the construction of the volume.
  3. Next, we can have little or no confidence that we can guarantee long-term digital preservation. Digital storage has been around a relatively short time. In that time, formats change frequently. Hardware and software to render digital formats changes. Bits on storage media rot. Keeping bits and being able to find and access them in the future requires large annual expenditures, and those expenditures are getting larger as the amount of content we want to preserve grows enormously fast. Further, much of scholarly content currently is held on servers of for-profit companies, and we have no guarantee those companies will survive, or that they will take care to ensure that their archives of scholarly publications survive.
  4. The Google project has been very good, but it is not complete.  It does not scan fold-out pages, for example, which are in many scholarly books (maps, charts, tables).  We have discovered that sometimes they miss pages, or the quality is not readable.

So, for now, there is pretty much consensus among research scholars and librarians that we must keep print copies for preservation in all cases, and for continuing use in many cases.

via Do libraries need more shelving? Isn’t everything digital? – madLibbing.

COL’s FDLP Task Force Survey and FGI’s responses

[updated 4pm 1/7/14. I clarified a couple of statements. JRJ]

Last summer, the ALA Committee on Legislation (COL), Federal Depository Library Program (FDLP) Task Force released its FDLP report and recommendations. In response, FGI wrote a white paper, “Wait! Don’t Digitize and Discard! A White Paper on ALA COL Discussion Issue #1a”. While agreeing with many of the report’s recommendations (avoid duplicative efforts, digital deposit, etc.), we took issue with some points and tried to unpack and give context to the problems and assumptions put forth by the task force.

COL asked the FDLP task force to continue for an additional year and their first order of business is an FDLP survey to gather input and feedback to “outline a process for ALA to bring together diverse opinions and to guide the Committee in its future consideration of policies in relationship to the FDLP.” While the survey was sent out to ALA divisions and round tables for official feedback, I think it important that ALL librarians with an interest in government information submit answers to the FDLP survey. Please submit your survey responses by February 14, 2014.

Also, please consider attending the COL meeting at ALA Midwinter on Saturday, January 25 from 10:30-11:30am in the Convention Center, room 107B.

Lastly, in the interest of public discussion, I thought our readers would be interested in seeing the 20 survey questions beforehand and our survey responses. Below are all the questions (starting with #3 as questions 1-2 are demographic) in bold as well as our submitted answers.

A comprehensive preservation plan includes digital documents supplemented with preserved tangible collections with a yet-to-be-determined number of full print collections, in controlled environments and in geographically dispersed locations. — NAPA report “Rebooting the Government Printing Office: Keeping America Informed in the Digital Age”

Identification of Materials – in order to implement a preservation plan, it will be critical to outline a process for identification and processing on the national level. This collaboration will require the broad participation of libraries, commercial and not for profit organizations, agencies, and associations.

3. Do all tangible materials within the FDLP need to be preserved?


I worry that this question is ambiguous and will result in a variety of answers which could too easily lead to misleading interpretations. To try to get around the ambiguity, let me say that I believe that there is no FDLP information that should be discarded or abandoned.

The community might well want to identify specific copies or editions or versions or formats of any given specific information content that need no longer be preserved because the information content is being appropriately preserved. But, to do that, we need an accurate accounting of what information content exists, how many copies we have, the physical state of such materials, etc.

I would suggest that format (“tangible” or other) is not a useful criterion for selecting materials for preservation or discard. I would also suggest that information that exists only in paper copy should be preserved and, further, that every such work (edition?) should be preserved.

4. Realizing that not everything can be preserved immediately, what should be the process for determining the priority plan?

Is this question about “digitization” for preservation? Paper collections are being preserved now under long-time FDLP rules and procedures. If we’re talking about actual preservation, born-digital materials should have priority.

I think this question may be confusing and conflating “digitization” with “preservation” and “historic” with “born-digital.”

Digitization of historic/paper publications, while providing better access, does not necessarily contribute to their preservation, and digitization for preservation does not necessarily guarantee better (or any) access. For example, many publications that are scanned (e.g., those going through the Google Books project) are disbound and destroyed without any guarantee that the digitization process has accurately or completely preserved the original content. Further, “digitization” encompasses a wide range of activities and digitizations may or may not meet quality standards for long-term preservation and use.

Also, many born-digital federal documents are arguably *more* in need of preservation action than paper documents, since paper documents are ostensibly already being preserved via the FDLP. The reason for this is that there is a single steward (such as a government agency) that has sole responsibility for preservation of and access to those files, and the files are therefore at risk from intentional or unintentional political, bureaucratic, or financial decisions that will lose, alter, or discard that information.

I believe that any preservation plan and prioritization policy needs to have in place a process and framework for preserving an adequate number of physical paper copies as well as a process and framework for collecting, describing and preserving digitized paper and born-digital publications.

As for prioritizing what to digitize (in a NON-destructive manner!), I would give higher priority to those publications that have NOT been widely distributed or have not received any or adequate cataloging.

I would suggest that we should develop a number of criteria for determining priorities, not just a single criterion. For example, we might want to identify different categories of paper materials for digitization: some (plain text in a uniform format on good paper with clear print) are easier (and cheaper) to digitize more accurately, some (statistical publications, odd format publications, color and illustrated publications, older publications with thin paper/bleed-through print/etc., publications with non-uniform layout and fonts, etc.) are harder (and more expensive) to digitize accurately and completely. Setting priorities for digitization could take such a categorization into account along with other factors (condition of the paper copy, completeness and accuracy of existing metadata, known number of complete copies in paper and known number of complete paper copies needed for preservation-of-the-content (as opposed to simple access to the content), known number of paper copies needed for access, decisions about intent of digitizations (access, preservation, image-only or image+text, or image+text+reformatting (e.g. xml, tei, epub…?))…).

Please read the following for more on these issues:

5. Who should be involved in preserving FDLP materials? GPO, FDLP libraries, commercial, and/or not for profit organizations?

GPO and FDLP libraries — with assistance from non-profits like the Internet Archive (which actually has official status as a library!) — should be the primary actors in any plan for preserving public domain government publications. Any digitization plan MUST include an agreement that digitizations will be freely and publicly available without subscription or fee and in DRM-free formats.

While commercial outfits may have experience in this area and may be consulted, I would strongly recommend against including commercial entities in any long-term preservation plan. Not only are commercial entities disinclined to do anything that does not support their bottom line, it goes against the spirit of the FDLP for public domain materials to be taken out of the public domain and only made available in subscription databases and/or commercial products. There is already ample evidence of private companies contracting with federal agencies to digitize content that is then privatized and taken out of the public domain to the detriment of the public who owns that information. See:

Preservation Methods – the processes for digitization and preservation are varied and some digitization is not necessarily preservation. There are a variety of projects that may contribute to a national preservation plan. The Task Force report affirms that there should be multiple locations and geographical distribution.

6. Is digitization a preservation standard or does it serve as a discovery/access resource or both?

“Digitization” is not a standard but a generic term that encompasses many different processes and procedures and can result in many different results of varying quality and suitability for different purposes. Even poor quality digitizations can be preserved, but that does not qualify them as “digital preservation” of the original information. “Digitization” is also only one step (the first step: creation) of a number of steps that would need to be taken (ingest, storage, data management, preservation and preservation planning, discovery/access/delivery, and service) in order to either provide access to or preservation of the digital objects created by any given digitization process. “Digitization” is, therefore not a useful concept on its own to address preservation or discovery or access.

To better address this question one needs to specify how an item will be digitized and for what purposes it will be digitized and develop an evaluation of the process to be used to determine if the output of the digitization meets the requirements.

Typically, today, most digitization projects (particularly large-scale projects) aim to provide access (not preservation) to page-images of the original books. Those scans, e.g., Google Books, offer pretty good (but not great) discovery (try to find a specific volume of any digitized serial in GBP and you’ll see what I mean about “pretty good” discovery), but many of the scans are of poor quality, with inaccurate or no OCR, blurred images, missing pages, etc.

As we look to digitization as a process, we should evaluate what we want from that process and develop projects that match those goals. We should develop more projects that aim higher than simple access to digital images that are no better than (and, in some cases, not as good as) the original books. We should develop projects that would envision the possibilities of digital information, not just pictures of static information. This would include digitization that would enable reformatting the content for current and future devices and uses. All digital objects are born-digital objects. Perhaps most importantly, treating digitization of paper as no more than a surrogate for the original paper with no more functionality than the original is short-sighted at best and destructive at worst.

As noted earlier, there are various and varied levels of digitization. Even publications which are digitized to the highest current *digitization* standards should not necessarily be relied on as a copy of last resort. Without adequate attention to the unique qualities of individual documents, the digitization may not fit the needs of all users and some users will continue to need access to paper publications. For example, the images of large size and color documents and documents with maps, tabular data and inserts may be less usable than their paper originals.

7. Should there be different standards for copies – more rigorous for congressional materials and less for pamphlets for example?

NO. Decisions about use of standards should not be made based on format of the original. Such a choice would imply that all pamphlets are less important to everyone forever than any congressional bound volumes, for example. This would be making an unwarranted judgement on the quality of content based on format and an unfounded judgement on the value of the content to unspecified users of the future.

Digitizations of paper should be undertaken to address needs of user communities of the future as well as the present. Short-term cost savings should not drive library decisions if they impede long-term access, preservation, or usability of the information content. It is reasonable to assume that any publication that is worth scanning should be scanned at the highest quality standards that will lessen the likelihood of its needing to be re-scanned as digitization technologies continue to get better and user needs evolve.

8. What is the role of Regional FDLP libraries in preservation centers?

Regional FDLP libraries should be seen as the first best option in any preservation plan. Geographic distribution of both paper and digitized/born-digital publications will continue to be a necessary part of any plan going forward and regionals are best equipped to offer those services since they’re already set up and working. ALL regional libraries should be required to participate in or designate one library in their region to participate in LOCKSS-USDOCS as part of their depository responsibilities. This would fall under the current FDLP shared housing agreement concept.

Trusted Partners – the FDLP has a partnership program and the Task Force report notes that partnerships could be a critical component of a national preservation plan.

9. What are the qualifications of a trusted partner?

A trusted partner, at least in terms of digital preservation, is one that is built on OAIS principles, has a succession plan in place, and is not driven primarily by the profit motive. Consortia and other library-centric organizations should be seen as trusted partners as long as preservation AND free public access are inherent parts of their missions.

10. Can commercial and not for profit entities be considered a trusted partner?

Commercial entities will probably disqualify themselves as trusted partners if adequate definitions of responsibility are in place; trusted partners should provide long-term, free public access and have a succession plan in place to describe what happens if they ever choose to break the partnership. So, in general, commercial entities will probably NOT be relied on as trusted partners as their missions are, by law, motivated first by profit rather than by public access, public service or information preservation. Non-profit entities can be trusted partners as long as preservation and public/free access are inherent parts of their missions.

11. What current initiatives exist that can contribute to partnerships? (LOCKSS and other initiatives).

FDLP libraries themselves, LOCKSS-USDOCS, Internet Archive, ASERL’s COE libraries, library consortia…

Registry and Identification – the FDLP has initiated a registry for digitization (http://registry.fdlp.gov) and this might be the basis for a preservation plan. The Task Force notes that cataloging tangible and online materials is still a critical component for any national efforts in discovering and accessing FDLP and other government information.

12. How should individual cataloging efforts be coordinated?

Via GPO and the Catalog of Government Publications (CGP).

13. How should commercial entities be incorporated with library efforts?

Commercial entities provide a useful and welcome *complement* to free public access entities — but they should never be seen as a substitute or replacement for free public preservation, access, and service.

Commercial entities should be encouraged to donate metadata toward the national registry and/or digitization projects. They should also be encouraged to deposit their content in collaborative archival services like LOCKSS-USDOCS for safekeeping.

Any national catalog (OCLC, CGP) which has links to publications in subscription services (e.g., the ProQuest Congressional database) should also include links to freely available digital copies.

14. What additional cataloging/identification projects exist that might contribute to a national effort?

HathiTrust registry of US federal government publications, ASERL collections of excellence, Internet Archive digitization efforts (be aware that IA has a complete set of historic Congressional publications (Serial Set, Congressional Record, hearings, etc.) garnered from the N&O list a few years ago. They are just waiting for funding to digitize).

Broadening Expertise – in a distributed, electronic world of information, FDLP libraries are able to assist non-FDLP libraries and FDLP resources are more integrated with commercial information resources. The Task Force considers this an opportunity and challenge that will impact librarians and library workers regardless of type of library.

15. What are the professional development needs for librarians and library workers who may utilize FDLP information?

This growing idea that “all librarians are now documents librarians” really bothers me. It makes for a good bumper sticker, but are “all librarians engineering librarians”? Government information is a very specific area within LIS (which happens to touch on many subjects and disciplines) with specific and iteratively-built skill sets and knowledge base. Having a government information librarian on staff is critical to a library’s success in serving its community. Just as we shouldn’t expect all librarians to have in-depth knowledge of every subject and discipline, and shouldn’t expect every librarian to be a cataloger, we shouldn’t expect all librarians to have in-depth knowledge of the workings of government and its information resources. At the same time, I have heard that a disturbingly large number of LIS programs are deprecating if not completely doing away with their government information curricula.

With that in mind, the documents community should first survey LIS programs to see what’s being taught, what the requirements are, what percentage of students take government information courses, and whether or not LIS programs are *using* government information in their classes (no copyright!) for digitization, digital and physical preservation, indexing/discovery, text-mining, etc.

The documents community (as well as ALA as the accrediting organization!) then needs to create a model curriculum and require that ALL MSLIS programs have courses on government information to provide all librarians with, at a minimum, basic familiarity with the FDLP program itself, FDsys, and the FDLP core collection, as well as basic knowledge of government information resources and collections at all levels of government.

Within the documents community, there need to be continuing education opportunities, open to ALL librarians, that let librarians expand their govt information expertise and broaden their technological skill sets so that they’ll have at least a basic understanding of digitization, digital collection development, and other technologies to help them do their work in serving their communities. There need to be more “accidental government information librarian” webinars, but also in-person workshops similar to ICPSR’s 5-day workshops on data services.

Library administrations also need to be more supportive of the need for govt information librarians to travel to conferences (GODORT, DLC, etc) as that is where we learn from our colleagues and move the entire field forward.

16. How can expertise be spread to all librarians beyond FDLP designated librarians?

Every FDLP library ought to:

–Reach out and make contact with other FDLP- and non-FDLP libraries in their area.
–Arrange viewings of GPO trainings for libraries in their city. There have been some very good ones.
–Consider site visits or virtual office hours for library staff at their institutions and around their cities.

Depositories ought to blog their reference questions. GIO chat service (http://govtinfo.org) should do that as well. This “seeds the cloud” and allows librarians and the public to more easily find government information resources.

Documents librarians should put in proposals to their state conferences.

Local GODORT chapters ought to consider emulating North Carolina’s Accidental Docs Librarian webinar series.

Government information librarians should have ongoing workshops within their own libraries.

17. How can core competencies related to government information be developed for all librarians?

See 15 and 16.

I believe GODORT is already working on core competencies within the GODORT Education Committee. The 21st Century Government Information initiative on WebJunction might be another place to look for core competencies.

The American Library Association has a vested interest in the development of skills, services, and advancement of the FDLP program. ALA’s role is to assist and support librarians and library workers who work with government information. ALA’s expertise contributes to national discussions and government policies and ALA can provide assistance in bringing together a variety of partners to advance a common goal.

18. How can ALA assist in the development of an FDLP preservation plan?

Adopt the Digital Surrogate Seal of Approval (DSSOA) and encourage individual libraries to do the same for their digitization projects. ALA can facilitate a national discussion and inventory of government documents. ALA can also advocate for a “government information librarian in every library” as a way to further the goals of a preservation plan as well as ongoing collection development and public service to library communities of all shapes and sizes.

19. How can ALA work with other associations and entities to advance an FDLP preservation plan?

Lobby Congress for appropriate levels of funding and against long-term privatization of digital access. “No cost” should mean no cost to taxpayers, NOT to agencies. Reach out to other organizations on the need for an FDLP preservation plan. Argue strongly for the continuing need for both local collections AND government information librarians in every library. Also advocate inclusion of historic government publications in consortial shared storage projects like the Western Regional Storage Trust (WEST).

20. What future actions should ALA pursue to advance an FDLP national preservation plan?

See #19. Sponsor projects (or help produce sponsorable projects) to investigate the digitization challenges of a heterogeneous collection such as FDLP’s and investigate the preservation, access, and usability requirements for the long term for such collections.

Accept that digitization is probably not the best approach for PRESERVING tangible documents. Consider microfilming or geographically dispersed high-density storage facilities of last resort. ALA should prioritize preservation measures for born-digital materials, which seem to be decaying quickly through link rot.