DOJ Gov Doc: Investigation of the Ferguson Police Department [March 4, 2015]

Report Summary:

…Ferguson’s law enforcement practices are shaped by the City’s focus on revenue rather than by public safety needs.  This emphasis on revenue has compromised the institutional character of Ferguson’s police department, contributing to a pattern of unconstitutional policing, and has also shaped its municipal court, leading to procedures that raise due process concerns and inflict unnecessary harm on members of the Ferguson community.  Further, Ferguson’s police and municipal court practices both reflect and exacerbate existing racial bias, including racial stereotypes. Ferguson’s own data establish clear racial disparities that adversely impact African Americans.  The evidence shows that discriminatory intent is part of the reason for these disparities.  Over time, Ferguson’s police and municipal court practices have sown deep mistrust between parts of the community and the police department, undermining law enforcement legitimacy among African Americans in particular.

full report

via GISIG: gov-info.tumblr.com
 

GPO Observes 154th Birthday With New Name, New Logo


GPO is observing its 154th anniversary with a new logo. Here is the press release:

FOR IMMEDIATE RELEASE: March 4, 2015 No. 15-04

GPO OBSERVES 154th BIRTHDAY WITH NEW NAME, NEW LOGO

WASHINGTON – The U.S. Government Publishing Office (GPO) marks its 154th anniversary of opening for business today. Since March 4, 1861, GPO has seen many changes as the agency continually adapted to changing technologies. In the ink-on-paper era, this meant moving from hand-set to machine typesetting, from slower to high-speed presses, and from hand to automated bookbinding. While these changes were significant for their time, they pale by comparison with the transformation that accompanied GPO’s adoption of electronic information technologies, which began over 50 years ago with a plan to develop a new system of computer-based typesetting. By the early 1980’s this system had completely supplanted machine-based hot metal typesetting. By the early 1990’s, the databases generated by GPO’s typesetting system were uploaded to the Internet via the agency’s first Web site, GPO Access, vastly expanding the agency’s information dissemination capabilities. Those functions continue today with GPO’s Federal Digital System (FDsys, at www.fdsys.gov ) on a more complex and comprehensive scale, which last year registered its one billionth document download.

As a result of these sweeping technology changes, GPO is now fundamentally different from what it was as recently as a generation ago. It is smaller, leaner, and equipped with digital production capabilities that are the bedrock of the information systems relied upon daily by Congress, Federal agencies, and the public to ensure open and transparent Government in the digital era. As GPO Director Davita Vance-Cooks has pointed out, GPO is not just for printing anymore. Late last year, Congress and the President recognized GPO’s technology transformation by changing the agency’s name to the Government Publishing Office.

GPO’s new name provides an opportunity to introduce a new, modern logo representative of the 21st century. Based on the Lubalin Graph typeface, the G forms an arrow pointing forward, showing the direction the agency is moving. The arrow points to the P, which stands for publishing and conveys the significance of the communication services GPO provides today. The new logo will be phased in throughout the agency.

New logo: http://www.gpo.gov/newsroom-media/logo.htm

“GPO is on the move as a publishing operation. With publishing as our new middle name, GPO is offering a broad range of products and services to Federal agencies, ranging from conventional print to digital apps, ebooks, and bulk data downloads,” said GPO Director Davita Vance-Cooks. “In our mission to Keep America Informed, we will continue to adapt to the new technologies that the Government and the public have come to expect from us.”

GPO is the Federal Government’s official, digital, secure resource for producing, procuring, cataloging, indexing, authenticating, disseminating, and preserving the official information products of the U.S. Government. The GPO is responsible for the production and distribution of information products and services for all three branches of the Federal Government, including U.S. passports for the Department of State as well as the official publications of Congress, the White House, and other Federal agencies in digital and print formats. GPO provides for permanent public access to Federal Government information at no charge through our Federal Digital System (www.fdsys.gov), partnerships with approximately 1,200 libraries nationwide participating in the Federal Depository Library Program, and our secure online bookstore. For more information, please visit www.gpo.gov

Docs of the week: Ferguson Grand Jury, 100 years of INS annual reports, and the historic Moynihan Report

Hands Up Don't Shoot Ferguson protests

Photo by Flickr user LightBrigading, used with permission. Creative Commons BY-NC 2.0 license.

Here at Stanford libraries, my colleague Kris Kasianovitz and I are busy putting context to the *massive* haystack that is the Internet — and we could use some help (want to be a lostdocs collector?!)! Below are just a few of the documents we’ve collected in the last week, stored in our Stanford Digital Repository and made accessible through our library catalog.

1) The Negro Family: The Case for National Action, AKA the Moynihan Report. This document came to me from a recent New Yorker article, “Don’t Be Like That: Does black culture need to be reformed?” by Kelefa Sanneh. The article, a book review of a new anthology called “The Cultural Matrix: Understanding Black Youth,” contextualized the sociology and cultural history of being black in America, describing in detail the ground-breaking work of Daniel Patrick Moynihan, trained as a sociologist and well known later as the liberal Senator from NY. As Sanneh notes, the Moynihan Report, which was originally printed in a run of 100 with 99 of them locked in a vault, was leaked to the press, causing the Johnson administration to release the entire document. Moynihan’s overarching theme was “the deterioration of the Negro family” and he called for a national program to “strengthen the Negro family.”

2) Annual Report of the Immigration and Naturalization Service. This one started out as a research consultation. A student wanted to analyze this report over the 100+ years that it has been published. She found that the Immigration and Naturalization Service had digitized its historic run but, for some reason, had taken the link down from its site and not restored it for over 2 weeks. I contacted INS and got the digitized documents restored, then downloaded them, deposited them in the Stanford Digital Repository (SDR), and had the PURL added to our bibliographic record. An added benefit of collecting the digital version is that it gives future users easier access to an annual report chock full of important statistics. Our paper collection is shelved in several different areas of the US documents collection because INS has moved among agencies over the years, changing its call number each time: from Treasury (call# T21.1:) to Labor (call# L3.1: and L6.1:) to Justice (call# J21.1:) to Homeland Security (call# HS4.200).

3) Documents from the Ferguson Grand Jury. Ferguson has been in the news over the last year because of the fatal shooting of African American youth Michael Brown by police officer Darren Wilson and the protests that followed. This historically important series of 105 Missouri state documents from the Grand Jury was released in response to Freedom of Information requests from CNN. Some of our government information colleagues around the country wondered online how to collect and preserve these documents for posterity and future researchers. Luckily, Stanford University Libraries (SUL) is one library able to collect and preserve historically important born-digital government documents.

The overwhelming majority of state, local, US and international government documents these days are born-digital. Here at Stanford libraries, we continue to look for ways to maintain and expand both our historic and born-digital documents collections. Self-deposit will no doubt be one strategy among several (including Web archiving, LOCKSS and future initiatives) as we look to serve the information needs of citizens, faculty, students and researchers.

“An alarmingly casual indifference to accuracy and authenticity.” What we know about digital surrogates

In a new article in Portal, Diana Kichuk examines the reliability and accuracy of digital text extracted from printed books in five digital libraries: the Internet Archive, Project Gutenberg, the HathiTrust, Google Books, and the Digital Public Library of America. She focuses particularly on the accuracy and utility of the digital text for reading in e-book formats and on the accuracy of metadata derived from extracted text.

This study, along with a couple of others cited below, is very relevant to the repeated calls by some within the Federal Depository Library Program to digitize and discard the historic FDLP paper collections. These studies, even though they do not focus on government publications, provide examples, data, and standards that should be critical to review before the depository community implements discarding policies that will have irreversible effects.

* * *

Kichuk’s article is well worth reading in its entirety as she identifies many problems with digital text created during digitization of paper books by OCR (Optical Character Recognition) technologies, and she gives specific examples. The two most important problems that she highlights are that digitized texts often fail to accurately represent the original, and that the metadata that is automatically created from such text is too often woefully inaccurate. These problems have real effects on libraries and library users. Readers will find it difficult to accurately identify and even find the books they are looking for in digital libraries and libraries will find it difficult to confidently attribute authenticity and provenance to digitized books.

Kichuk says that digitized text versions of print books are often unrecognizable as surrogates for the print book and it may be “misleading at best” to refer to them even as “equivalent” to the original. Although she only examined a small number of e-books (approximately seventy-five), she found “abundant evidence” of OCR problems that suggest to her the likelihood of widespread and endemic problems.

A 2012 report by the HathiTrust Research Center reinforces Kichuk’s findings. That study found that 84.9% of the volumes it examined had one or more OCR errors, 11% of the pages had one or more errors, and the average number of errors per volume was 156 (HathiTrust, Update on February 2012 Activities, March 9, 2012).
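For readers who want to see how volume-level and page-level error rates like these relate to one another, here is a minimal Python sketch. It is only an illustration: the function name `summarize_ocr_errors`, the `OcrSummary` type, and the sample counts are all hypothetical, and nothing here reflects HathiTrust's actual data or methodology.

```python
from dataclasses import dataclass


@dataclass
class OcrSummary:
    pct_volumes_with_errors: float
    pct_pages_with_errors: float
    mean_errors_per_volume: float


def summarize_ocr_errors(volumes):
    """Summarize OCR errors (hypothetical helper, not HathiTrust's method).

    `volumes` maps a volume id to a list of per-page OCR error counts.
    """
    if not volumes:
        raise ValueError("need at least one volume")
    total_pages = sum(len(pages) for pages in volumes.values())
    if total_pages == 0:
        raise ValueError("need at least one page")

    # A volume counts as flawed if any single page has an error.
    volumes_with_errors = sum(
        1 for pages in volumes.values() if any(n > 0 for n in pages)
    )
    # Pages are counted individually across all volumes.
    pages_with_errors = sum(
        1 for pages in volumes.values() for n in pages if n > 0
    )
    total_errors = sum(sum(pages) for pages in volumes.values())

    return OcrSummary(
        pct_volumes_with_errors=100 * volumes_with_errors / len(volumes),
        pct_pages_with_errors=100 * pages_with_errors / total_pages,
        mean_errors_per_volume=total_errors / len(volumes),
    )


# Made-up sample: three volumes of four pages each.
sample = {
    "vol-a": [0, 0, 3, 0],
    "vol-b": [0, 0, 0, 0],
    "vol-c": [1, 0, 0, 2],
}
summary = summarize_ocr_errors(sample)
print(summary)
```

In this made-up sample, two of three volumes contain an error even though only a quarter of the pages do. A single bad page is enough to flag a whole volume, which is why a volume-level rate can sit far above the page-level rate.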

* * *

Most of the examples we have of current-generation digitization projects, particularly mass-digitization projects, provide access to digital “page images” (essentially pictures of pages) of books in addition to OCR’d digital text. So, to get a more complete picture of the state of digitization it is instructive to compare Kichuk’s study of OCR’d text to a study by Paul Conway of page images in the HathiTrust.

Fully one-quarter of the 1000 volumes examined by Conway contained at least one page image whose content was “unreadable.” Only 64.9% of the volumes examined were considered accurate and complete enough to be considered “reliably intelligible surrogates.” Presumably, that means more than 35% of the volumes examined were not reliable surrogates.

Conway’s study reinforces the findings of the Center for Research Libraries when it certified HathiTrust as a Trusted Digital Repository in 2011. (Full disclosure: I was part of the team that audited HT.) CRL said explicitly that, although some libraries will want to discard print copies of books that are in HT, “the quality assurance measures for HathiTrust digital content do not yet support this goal.”

Currently, and despite significant efforts to identify and correct systemic problems in digitization, HathiTrust only attests to the integrity of the transferred file, and not to the completeness of the original digitization effort. This may impact institutions’ workflow for print archiving and divestiture. (Certification Report on the HathiTrust Digital Repository).

* * *

Together, these reports provide some solid (if preliminary) data which should help libraries make informed decisions. Specifically, all these studies show that it would be risky to use digitized copies of FDLP historic collections as reliable surrogates for the original paper copies. That means it would be risky to discard original paper copies of documents simply because they had been digitized.

Although Conway suggests, as others have, that libraries (and users) may have to accept incomplete, inaccurate page images as a “new norm” and accept that they are not faithful copies, he also realizes that “questions remain about the advisability of withdrawing from libraries the hard-copy original volumes that are the sources of the surrogates.”

Kichuk goes further in her conclusions. She wisely envisions that the “uncorrected, often unreadable, raw OCR text” that most mass-digitization projects produce today will be inadequate for future, more sophisticated uses. She looks specifically to a future when users will want and expect ebooks created from digitized text. She warns that current digitization standards, coupled with insufficient funding, are not creating text that is accurate or complete enough to meet the needs of users in the near future. And she recognizes that librarians are not stepping up to correct this situation. She describes “an alarmingly casual indifference to accuracy and authenticity” of OCR’d text and says that this “willful blindness” to the OCR problem is suppressing any sense of urgency to remedy the problem.

She concludes from her small sample that there should be a more systematic review by the digital repository community prior to the development of a new digitized e-book standard, especially for metadata and text file formats.

I agree with Kichuk and Conway and CRL that more work needs to be done before libraries discard their paper collections. Librarians and their communities need to have a better understanding of the quality of page images and digitized text that digitization projects produce. With that in mind, James R. Jacobs and I addressed this very problem in 2013 and suggested a new standard for the quality of page images, which we call the “Digital Surrogate Seal of Approval” (DSSOA):

Libraries that are concerned about their future and their role in the information ecosystem should look to the future needs of users when evaluating digitization projects.

FDLP libraries have a special obligation to the country to preserve the historic collections in their charge. It would be irresponsible to discard the complete, original record of our democracy and preserve only an incomplete, inaccurate record of it.

Beyond the Numbers

Document of the day: Beyond the Numbers at the Bureau of Labor Statistics.

The wonderful Scout Report at the University of Wisconsin-Madison highlighted this gem in its most recent newsletter. The Scout’s description says it all:

For readers who love stats and facts, Beyond the Numbers, which is published biweekly by the Bureau of Labor Statistics, will provide hours of fresh insights on a range of topics. The home page always features the latest update, as well as three recent articles (available in PDF format), such as “Understanding health plan types: What’s in a name?” However, the real meat of the site can be found by browsing the Archive, which takes readers to topics dating all the way back to 1996 when the feature was first published. The archives can be browsed in chronological order. They can also be searched utilizing seven distinct themes, including employment & unemployment, global economy, regional economics, and others.

You can subscribe to the Scout Report here.