
Author Archives: James A Jacobs


Senate to publish bulk data in XML

Big News!

Senate Joins House In Publishing Legislative Information In Modern Formats, by Daniel Schuman. Congressional Data Coalition (December 18, 2014).

There’s big news from today’s Legislative Branch Bulk Data Task Force meeting. The United States Senate announced it would begin publishing text and summary information for Senate legislation, going back to the 113th Congress, in bulk XML. It would join the House of Representatives, which already does this. Both chambers also expect to make bill status information available online in XML format, but a little later in the year.

There is more good news, too. Read Daniel’s complete report at the link above.

GPO Is Now The Government Publishing Office

Press Release from GPO


FOR IMMEDIATE RELEASE: December 17, 2014 No. 14-27

GPO IS NOW THE GOVERNMENT PUBLISHING OFFICE

WASHINGTON – An agency whose mission has been producing, publishing, and recording our Nation’s history has made some history of its own. Section 1301 of H.R. 83, the legislation providing consolidated and further continuing appropriations for FY 2015 that was recently passed by Congress and signed into law last night by President Barack Obama, changes the name of the Government Printing Office to the Government Publishing Office. Publishing reflects the increasingly prominent role that GPO plays in providing access to Government information in digital formats through the agency’s Federal Digital System, apps, eBooks, and related technologies. The information needs of Congress, Federal agencies, and the public have evolved beyond only print, and GPO has transformed itself to meet its customers’ needs.

Link to H.R. 83: http://www.gpo.gov/fdsys/pkg/BILLS-113hr83enr/pdf/BILLS-113hr83enr.pdf

“This is a historic day for GPO. Publishing defines a broad range of services that includes print, digital, and future technological advancements. The name Government Publishing Office better reflects the services that GPO currently provides and will provide in the future,” said Davita Vance-Cooks, who now holds the title of Director of the Government Publishing Office, the agency’s chief executive officer. “I appreciate the efforts of the Members of Congress for their support and understanding GPO’s transformation. GPO will continue to meet the information needs of Congress, Federal agencies, and the public and carry out our mission of Keeping America Informed.”

GPO opened its doors on March 4, 1861, the same day Abraham Lincoln was sworn into office as President of the United States. Since that day, GPO employees have produced our country’s most important documents such as the preliminary version of the Emancipation Proclamation, The Warren Commission Report, The 9-11 Commission Report, the U.S. passport, the Federal Budget, and all Congressional materials.

GPO is the Federal Government’s official, digital, secure resource for producing, procuring, cataloging, indexing, authenticating, disseminating, and preserving the official information products of the U.S. Government. The GPO is responsible for the production and distribution of information products and services for all three branches of the Federal Government, including U.S. passports for the Department of State as well as the official publications of Congress, the White House, and other Federal agencies in digital and print formats. GPO provides for permanent public access to Federal Government information at no charge through our Federal Digital System (www.fdsys.gov), partnerships with approximately 1,200 libraries nationwide participating in the Federal Depository Library Program, and our secure online bookstore. For more information, please visit www.gpo.gov.

The Official Senate CIA Torture Report

Update


GPO has released an official version of “THE SENATE CIA REPORT” as Senate Report 113-288. The digital version is available on GPO’s Federal Digital System (FDsys):
http://www.gpo.gov/fdsys/pkg/CRPT-113srpt288/pdf/CRPT-113srpt288.pdf
The print version is available for purchase at GPO’s retail and online bookstore for $29.
http://bookstore.gpo.gov/products/sku/052-071-01571-0

This is a single-volume, 712-page version. It contains:

Letter of Transmittal to Senate from Chairman Feinstein — i
Foreword of Chairman Feinstein — iii
Findings and Conclusions — x
Executive Summary — 1
Additional Views of Senator Rockefeller — 500
Additional Views of Senator Wyden — 503
Additional Views of Senator Udall of Colorado — 506
Additional Views of Senator Heinrich — 510
Additional Views of Senator King — 512
Additional Views of Senator Collins — 515
Minority Views of Vice Chairman Chambliss, Senators Burr, Risch, Coats, Rubio, and Coburn — 520
Minority Views of Senator Coburn, Vice Chairman Chambliss, Senators Burr, Risch, Coats, and Rubio — 678
Minority Views of Senators Risch, Coats, and Rubio — 682

GPO Press Release:

FOR IMMEDIATE RELEASE: December 15, 2014

GPO RELEASES THE OFFICIAL DIGITAL & PRINT VERSIONS OF THE SENATE CIA REPORT

WASHINGTON – The U.S. Government Printing Office (GPO) makes available the official and authentic digital and print versions of the Report of the Senate Select Committee on Intelligence Committee Study of the Central Intelligence Agency’s Detention and Interrogation Program, together with a foreword by Chairman Feinstein and Additional and Minority Views (Senate Report 113-288).

This document comprises the declassified Executive Summary and Findings and Conclusions, including declassified additional and minority views. The full classified report will be maintained by the Committee and has been provided to the Executive Branch for dissemination to all relevant agencies.


The release of the Senate’s Study of the CIA’s Detention and Interrogation Program presents some interesting issues for government documents collections.

Issues

There are three separate documents, and they are easy to find on various web sites, but not every site has all three, and the copies of an individual document found on different sites are not identical.

The “official” copies are (at least today) listed on the home page of the Senate Committee’s web site (see below), but are not listed on the Committee’s Publications page or its Press Release page, perhaps because the report is not an official committee document with an assigned “Document” or “Report” number. Presumably it will not be in FDsys unless or until it gets an official Document or Report designation.

(Why isn’t it “official”? The report was initially intended to be a full committee report. In 2009 the Committee voted 14–1 to initiate the study, but later that year Republicans on the Committee withdrew from active participation in it.)

My speculation is that the different PDF files you can find on the web differ slightly because each was produced by scanning a paper copy with different software. I do not know whether the Committee distributed only paper copies, but even its own PDF copy appears to be a scan. (You can tell because, if you try to copy the text from the PDF, you will find that the text was badly converted by optical character recognition (OCR). For example, the names of Senators are sometimes garbled: Chambliss becomes “CHAMBUSS” and Rubio becomes “Rvbio”.) The official copies were created using Adobe PDF Scan Library 3.1 and ScandAll PRO V2.0.12.
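A PDF usually records the software that made it in the /Creator and /Producer entries of its document information dictionary, which is one way to see which tools produced a given copy. The Python sketch below is a rough illustration only, assuming that dictionary is stored uncompressed with plain literal strings; it does not handle hex strings, escaped or nested parentheses, UTF-16 text, or compressed object streams, so a PDF viewer's document properties or a full PDF library is the more reliable route. The function names are hypothetical, and the sample bytes are made up rather than taken from any copy of the report.

```python
import re

# Matches entries such as /Producer (Some Library 1.0) in a PDF's document
# information dictionary. Assumption: the dictionary is stored uncompressed
# and uses plain literal strings. Hex strings, escaped or nested parentheses,
# UTF-16 text, and compressed object streams are NOT handled.
_INFO_ENTRY = re.compile(rb"/(Producer|Creator)\s*\(([^)]*)\)")


def scan_info_fields(pdf_bytes: bytes) -> dict:
    """Return the /Creator and /Producer values found in raw PDF bytes."""
    fields = {}
    for key, value in _INFO_ENTRY.findall(pdf_bytes):
        # A later occurrence overwrites an earlier one, since incremental
        # updates append newer objects toward the end of the file.
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields


def read_info_fields(path: str) -> dict:
    """Read a PDF file from disk and scan it for /Creator and /Producer."""
    with open(path, "rb") as f:
        return scan_info_fields(f.read())


# Usage with made-up bytes (not taken from any of the report's PDFs):
sample = b"<< /Creator (Example Scanner App 1.0) /Producer (Example PDF Library 2.0) >>"
print(scan_info_fields(sample))
# {'Creator': 'Example Scanner App 1.0', 'Producer': 'Example PDF Library 2.0'}
```

Running `read_info_fields` on two downloaded copies of the same document is a quick way to see whether they came out of different scanning workflows, though an empty result only means this simple pattern found nothing, not that the file lacks metadata.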

Official Reports and Statements

The Senate Select Committee on Intelligence currently has links to three documents on its home page.

The CIA has its own responses to the report, currently listed on its Reports page.

Other official statements.

Unofficial Copies

A web search for the title of the report (“Committee Study of the Central Intelligence Agency’s Detention and Interrogation Program”) turns up many sites with copies. Many of these apparently come directly from the Committee site, but at least one news organization (the New York Times) evidently made its own scanned copy and digitized text version of the main report.

  • The New York Times has a PDF copy [108.4 MB, 528 pages] and a plain-text copy. The PDF was created using the Acrobat 11.0.9 Paper Capture Plug-in and a Xerox WorkCentre 5150. Both are stored on an Amazon cloud service.

Timeline

ProPublica has created a useful timeline to put the report in perspective.

FDLP Library Actions

What can FDLP libraries (or any library) do to ensure that their users will be able to find and get unaltered, official copies in the future? Relying on the web alone may not be adequate, secure, consistent, transparent, or guaranteed. There are several issues. Even the existing links to the official documents may not be stable. The official digital copies are only digital surrogates of the original paper copy. Other, alternative digital surrogates are already available, their quality varies, and the links to those copies may not be stable either.

I suggest the following actions by libraries:

  • Get copies of the official digital versions directly from the Committee web site as soon as possible (see links above).
  • Create a digital “hash” or “checksum” of the documents you download. (See a list of various tools and a discussion of checksums for preservation, if you are unfamiliar with the concepts.)
  • Catalog your copies and include them in your OPAC or other official library inventory and discovery databases. Include adequate metadata that describes how, when, and where you got your copies.
  • Ideally, you should store your copies in a Trusted Digital Repository. Unfortunately, there are, as yet, very few certified TDRs. Short of that, be sure that you have copies stored in more than one geographic location and that you have a way of verifying over time (using the checksum) that the files you stored have not been altered or corrupted.
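The checksum, metadata, and verification steps in the list above can be sketched with Python's standard library. This is a minimal sketch, assuming SHA-256 as the checksum algorithm and a JSON file as the manifest; the function names, record fields, and manifest layout are hypothetical and are not a format required by the FDLP or any particular repository.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a 100+ MB PDF need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path: str, source_url: str) -> dict:
    """Provenance record: which file, where it came from, when, and its checksum.

    The timestamp is taken when the record is made, assumed to be right
    after the download.
    """
    return {
        "file": path,
        "source_url": source_url,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256_of_file(path),
    }


def verify(record: dict, path: str = "") -> bool:
    """Re-hash a stored copy and compare it with the checksum taken at download.

    Pass `path` to check a copy kept at another location; otherwise the
    path stored in the record is used.
    """
    return sha256_of_file(path or record["file"]) == record["sha256"]


def save_manifest(records: list, manifest_path: str) -> None:
    """Write all provenance records to a JSON manifest."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


def load_manifest(manifest_path: str) -> list:
    """Read provenance records back from a JSON manifest."""
    with open(manifest_path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `make_record("CRPT-113srpt288.pdf", "http://www.gpo.gov/fdsys/pkg/CRPT-113srpt288/pdf/CRPT-113srpt288.pdf")` would capture the GPO copy at download time. Running `verify` against each stored copy on a schedule, and after any migration, then flags any file whose bytes have changed. Which algorithm you choose matters less than recording it and using it consistently.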

Visiting Assistant Professor Uses Public Data to Transform How New Yorkers See Their City

Ben Wellington uses his popular I Quant NY blog to dig into public New York City data posted by city agencies. He has used the data to write about the city’s filthiest fast-food chains, to map how half of Manhattan is within four blocks of a Starbucks, and to determine which neighborhoods boast the most trees. “His mission for the blog is simple: to change government policy by using open data.”

Some recent posts:

  • Is the NYPD About to Start Ticketing More Cyclists Due to a Mathematical Error?
  • The Hot Spots of New York: A Coverage Map of NYC’s Free WiFi Payphones
  • Colorfully Decoding Manhattan’s Address System
  • Found: The Brooklyn Residence that’s Farthest from the Subway
  • Affordable Housing Without Representation
  • Found: The Manhattan Apartment that’s the Farthest from any Subway
  • You’ll Never Guess the Cleanest Fast Food Joint in NYC
  • Fecal Map NYC: The Worst Places to Swim in the City

Portland Newspaper Uses Census Data for Searchable Map

The Oregonian has collected some high-level data points for census tracts in the Portland metro area and used them to create an interactive, searchable map of Portland. Census tract boundaries are drawn to create areas with a population generally between 1,200 and 8,000, with a target of 4,000.