
Category Archives: Doc of the day

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Data Is Plural newsletter posts 2 amazing govinfo datasets: House Committee witnesses and 1900 census immigrant populations

I love my Data Is Plural newsletter, Jeremy Singer-Vine’s weekly newsletter of useful/curious datasets! You can check out his archive from 2015(!) to present and also explore the archive as a Google spreadsheet or as Markdown files (a dataset of interesting datasets :-)).

Today’s edition was especially good on the govinfo front: 2 really awesome datasets, one on House Committee witnesses (1971 – 2016) and one a high-resolution transcription and CSV file of the Census Bureau’s 1900 report on immigrant populations. Check them out, and don’t forget to subscribe to the Data Is Plural newsletter!

House committee witnesses. Political scientists Lauren C. Bell and J.D. Rackey have compiled a spreadsheet of 435,000+ people testifying before the US House of Representatives from 1971 to 2016. They began with a text file scraped from a ProQuest database, provided by the authors of a dataset that focused on social scientists’ testimony (DIP 2020.12.23). Then, they determined each witness’s first and last name; type of organization; the committee, date, title, and summary of the relevant hearing; and more.

Immigrant populations in 1900. The 1900 US Census’s public report includes a table counting the foreign-born residents of each state and territory — overall and disaggregated into a few dozen origins, which range from subdivisions of countries (Poland is split into “Austrian,” “German,” “Russian,” and “unknown” columns) to entire continents (“Africa”). It’s officially available as a low-resolution PDF. Reporters at Stacker, however, recently transcribed it into a CSV file for easier use.

Integrated 2020 Redistricting Data (PL94-171) from CISER

Thanks to the Cornell Center for Social Sciences (née CISER) for posting the Census Bureau’s Integrated 2020 Redistricting Data (PL94-171).

“On August 12, 2021, the Census Bureau released the Public Law 94-171 data, also known as Redistricting Data, in four (4) parts per state. Users who want to have the complete redistricting dataset for a state in one file have to integrate these four parts of the Census Bureau files.

We’ve integrated the four parts and made them available in convenient ready-to-use formats — SAS, SPSS, STATA, and CSV. We’ve also made available SAS, STATA, and SPSS programs to read the CSV files, label the variables, and assign variables their correct type (as per the data dictionary).”
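For readers curious what "integrating the four parts" involves, here is a minimal sketch, not CCSS's actual programs. It assumes each part is a pipe-delimited table sharing a record key, here called LOGRECNO. The header rows, the other column names, and every value in the toy data are made up for illustration.

```python
import csv
import io


def read_part(text, delimiter="|"):
    """Parse one delimited part into {record_key: row_dict}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["LOGRECNO"]: row for row in reader}


def integrate(parts):
    """Merge several parts row by row on the shared LOGRECNO key."""
    merged = {}
    for part in parts:
        for key, row in part.items():
            merged.setdefault(key, {}).update(row)
    return merged


# Toy stand-ins for the four parts (made-up values, hypothetical columns).
GEO = "LOGRECNO|NAME\n1|Example County\n2|Example Town\n"
SEG1 = "LOGRECNO|P1_TOTAL\n1|1000\n2|250\n"
SEG2 = "LOGRECNO|P3_TOTAL\n1|800\n2|200\n"
SEG3 = "LOGRECNO|H1_TOTAL\n1|400\n2|100\n"

combined = integrate(read_part(p) for p in (GEO, SEG1, SEG2, SEG3))
print(combined["1"])
```

Each record ends up as a single row carrying the columns from all four parts, which is the convenience the integrated files provide.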

DCinbox: amazing collection of Congressional e-newsletters

As many of our readers know, government information includes critical but often “grey” or ephemeral information, including communications between our elected officials and their constituents. Here’s a very cool project called DCinbox, a database of Congressional e-newsletters. Lindsey Cormack, professor of political science at Stevens Institute of Technology, has been collecting Congressional e-newsletters since 2009. There are nearly 90,000 unique e-newsletters in the database, which is both searchable and available as a full dataset! This is a rich dataset that can help analyze partisan differences and ideology in all kinds of policy matters.

Congressional e-newsletters. For more than a decade, political scientist Lindsey Cormack’s DCinbox project has collected “every official e-newsletter sent by sitting members of the U.S. House and Senate.” You can search the corpus online and also download all the emails as a series of CSV files, grouped by month. For each of the 130,000+ mailings, the files provide the date, subject, body, and sender’s Bioguide ID. (April 2020 was the highest-volume month, with more than 2,300 messages, nearly all of them mentioning the coronavirus.)
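As a rough idea of how the monthly CSV files might be explored, here is a small sketch that counts mailings per month, optionally keeping only those whose body mentions a keyword. The column names (date, subject, body, bioguide_id), the ISO date format, and the sample rows are all assumptions made for illustration and are not taken from the DCinbox files themselves.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the IDs are placeholders, not real Bioguide IDs.
SAMPLE = """date,subject,body,bioguide_id
2020-04-01,Coronavirus update,Resources on the coronavirus response,X000001
2020-04-15,Town hall,Join my telephone town hall,X000002
2020-05-02,Newsletter,More coronavirus relief information,X000001
"""


def monthly_counts(csv_text, keyword=None):
    """Count mailings per YYYY-MM, optionally filtered by a body keyword."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if keyword and keyword.lower() not in row["body"].lower():
            continue
        counts[row["date"][:7]] += 1  # assumes dates start with YYYY-MM
    return counts


print(monthly_counts(SAMPLE))
print(monthly_counts(SAMPLE, keyword="coronavirus"))
```

Comparing the filtered and unfiltered counts is one simple way to check a claim like "nearly all of them mentioning the coronavirus" for a given month.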

HT to the Data Is Plural 2021.03.03 edition. Please subscribe to the weekly newsletter and see all of the datasets highlighted in previous editions!

Library of Congress Completes Digitization of 23 Early Presidential Collections

This is awesome! The Library of Congress has just finished a 20-year(!) project to digitize the papers of the Presidents from George Washington to Calvin Coolidge. I hope GPO is going to catalog these collections so that the records get into library catalogs!

The Library of Congress has completed a more than two decade-long initiative to digitize the papers of nearly two dozen early presidents. The Library holds the papers of 23 presidents from George Washington to Calvin Coolidge, all of which have been digitized and are now available online.

The Library plans to highlight each presidential collection on social media in the weeks leading up to the next presidential inauguration on Jan. 20, 2021.

Full Set of Presidential Collections

Reclaim the Records “mother-of-all-FOIA requests” and NARA’s new digitization partnership site

An activist group called Reclaim the Records recently submitted the “mother-of-all-FOIA requests,” asking for billions of pages scanned through NARA’s public-private digitization partnership program. Here’s the Twitter thread describing it:

Well, now at least NARA has put up a page showing all of their digitization partners and what publications/record groups these organizations are scanning. It looks to be mostly Ancestry, Fold3, and FamilySearch, but there are other groups like the National Archives of Korea, the National Collection of Aerial Photography (UK), and NOAA (logbooks from 19th century naval ships and expeditions!).

From what I can tell, these scans seem to be going into NARA’s catalog and are freely available! Thanks to NARA, and BIG thanks to Reclaim the Records for making a big public deal about NARA’s public-private partnership program and making sure that the public is aware of those BILLIONS of scanned pages.
