Category Archives: Doc of the day
Data Is Plural newsletter posts 2 amazing govinfo datasets: House committee witnesses and 1900 census immigrant populations
I love the Data Is Plural newsletter, Jeremy Singer-Vine’s weekly roundup of useful/curious datasets! You can check out his archive from 2015(!) to the present and also explore the archive as a Google spreadsheet or as Markdown files (a dataset of interesting datasets :-)).
Today’s edition was especially good on the govinfo front: 2 really awesome datasets, one on House committee witnesses (1971 – 2016) and one a high-resolution transcription and CSV file of the Census Bureau’s 1900 report on immigrant populations. Check them out, and don’t forget to subscribe to the Data Is Plural newsletter!
House committee witnesses. Political scientists Lauren C. Bell and J.D. Rackey have compiled a spreadsheet of 435,000+ people testifying before the US House of Representatives from 1971 to 2016. They began with a text file scraped from a ProQuest database, provided by the authors of a dataset that focused on social scientists’ testimony (DIP 2020.12.23). Then, they determined each witness’s first and last name; type of organization; the committee, date, title, and summary of the relevant hearing; and more.
Immigrant populations in 1900. The 1900 US Census’s public report includes a table counting the foreign-born residents of each state and territory — overall and disaggregated into a few dozen origins, which range from subdivisions of countries (Poland is split into “Austrian,” “German,” “Russian,” and “unknown” columns) to entire continents (“Africa”). It’s officially available as a low-resolution PDF. Reporters at Stacker, however, recently transcribed it into a CSV file for easier use.
Thanks to the Cornell Center for Social Sciences (née CISER) for posting the Census Bureau’s Integrated 2020 Redistricting Data (PL94-171).
“On August 12, 2021, the Census Bureau released the Public Law 94-171 data, also known as Redistricting Data, in four (4) parts per state. Users who want to have the complete redistricting dataset for a state in one file have to integrate these four parts of the Census Bureau files.
We’ve integrated the four parts and made them available in convenient ready-to-use formats — SAS, SPSS, STATA, and CSV. We’ve also made available SAS, STATA, and SPSS programs to read the CSV files, label the variables, and assign variables their correct type (as per the data dictionary).”
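For anyone curious what "integrating the parts" involves, here is a minimal Python sketch of the general idea, not Cornell's actual programs. It assumes each part is a pipe-delimited text file with a header row and that the parts share a record-number column (called `LOGRECNO` here as an assumption; check the data dictionary for the real layout). The helper names `read_part` and `integrate_parts` are made up for illustration.

```python
import csv
import io


def read_part(text, key="LOGRECNO", delimiter="|"):
    """Parse one delimited part (assumed to have a header row) into {key: row}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row[key]: row for row in reader}


def integrate_parts(parts):
    """Combine rows that share a key across all parts into one wide row each."""
    merged = {}
    for part in parts:
        for record_id, row in part.items():
            # Start a new wide row if needed, then add this part's columns.
            merged.setdefault(record_id, {}).update(row)
    # Return the rows ordered by key (string order, since keys are text).
    return [merged[record_id] for record_id in sorted(merged)]


# Tiny made-up example: a geography part and one data part.
geo = "LOGRECNO|NAME\n1|Alpha\n2|Beta\n"
seg1 = "LOGRECNO|P1\n1|10\n2|20\n"
rows = integrate_parts([read_part(geo), read_part(seg1)])
```

With real files you would read each part from disk instead of from strings, and you would probably reach for the ready-made Cornell files rather than doing this by hand.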
This is awesome! The Library of Congress has just finished a 20-year(!) project to digitize the papers of the presidents from George Washington to Calvin Coolidge. I hope GPO is going to catalog these collections so that the records get into library catalogs!
The Library of Congress has completed an initiative of more than two decades to digitize the papers of nearly two dozen early presidents. The Library holds the papers of 23 presidents from George Washington to Calvin Coolidge, all of which have been digitized and are now available online.
The Library plans to highlight each presidential collection on social media in the weeks leading up to the next presidential inauguration on Jan. 20, 2021.
Full Set of Presidential Collections
- Papers of President George Washington (1732-1799)
- Papers of President Thomas Jefferson (1743-1826)
- Papers of President James Madison (1751-1836)
- Papers of President James Monroe (1758-1831)
- Papers of President Andrew Jackson (1767-1845)
- Papers of President Martin Van Buren (1782-1862)
- Papers of President William Henry Harrison (1773-1841)
- Papers of President John Tyler (1790-1862)
- Papers of President James K. Polk (1795-1849)
- Papers of President Zachary Taylor (1784-1850)
- Papers of President Franklin Pierce (1804-1869)
- Papers of President Abraham Lincoln (1809-1865)
- Papers of President Andrew Johnson (1808-1875)
- Papers of President Ulysses S. Grant (1822-1885)
- Papers of President James A. Garfield (1831-1881)
- Papers of President Chester A. Arthur (1829-1886)
- Papers of President Grover Cleveland (1837-1908)
- Papers of President Benjamin Harrison (1833-1901)
- Papers of President William McKinley (1843-1901)
- Papers of President Theodore Roosevelt (1858-1919)
- Papers of President William Howard Taft (1857-1930)
- Papers of President Woodrow Wilson (1856-1924)
- Papers of President Calvin Coolidge (1872-1933)
An activist group called Reclaim the Records recently submitted the “mother-of-all-FOIA requests” asking for billions of pages scanned through NARA’s public-private digitization partnership program. Here’s the Twitter thread describing it:
Hey! @USNatArchives just did something new, and really good for transparency!
And we think it maybe *might* be because of that "mother-of-all-FOIA requests" we just filed with them. 😌
So, okay, remember how we filed this on October 14th? Well…
— Reclaim The Records (@ReclaimTheRecs) November 19, 2020
Well, now at least NARA has put up a page showing all of their digitization partners and what publications/record groups these organizations are scanning. It looks to be mostly Ancestry, Fold3, and FamilySearch, but there are other groups like the National Archives of Korea, the National Collection of Aerial Photography (UK), and NOAA (logbooks from 19th-century naval ships and expeditions!).
From what I can tell, these scans seem to be going into NARA’s catalog and are freely available! Thanks to NARA, and also BIG thanks to Reclaim the Records for making a big public deal about NARA’s public-private partnership program and making sure that the public is aware of those BILLIONS of scanned pages.