Tag Archives: Datasets
Here’s an interesting dataset to bookmark. After a 10-year moratorium on earmarks, the House Appropriations Committee recently released PDF tables of fiscal year 2022 congressionally directed spending projects. But those PDFs aren’t actually usable for any sort of deeper data analysis. So the Congress Project at the Bipartisan Policy Center has just released the Congressionally Directed Spending FY2022 Dataset. It’s a database of all the FY22 “Final Funded Projects” in H.R. 2471, with some additional member data included (the original tables often only include last name). Check out the Congress Project’s blog post for more on how they extracted and cleaned the data. Good work by the Bipartisan Policy Center!
Data Is Plural newsletter posts 2 amazing govinfo datasets: House committee witnesses and 1900 census immigrant populations
I love my Data Is Plural newsletter, Jeremy Singer-Vine’s weekly newsletter of useful/curious datasets! You can check out his archive from 2015(!) to the present and also explore the archive as a Google spreadsheet or as Markdown files (a dataset of interesting datasets :-)).
Today’s edition was especially good on the govinfo front, with 2 really awesome datasets: one on House committee witnesses (1971–2016) and a CSV transcription of the Census Bureau’s 1900 report on immigrant populations, which is officially available only as a low-resolution PDF. Check them out, and don’t forget to subscribe to the Data Is Plural newsletter!
House committee witnesses. Political scientists Lauren C. Bell and J.D. Rackey have compiled a spreadsheet of 435,000+ people testifying before the US House of Representatives from 1971 to 2016. They began with a text file scraped from a ProQuest database, provided by the authors of a dataset that focused on social scientists’ testimony (DIP 2020.12.23). Then, they determined each witness’s first and last name; type of organization; the committee, date, title, and summary of the relevant hearing; and more.
Immigrant populations in 1900. The 1900 US Census’s public report includes a table counting the foreign-born residents of each state and territory — overall and disaggregated into a few dozen origins, which range from subdivisions of countries (Poland is split into “Austrian,” “German,” “Russian,” and “unknown” columns) to entire continents (“Africa”). It’s officially available as a low-resolution PDF. Reporters at Stacker, however, recently transcribed it into a CSV file for easier use.