Tag Archives: visualization
Here’s an interesting little GIF that Lazaro Gamio (@LazaroGamio) posted to Twitter recently. The visualization shows the historical boundaries of Maryland’s 3rd Congressional district from 1789 to 2017. This district is one of the most gerrymandered in the country. The last few years are particularly startling. As one commenter put it, the later district shape “looks like a Rorschach test!”
Playing around with historical congressional district boundaries: Maryland's 3rd district, from 1789-2017 -> pic.twitter.com/nGOU3vcn1W
— Lazaro Gamio (@LazaroGamio) February 23, 2017
Death and Taxes – A Graphical Visualization of the Federal Budget
Death and Taxes is a large representational graph and poster of the federal budget. It contains over 500 programs and departments, including almost every program that receives over 200 million dollars annually. The data come straight from the president’s 2009 budget request, which will be debated, amended, and approved by Congress to begin the fiscal year. All of the item circles are proportional in size to their spending totals, and the percentage change from 2008 is included to help spot trends and disproportion.
Recently this graphic was posted on “The Gavel,” the blog of the Speaker of the House (What 3.6 Million Jobs Lost Over 13 Months Looks Like, by “Karina,” February 6th, 2009). It shows the number of jobs lost (and recovered) in the recessions of 1990, 2001, and 2008. By overlaying the three time periods, each starting at its peak employment month, and showing the change in employment month by month, it gives a startling comparison that highlights the severity of the current situation. It also implies that we will have a long time to wait before we return to our previous peak.
I was curious about this graph and did a little follow up that I share below. Data librarians may find this a bit tedious, but for those who have never used raw data, it may be useful as an illustration of the difference between “data” (the raw numbers that you put into statistical software) and “statistics” (the human-viewable tables and graphs that we see in publications).
Unfortunately, as is often the case with statistical information, the source given for the graphic is incomplete: simply “Bureau of Labor Statistics.” I could not find the graphic itself on the bls.gov site, so I assume that the chart was constructed from BLS data, specifically, the Current Population Survey or the Current Employment Statistics Survey. These two surveys count employment differently — one is a survey of individuals and the other is a survey of employers.
There is a similar, but not identical, chart (“Percent change in total nonfarm employment, from beginning of recession”) in the January 2009 (released February 6, 2009) Current Employment Statistics Highlights, Monthly (Bureau of Labor Statistics), so my guess is that someone at the Speaker’s office built the chart from the raw CES data.
Just out of curiosity, I went to the CES “Most Requested Statistics” webpage and downloaded “Total Nonfarm Employment – CES0000000001” for 1990 through the end of 2008. Raw data suitable for analysis really do look “raw,” nothing like a statistical table:
1990,Jan,109151
1990,Feb,109396
1990,Mar,109611
1990,Apr,109651
1990,May,109800
1990,Jun,109817
1990,Jul,109775
1990,Aug,109567
1990,Sep,109485
1990,Oct,109324
1990,Nov,109180
1990,Dec,109120
1991,Jan,109001
1991,Feb,108695
1991,Mar,108535
1991,Apr,108324
1991,May,108196
1991,Jun,108283
...
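As a rough illustration of how little it takes to read records like these into software, here is a minimal Python sketch. It assumes the year,month,value layout shown above (the actual BLS download may be laid out differently), and the names `RAW`, `MONTHS`, and `parse_ces` are made up for this example.

```python
import csv
import io

# The first six records from the listing above (values are in thousands of jobs).
RAW = """\
1990,Jan,109151
1990,Feb,109396
1990,Mar,109611
1990,Apr,109651
1990,May,109800
1990,Jun,109817
"""

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_ces(text):
    """Return a list of (year, month_number, employment_in_thousands)."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        year, month, value = record
        rows.append((int(year), MONTHS.index(month) + 1, int(value)))
    return rows


rows = parse_ces(RAW)
# The month with the highest employment in this short excerpt.
peak = max(rows, key=lambda row: row[2])
print(peak)
```

Once the records are tuples of numbers, finding a peak month or slicing out a recession is a one-line operation.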
Of course, it is relatively easy, using statistical software, to construct tables and graphs from raw data. Here, for example, is a published statistical table with essentially the same information as the raw data I downloaded (but from CPS, not CES). (See the full table in Employment from the BLS household and payroll surveys: summary of recent trends, February 6, 2009.)
By using the raw data to create a graph, one can tell a story that has more impact than a table of numbers alone. It is relatively easy to get these data into statistical software. I used Excel and Stata to create a small time-series data file. I organized it by month (from month “1” to month “48”), with each row of the data file holding data for all three recessions: the first row has data for the first month of each recession, the second row has data for the second month, and so on. The CES data give employment totals in thousands. For example, the monthly employment levels for the 2008 recession:
138152 138080 137936 137814 137654 137517 137356 137228 137053 136732 136352 135755 135178
I had to compute a new variable for each recession: the cumulative number of jobs lost, that is, each month’s employment minus employment in the first (peak) month, in thousands. So, for example, for 2008:
138152      0
138080    -72
137936   -216
137814   -338
137654   -498
137517   -635
137356   -796
137228   -924
137053  -1099
136732  -1420
136352  -1800
135755  -2397
135178  -2974
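The computed variable is just each month's level minus the level in the first month. A minimal Python sketch of that step, using the 2008 values above (I did this in Excel and Stata; the function name `change_from_peak` is made up for this example):

```python
# Monthly employment levels for the 2008 recession, in thousands,
# starting at the peak month (values copied from the listing above).
levels_2008 = [138152, 138080, 137936, 137814, 137654, 137517, 137356,
               137228, 137053, 136732, 136352, 135755, 135178]


def change_from_peak(levels):
    """Cumulative change relative to the first (peak) month."""
    peak = levels[0]
    return [level - peak for level in levels]


changes_2008 = change_from_peak(levels_2008)
for level, change in zip(levels_2008, changes_2008):
    print(level, change)
```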
The first 12 months with all three recessions (v1, v2, v3) and the computed variables (1990, 2001, 2008) look like this:
month      v1    1990      v2    2001      v3    2008
    1  109817       0  132530       0  138152       0
    2  109775     -42  132500     -30  138080     -72
    3  109567    -250  132219    -311  137936    -216
    4  109485    -332  132175    -355  137814    -338
    5  109324    -493  132047    -483  137654    -498
    6  109180    -637  131922    -608  137517    -635
    7  109120    -697  131762    -768  137356    -796
    8  109001    -816  131518   -1012  137228    -924
    9  108695   -1122  131193   -1337  137053   -1099
   10  108535   -1282  130901   -1629  136732   -1420
   11  108324   -1493  130723   -1807  136352   -1800
   12  108196   -1621  130591   -1939  135755   -2397
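For anyone who wants to rebuild this layout without Excel or Stata, here is a Python sketch that lines up the three recessions by month since the peak and prints tab-separated rows. The column names (`level_1990`, `change_1990`, and so on) and the function `build_rows` are my own inventions for this example, not the variable names in my Stata file.

```python
# First 12 months of each recession, in thousands, starting at the peak
# month (values copied from the table above).
recessions = {
    "1990": [109817, 109775, 109567, 109485, 109324, 109180,
             109120, 109001, 108695, 108535, 108324, 108196],
    "2001": [132530, 132500, 132219, 132175, 132047, 131922,
             131762, 131518, 131193, 130901, 130723, 130591],
    "2008": [138152, 138080, 137936, 137814, 137654, 137517,
             137356, 137228, 137053, 136732, 136352, 135755],
}


def build_rows(series):
    """One row per month since the peak; a level and a change per recession."""
    header = ["month"]
    for name in series:
        header += [f"level_{name}", f"change_{name}"]
    rows = [header]
    n_months = min(len(levels) for levels in series.values())
    for i in range(n_months):
        row = [i + 1]
        for levels in series.values():
            row += [levels[i], levels[i] - levels[0]]
        rows.append(row)
    return rows


table = build_rows(recessions)
for row in table:
    print("\t".join(str(cell) for cell in row))
```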
Here is a complete tab-separated-values version of the data file I constructed. Then I used Stata to build a graph and it looks very much like the one at the Speaker’s Blog.
Of course, when one tells one story, one leaves out other stories. This graphic doesn’t show that the starting points of the recessions were different:
1990: 109 million
2001: 132 million
2008: 138 million
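One way to account for the different starting points is to express each loss as a percentage of the peak, which is what the BLS chart mentioned earlier does. A minimal Python sketch, using the peak and month-12 levels from the table above (the names `PEAK_AND_MONTH_12` and `percent_change` are made up for this example):

```python
# Employment (thousands) in the peak month and in month 12 of each
# recession, copied from the 12-month table above.
PEAK_AND_MONTH_12 = {
    "1990": (109817, 108196),
    "2001": (132530, 130591),
    "2008": (138152, 135755),
}


def percent_change(peak, level):
    """Change from the peak, as a percentage of the peak level."""
    return 100.0 * (level - peak) / peak


pct = {name: percent_change(peak, level)
       for name, (peak, level) in PEAK_AND_MONTH_12.items()}
for name, value in pct.items():
    print(f"{name}: {value:.2f}%")
```

Dividing by the peak puts the three recessions on a common scale, at the cost of hiding the absolute number of jobs involved.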
Open re-usable government information
One could use the raw data to tell a lot of different stories and analyze the data in many different ways. And that brings me to the connection between all this and why we need to be sure that government information is not just “free as in beer” but also “free as in open.”
It is important for statistical agencies to publish statistics to help us understand their raw data. But it is also essential that they provide us with the raw data so that we can better understand their statistics and do our own analyses. Most of the statistical agencies of the U.S. government do an excellent job of making their raw data easily available. In fact, the rest of government would do well to use statistical agencies as a model for instantiating their information in usable and re-usable formats (in addition to any publishing and presentation of their data/information) so that the information, whether it is text or images or video or sound or numbers, can be used, reused, analyzed, stored, and preserved.
President Obama’s inaugural speech has generated some interesting examples of how technology can be applied to government information when the information is freely available for use and re-use and not locked into government databases or proprietary formats. It is a small piece of text with a lot of public interest and high visibility and, therefore, ripe for these kinds of demonstrations and experiments. Of course, to make use of the information, we have to actually have a copy of it. Imagine what would happen if all government information was actually distributed in open formats to libraries so that we could build collections that were index-able, search-able, visually browsable, and analyzable in interesting ways. Imagine freeing government information from its .gov silos and integrating it with non-government information in digital collections created for particular virtual communities of interest. Imagine the future of digital collections that are as easily re-usable as this small bit of text.
Check out these examples!
- Inaugural Words: 1789 to the Present, New York Times. “A look at the language of presidential inaugural addresses. The most-used words in each address appear in [an] interactive chart…, sized by number of uses. Words highlighted in yellow were used significantly more in this inaugural address than average.”
- Visual of the Inaugural Address, ProPublica. [Compare this to the NYT version. Stop words matter!]
- Search Inside Obama’s Inaugural Speech. Delve Networks. “We invite you to experience President Obama’s inaugural speech using our search inside technology. To do this, type what you’re looking for into the player searchbar above. A heatmap will show you where information related to your topic appears in the speech. You can move your mouse over the heatmap to see the matches. Click to jump to that place in the speech.”
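To see why stop words matter in word-count visualizations like the first two examples, here is a small Python sketch. The sample sentence is a placeholder I made up (it is not from the inaugural address), and the stop-word list is a tiny hypothetical one; I do not know which lists the Times or ProPublica used.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list for illustration only.
STOP_WORDS = frozenset({"the", "and", "of", "to", "a", "in", "is"})

# Placeholder text, NOT a quotation from any speech.
SAMPLE = ("The data of the people and the data of the government "
          "is the data to share")


def top_words(text, n=3, stop_words=frozenset()):
    """Return the n most frequent words, ignoring any stop words given."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


print(top_words(SAMPLE, n=1))                         # stop words kept
print(top_words(SAMPLE, n=1, stop_words=STOP_WORDS))  # stop words removed
```

With stop words kept, the most frequent word is a function word; with them removed, a content word rises to the top, which is why two charts of the same text can look so different.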
Happy new year to all of you. Whether you are on vacation and peeking at the news, or reading this as you just get back to work, here is something interesting and fun to see:
Wattenberg is a computer scientist and new media artist. He is the founding manager of IBM’s Visual Communication Lab, which researches new forms of visualization and how they can enable better collaboration.
Check out his many projects (e.g., Name Voyager, the Baby Name Wizard with data from the Social Security Administration, or history flow, visualizing the editing history of Wikipedia pages, or Many-Eyes, an experiment in open, public data visualization and analysis).
This is another good example of what interesting things can be done when we have complete access to information. When the raw data are free, we can do so much more than the single views of data provided by government agencies.
Read more about Wattenberg here:
He creates ways of seeing information, by Billy Baker, Boston Globe, December 29, 2008.