Recently this graphic was posted on “The Gavel,” the blog of the Speaker of the House (What 3.6 Million Jobs Lost Over 13 Months Looks Like, by “Karina,” February 6th, 2009). It shows the number of jobs lost (and recovered) in the recessions of 1990, 2001, and 2008. It superimposes the three periods, each starting with its peak employment month, and shows the change in employment by month. The result is a startling comparison that highlights the severity of the current situation. It also implies that we will have a long wait before employment returns to its previous peak.
I was curious about this graph and did a little follow-up, which I share below. Data librarians may find this a bit tedious, but for those who have never used raw data, it may be useful as an illustration of the difference between “data” (the raw numbers that you put into statistical software) and “statistics” (the human-viewable tables and graphs that we see in publications).
Unfortunately, as is often the case with statistical information, the source given for the graphic is incomplete: simply “Bureau of Labor Statistics.” I could not find the graphic itself on the bls.gov site, so I assume that the chart was constructed from BLS data, specifically from either the Current Population Survey (CPS) or the Current Employment Statistics (CES) survey. These two surveys count employment differently: the CPS is a survey of households (individuals), while the CES is a survey of employers (payrolls).
There is a similar, but not identical, chart (“Percent change in total nonfarm employment, from beginning of recession”) in the January 2009 (released February 6, 2009) Current Employment Statistics Highlights, Monthly (Bureau of Labor Statistics), so my guess is that someone at the Speaker’s office built the chart from the raw CES data.
Just out of curiosity, I went to the CES “Most Requested Statistics” webpage and downloaded “Total Nonfarm Employment – CES0000000001” for 1990 through the end of 2008. Raw data suitable for analysis really do look “raw”; they don’t even look like a statistical table:
1990,Jan,109151
1990,Feb,109396
1990,Mar,109611
1990,Apr,109651
1990,May,109800
1990,Jun,109817
1990,Jul,109775
1990,Aug,109567
1990,Sep,109485
1990,Oct,109324
1990,Nov,109180
1990,Dec,109120
1991,Jan,109001
1991,Feb,108695
1991,Mar,108535
1991,Apr,108324
1991,May,108196
1991,Jun,108283
...
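Turning records like these into something software can work with is mostly a matter of parsing. As a minimal sketch in Python (not the Excel/Stata workflow used here), the hypothetical `parse_ces_records` function below assumes the whitespace-separated `year,Mon,value` layout shown in the excerpt; the actual BLS download format may differ. It also picks out the highest month in the excerpt, June 1990, which is month 1 for the 1990 recession in the comparison.

```python
# Sketch: parse "year,Mon,value" records like the CES excerpt above.
# Assumption: records are separated by whitespace and each has exactly three
# comma-separated fields; the real BLS file layout may differ.

CES_EXCERPT = """
1990,Jan,109151 1990,Feb,109396 1990,Mar,109611 1990,Apr,109651
1990,May,109800 1990,Jun,109817 1990,Jul,109775 1990,Aug,109567
1990,Sep,109485 1990,Oct,109324 1990,Nov,109180 1990,Dec,109120
1991,Jan,109001 1991,Feb,108695 1991,Mar,108535 1991,Apr,108324
1991,May,108196 1991,Jun,108283
"""


def parse_ces_records(text):
    """Return (year, month_abbrev, employment_in_thousands) tuples."""
    records = []
    for token in text.split():
        year, month, value = token.split(",")
        records.append((int(year), month, int(value)))
    return records


records = parse_ces_records(CES_EXCERPT)
# The peak is the record with the highest employment within this excerpt.
peak = max(records, key=lambda record: record[2])
print(len(records), peak)  # 18 (1990, 'Jun', 109817)
```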
Of course, it is relatively easy, using statistical software, to construct tables and graphs from raw data. Here, for example, is a published statistical table with essentially the same raw information (but from CPS, not CES) that I downloaded. (See the full table from Employment from the BLS household and payroll surveys: summary of recent trends, February 6, 2009).
By using the raw data to create a graph, one can tell a story that has more impact than a table of numbers alone. It is relatively easy to get these data into statistical software. I used Excel and Stata to create a small time-series data file. I organized it by month (from month “1” to month “48”), with each row of the data file holding data for all 3 recessions. The first row has data for the first month of each of the three recessions, the second row has data for the second month, and so on. The CES data report employment totals in thousands. For example, here is the monthly employment for the 2008 recession, starting with its peak month:
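The month-by-recession layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author’s Excel/Stata procedure: each series is a plain list of monthly totals (in thousands), the peak month’s position is already known, and `months_from_peak` and `align_by_month` are hypothetical helpers. Recessions with fewer months of data are padded with `None` so that every row still has one slot per recession.

```python
from itertools import zip_longest


def months_from_peak(values, peak_index, n_months=48):
    """Slice up to n_months of a monthly series, starting at the peak month."""
    return values[peak_index:peak_index + n_months]


def align_by_month(*series):
    """Return rows (month, v1, v2, ...) where month 1 is each series' peak month.

    Series of unequal length are padded with None, so every row has
    one value slot per recession.
    """
    return [(month, *values)
            for month, values in enumerate(zip_longest(*series), start=1)]


# Illustrative input: the first few months of each recession, in thousands,
# taken from the figures in this post. May 1990 precedes the June 1990 peak.
v1 = months_from_peak([109800, 109817, 109775, 109567], peak_index=1)
v2 = [132530, 132500, 132219]
v3 = [138152, 138080, 137936, 137814]

for row in align_by_month(v1, v2, v3):
    print(row)
```

The last row printed is `(4, None, None, 137814)`, because only the 2008 series was given a fourth month in this toy input.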
138152 138080 137936 137814 137654 137517 137356 137228 137053 136732 136352 135755 135178
I then had to compute a new variable for each recession: the cumulative change in jobs since the peak month, that is, each month’s employment minus employment in the peak month. For 2008, for example:
138152 0
138080 -72
137936 -216
137814 -338
137654 -498
137517 -635
137356 -796
137228 -924
137053 -1099
136732 -1420
136352 -1800
135755 -2397
135178 -2974
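The computation itself is a single subtraction per month. A minimal Python sketch, assuming the series is a list whose first element is the peak month (`change_from_peak` is a hypothetical name), reproduces the 2008 column:

```python
def change_from_peak(values):
    """Cumulative change in jobs (thousands) relative to the first (peak) month."""
    peak = values[0]
    return [value - peak for value in values]


employment_2008 = [138152, 138080, 137936, 137814, 137654, 137517, 137356,
                   137228, 137053, 136732, 136352, 135755, 135178]

changes = change_from_peak(employment_2008)
for level, change in zip(employment_2008, changes):
    print(level, change)
# The last line printed is "135178 -2974".
```

Because the figures are in thousands, a change of -2974 means roughly 2.97 million fewer jobs than in the peak month.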
The first 12 months for all three recessions, with the employment totals (v1, v2, v3) and the computed change variables (1990, 2001, 2008), look like this:
month  v1      1990   v2      2001   v3      2008
1      109817  0      132530  0      138152  0
2      109775  -42    132500  -30    138080  -72
3      109567  -250   132219  -311   137936  -216
4      109485  -332   132175  -355   137814  -338
5      109324  -493   132047  -483   137654  -498
6      109180  -637   131922  -608   137517  -635
7      109120  -697   131762  -768   137356  -796
8      109001  -816   131518  -1012  137228  -924
9      108695  -1122  131193  -1337  137053  -1099
10     108535  -1282  130901  -1629  136732  -1420
11     108324  -1493  130723  -1807  136352  -1800
12     108196  -1621  130591  -1939  135755  -2397
Here is a complete tab-separated-values version of the data file I constructed. I then used Stata to build a graph, and it looks very much like the one on the Speaker’s blog.
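The post doesn’t say how the tab-separated file was exported beyond mentioning Excel and Stata. Purely as an illustration, Python’s standard `csv` module can write the same month-by-recession layout; the column names follow the table above, and `write_tsv` is a hypothetical helper.

```python
import csv
import io

HEADER = ["month", "v1", "1990", "v2", "2001", "v3", "2008"]


def write_tsv(rows, out):
    """Write the header and rows as tab-separated values; None becomes an empty cell."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])


# First two rows of the table above.
rows = [
    (1, 109817, 0, 132530, 0, 138152, 0),
    (2, 109775, -42, 132500, -30, 138080, -72),
]

buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue(), end="")
```

To write a real file instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`, as the `csv` documentation recommends.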
Of course, when one tells one story, one leaves out others. This graphic doesn’t show, for example, that the recessions started from different levels of total employment:
1990: 109 million
2001: 132 million
2008: 138 million
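One way to account for the different starting points is to express each month’s change as a percentage of peak employment. This is similar in spirit to the BLS chart mentioned earlier (“Percent change in total nonfarm employment, from beginning of recession”), though BLS may define the starting month differently. A minimal Python sketch, using the month-1 and month-12 values from the table above (`percent_change_from_peak` is a hypothetical helper):

```python
def percent_change_from_peak(values):
    """Percent change in employment relative to the first (peak) month."""
    peak = values[0]
    return [100.0 * (value - peak) / peak for value in values]


# Month 1 (peak) and month 12 employment, in thousands, from the table above.
month_1_and_12 = {
    "1990": [109817, 108196],
    "2001": [132530, 130591],
    "2008": [138152, 135755],
}

for label, values in month_1_and_12.items():
    pct = percent_change_from_peak(values)
    print(label, round(pct[-1], 2))
```

In percentage terms, the 1990 and 2001 declines after 12 months are nearly identical (about -1.48% and -1.46%), while the 2008 decline is steeper (about -1.74%). That is a somewhat different picture from the raw job counts.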
Open re-usable government information
One could use the raw data to tell many different stories and to analyze the data in many different ways. That brings me to why all this matters: we need to be sure that government information is not just “free as in beer” but also “free as in open.”
It is important for statistical agencies to publish statistics that help us understand their raw data. But it is also essential that they provide us with the raw data so that we can better understand their statistics and do our own analyses. Most of the statistical agencies of the U.S. government do an excellent job of making their raw data easily available. In fact, the rest of government would do well to use the statistical agencies as a model for instantiating their information in usable and re-usable formats (in addition to any publishing and presentation of their data/information) so that the information, whether it is text or images or video or sound or numbers, can be used, reused, analyzed, stored, and preserved.
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.