
Tag Archives: visualization

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Visualizing the gerrymandering of a Congressional district

Here’s an interesting little GIF that Lazaro Gamio (@LazaroGamio) posted to Twitter recently. The visualization shows the historical Congressional district boundaries of Maryland’s 3rd district from 1789 to 2017. This district is one of the most gerrymandered in the country. The last few years are particularly startling. As one commenter put it, the later district shape “looks like a Rorschach test!”

Death and Taxes

Death and Taxes – A Graphical Visualization of the Federal Budget

Death and Taxes is a large representational graph and poster of the federal budget. It contains over 500 programs and departments and almost every program that receives over 200 million dollars annually. The data is straight from the president’s 2009 budget request and will be debated, amended, and approved by Congress to begin the fiscal year. All of the item circles are proportional in size to their spending totals and the percentage change from 2008 is included to spot trends and disproportion.

Job losses in recessions visualized

Recently this graphic was posted on “The Gavel,” the blog of the Speaker of the House (What 3.6 Million Jobs Lost Over 13 Months Looks Like, by “Karina,” February 6th, 2009). It shows the number of jobs lost (and recovered) in the recessions of 1990, 2001, and 2008. By overlaying the three time periods, each starting at its peak job month, and showing the change in employment month by month, it gives a startling comparison that highlights the severity of the current situation. It also implies that we will have a long time to wait before we reach our previous peak month.

I was curious about this graph and did a little follow up that I share below. Data librarians may find this a bit tedious, but for those who have never used raw data, it may be useful as an illustration of the difference between “data” (the raw numbers that you put into statistical software) and “statistics” (the human-viewable tables and graphs that we see in publications).

Unfortunately, as is often the case with statistical information, the source given for the graphic is incomplete: simply “Bureau of Labor Statistics.” I could not find the graphic itself on the bls.gov site, so I assume that the chart was constructed from BLS data, specifically, the Current Population Survey or the Current Employment Statistics Survey. These two surveys count employment differently — one is a survey of individuals and the other is a survey of employers.

There is a similar, but not identical, chart (“Percent change in total nonfarm employment, from beginning of recession”) in the January 2009 (released February 6, 2009) Current Employment Statistics Highlights, Monthly (Bureau of Labor Statistics), so my guess is that someone at the Speaker’s office built the chart from the raw CES data.

Just out of curiosity, I went to the CES “Most Requested Statistics” webpage and downloaded “Total Nonfarm Employment – CES0000000001” for 1990 through the end of 2008. Raw data suitable for analysis really do look “raw,” nothing like a statistical table:


Of course, it is relatively easy, using statistical software, to construct tables and graphs from raw data. Here, for example, is a published statistical table with essentially the same raw information (but from CPS, not CES) that I downloaded. (See the full table in “Employment from the BLS household and payroll surveys: summary of recent trends,” February 6, 2009.)

By using the raw data to create a graph, one can tell a story that has more impact than just a table of numbers. It is relatively easy to get these data into statistical software. I used Excel and Stata to create a small time-series data file. I organized it by month (from month “1” to month “48”), with each row of the data file holding data for all three recessions. The first row has data for the first month of the three recessions, the second row has data for the second month, etc. The CES data give employment totals in thousands (so 138152 means about 138 million jobs). For example, the employment for 2008:


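I had to compute a new variable for each recession: the cumulative number of jobs lost, that is, each month’s employment minus employment in the first (peak) month. So, for example, for 2008 (employment, then cumulative change, both in thousands):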

138152	 0
138080	-72
137936	-216
137814	-338
137654	-498
137517	-635
137356	-796
137228	-924
137053	-1099
136732	-1420
136352	-1800
135755	-2397
135178	-2974
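The computation behind the second column can be sketched in a few lines of Python. This is only an illustration, not the Excel and Stata workflow actually used; the names employment_2008 and change_from_peak are hypothetical, and the numbers are copied from the first column of the listing above.

```python
# Total nonfarm employment (thousands) for the first 13 months of the
# 2008 series, copied from the first column of the listing above.
employment_2008 = [
    138152, 138080, 137936, 137814, 137654, 137517, 137356,
    137228, 137053, 136732, 136352, 135755, 135178,
]


def change_from_peak(levels):
    """Return each month's level minus the first (peak) month's level."""
    peak = levels[0]
    return [level - peak for level in levels]


jobs_lost_2008 = change_from_peak(employment_2008)

# Print the two columns side by side, tab-separated like the listing.
for level, change in zip(employment_2008, jobs_lost_2008):
    print(f"{level}\t{change}")
```

The same function works unchanged for the 1990 and 2001 series, because the only assumption it makes is that the first value in the list is the peak month.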

The first 12 months with all three recessions (v1, v2, v3) and the computed variables (1990, 2001, 2008) look like this:

month  v1      1990    v2      2001    v3      2008
1      109817       0  132530       0  138152       0
2      109775     -42  132500     -30  138080     -72
3      109567    -250  132219    -311  137936    -216
4      109485    -332  132175    -355  137814    -338
5      109324    -493  132047    -483  137654    -498
6      109180    -637  131922    -608  137517    -635
7      109120    -697  131762    -768  137356    -796
8      109001    -816  131518   -1012  137228    -924
9      108695   -1122  131193   -1337  137053   -1099
10     108535   -1282  130901   -1629  136732   -1420
11     108324   -1493  130723   -1807  136352   -1800
12     108196   -1621  130591   -1939  135755   -2397

Here is a complete tab-separated-values version of the data file I constructed. Then I used Stata to build a graph and it looks very much like the one at the Speaker’s Blog.
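For readers without Stata, the layout described above (one row per month of recession, with a level and a computed change for each of the three recessions) could be assembled roughly as follows. This is a minimal sketch in Python, not the original procedure: the function names and the level_/change_ column names are assumptions made for illustration, and only the first three months from the table above are included to keep it short.

```python
import csv
import io

# First three months of each recession (thousands), copied from the
# table above. A full file would carry all 48 months.
series = {
    "1990": [109817, 109775, 109567],
    "2001": [132530, 132500, 132219],
    "2008": [138152, 138080, 137936],
}


def wide_rows(series):
    """Build one row per month: month number, then level and change per recession."""
    n_months = min(len(levels) for levels in series.values())
    rows = []
    for i in range(n_months):
        row = [i + 1]
        for levels in series.values():
            # Change is measured against the first (peak) month.
            row += [levels[i], levels[i] - levels[0]]
        rows.append(row)
    return rows


def to_tsv(series):
    """Serialize the wide table as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    header = ["month"]
    for label in series:
        # Hypothetical column names; the original file used v1, 1990, etc.
        header += [f"level_{label}", f"change_{label}"]
    writer.writerow(header)
    writer.writerows(wide_rows(series))
    return buffer.getvalue()


tsv_text = to_tsv(series)
print(tsv_text)
```

Keeping the three recessions side by side in each row is what makes it easy for graphing software to draw three lines against a shared month axis.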

Of course, when one tells one story, one leaves out other stories. This graphic doesn’t show that the starting points of the recessions were different:
1990: 109 million
2001: 132 million
2008: 138 million
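One way to account for the different starting points is to express each loss as a percentage of the peak-month level, which is roughly what the BLS chart mentioned earlier does. The following Python sketch is an assumption about how one might do that, not something taken from either chart; it uses the month 1 and month 12 levels from the table above, and the names are hypothetical.

```python
# (peak-month level, month-12 level) in thousands, from the table above.
recessions = {
    "1990": (109817, 108196),
    "2001": (132530, 130591),
    "2008": (138152, 135755),
}


def percent_change(start, current):
    """Percent change of current relative to the starting level."""
    return 100.0 * (current - start) / start


absolute_at_12 = {
    label: current - start for label, (start, current) in recessions.items()
}
percent_at_12 = {
    label: percent_change(start, current)
    for label, (start, current) in recessions.items()
}

for label in recessions:
    print(f"{label}: {absolute_at_12[label]} thousand, {percent_at_12[label]:.2f}%")
```

With these figures, 2001 shows a larger loss than 1990 at month 12 in absolute terms but a slightly smaller one in percentage terms, which is exactly the kind of different story the same raw data can tell.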

Open re-usable government information

One could use the raw data to tell a lot of different stories and analyze the data in many different ways. And that brings me to the connection between all this and why we need to be sure that government information is not just “free as in beer” but also “free as in open.”

It is important for statistical agencies to publish statistics to help us understand their raw data. But it is also essential that they provide us with the raw data so that we can better understand their statistics and do our own analyses. Most of the statistical agencies of the U.S. government do an excellent job of making their raw data easily available. In fact, the rest of government would do well to use statistical agencies as a model for publishing their information in usable and re-usable formats (in addition to any publishing and presentation of their data/information) so that the information, whether it is text or images or video or sound or numbers, can be used, reused, analyzed, stored, and preserved.

Obama’s Inaugural Speech: visualized, video-searchable

President Obama’s inaugural speech has generated some interesting examples of how technology can be applied to government information when the information is freely available for use and re-use and not locked into government databases or proprietary formats. It is a small piece of text with a lot of public interest and high visibility and, therefore, ripe for these kinds of demonstrations and experiments. Of course, to make use of the information, we have to actually have a copy of it. Imagine what would happen if all government information were actually distributed in open formats to libraries so that we could build collections that were index-able, search-able, visually browsable, and analyzable in interesting ways. Imagine freeing government information from its .gov silos and integrating it with non-government information in digital collections created for particular virtual communities of interest. Imagine the future of digital collections that are as easily re-usable as this small bit of text.

Check out these examples!

  • Inaugural Words: 1789 to the Present, New York Times. “A look at the language of presidential inaugural addresses. The most-used words in each address appear in [an] interactive chart…, sized by number of uses. Words highlighted in yellow were used significantly more in this inaugural address than average.”
  • Visual of the Inaugural Address, ProPublica. [Compare this to the NYT version. Stop words matter!]
  • Search Inside Obama’s Inaugural Speech. Delve Networks. “We invite you to experience President Obama’s inaugural speech using our search inside technology. To do this, type what you’re looking for into the player searchbar above. A heatmap will show you where information related to your topic appears in the speech. You can move your mouse over the heatmap to see the matches. Click to jump to that place in the speech.”
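To see why stop words matter when comparing two word-frequency visualizations, here is a minimal Python sketch. Everything in it is hypothetical: the sample sentence is made up (it is not from the speech), and the stop-word list is a tiny illustrative one, not the list used by the New York Times or ProPublica.

```python
import re
from collections import Counter

# Made-up sample text and stop-word list, for illustration only.
SAMPLE_TEXT = (
    "The work of the nation is the work of the people, "
    "and the people are ready."
)
STOP_WORDS = frozenset({"the", "of", "is", "and", "are", "a", "to", "in"})


def top_words(text, n=3, stop_words=frozenset()):
    """Return the n most frequent words, skipping any in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


print(top_words(SAMPLE_TEXT, n=2))
print(top_words(SAMPLE_TEXT, n=2, stop_words=STOP_WORDS))
```

Without filtering, the most frequent word is a function word such as “the”; with filtering, content words rise to the top, so two charts built from the same text can look quite different depending on which list was used.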

Happy New Year!

Happy new year to all of you. Whether you are on vacation and peeking at the news, or reading this as you just get back to work, here is something interesting and fun to see:

Martin Wattenberg. Data visualization. Media art. Collective intelligence.

Wattenberg is a computer scientist and new media artist. He is the founding manager of IBM’s Visual Communication Lab, which researches new forms of visualization and how they can enable better collaboration.

Check out his many projects (e.g., Name Voyager, the Baby Name Wizard with data from the Social Security Administration, or history flow, visualizing the editing history of Wikipedia pages, or Many-Eyes, an experiment in open, public data visualization and analysis).

This is another good example of what interesting things can be done when we have complete access to information. When the raw data are free, we can do so much more than the single views of data provided by government agencies.

Read more about Wattenberg here:

He creates ways of seeing information, by Billy Baker, Boston Globe, December 29, 2008.