In my first post, I wrote about making information useful for ordinary people. It's been a pleasure and an honor to guest blog here for the past month, and as the month of October is nearly gone, I figure it seems fitting to come back to this subject as my reign as "Blogger of the Month" comes to an end.
Large numbers in particular are difficult to comprehend, and the world of government information is full of them: earmarks range from hundreds of thousands to tens of millions of dollars, Barack Obama's fundraising totals have eclipsed six hundred million dollars, and the $700 billion bailout package had pundits scrambling to describe things that cost $700 billion. The difficulty of explaining just how big some of these numbers are was taken to an absurd extreme when CNN presented a calculation of how many McDonald's apple pies could be purchased for each US citizen with such a sum.
Among the most useful ways of putting information in context that I've seen, for government information or anything else, are the sparklines at watchdog.net.
These graphics show the statistics of each lawmaker in context, as well as the general shape of the distribution of Congress as a whole. A congressperson's request for $147 million in earmarks may sound like a lot, but seeing that it puts them outside the top 100 provides useful and much-needed context. The shape of the line also shows whether there is a smooth trend or a sharp jump, with a small handful of lawmakers raising or spending drastically more than others.
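To make the idea concrete, here is a minimal sketch of how a single figure can be placed in context: it computes where one value ranks in a distribution and draws a rough text sparkline of the whole distribution. The earmark totals are invented for illustration, and this is not how watchdog.net builds its graphics.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, shortest to tallest


def rank_in_context(values, target):
    """Return (rank, total, share_below) for target among values.

    rank is 1 for the largest value; tied values share the better rank.
    """
    total = len(values)
    rank = 1 + sum(1 for v in values if v > target)
    share_below = sum(1 for v in values if v < target) / total
    return rank, total, share_below


def sparkline(values):
    """Map each value to a bar character scaled between the min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


# Hypothetical earmark totals in millions of dollars (not real data).
earmarks = [510, 402, 388, 147, 95, 60, 12, 0]

rank, total, share_below = rank_in_context(earmarks, 147)
print(f"$147M ranks {rank} of {total}, above {share_below:.0%} of members")
print(sparkline(sorted(earmarks)))
```

Sorting the values before drawing them is what makes a smooth trend or a sharp jump visible at a glance.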
Hopefully more and more presentations of government information will follow the lead of the terrific watchdog.net and attempt to surround information with relative context so that government information isn't simply available, but understandable.
We are only one week away from Election Day, and with record turnout expected, there are no doubt still a number of people who have no idea where their polling place is or exactly what will be on their ballot.
Campaigns and PACs pay large sums of money to vendors that sell information on district boundaries, and even the US House of Representatives uses a commercial vendor to provide the data that powers its "Who is my Representative?" service. There is no reason why this information should be this difficult to obtain.
The Voting Information Project encourages Boards of Elections to standardize and share their voting information including what is on the ballot, where the polling locations are, and the boundaries for all the various jurisdictions. So far only a handful of states (Iowa, Kansas, Maryland, Minnesota, Missouri, Montana, North Carolina, North Dakota, and Ohio) as well as Los Angeles County have published the requested information.
Voting information is some of the most important information to help the average citizen participate in our democracy, and the Voting Information Project is doing important work to ensure that this information is as open and widespread as possible. The states already participating should be applauded and the remaining states should be sure that by the time the next election season rolls around, they too are participating fully in the Voting Information Project.
For more information on the efforts of the Voting Information Project, visit their website.
There is a lot of talk about making data accessible via APIs, but there is also a lot of confusion about what this means, how to do it, and why it is beneficial when the average citizen cannot make heads or tails of an API.
API stands for "Application Programming Interface," but typically what we are discussing when we talk about APIs around data is a way to access data in a machine-readable format. A machine-readable format is one that a computer program can more or less understand, so that the program can present the data in new and interesting ways.
The house.gov website has a listing of all representatives by state, but a computer program has no way of knowing how to understand this listing. A more useful listing might look like an Excel (or CSV) file that lists each congressperson's name in the first column, state in the second, and so on.
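As a rough illustration of why such a file is easier for a program to work with, the sketch below reads a small CSV listing using Python's standard csv module. The rows, names, and column headers are invented for the example; house.gov does not publish this file.

```python
import csv
import io

# Hypothetical sample listing; every name and district here is invented.
SAMPLE_CSV = """name,state,district,party
Jane Example,NC,1,D
John Sample,NC,2,R
John Placeholder,OH,3,D
"""


def load_representatives(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


representatives = load_representatives(SAMPLE_CSV)
for rep in representatives:
    print(rep["name"], rep["state"], rep["party"])
```

Because each column has a name, the program never has to guess which part of a line is the state and which is the party.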
This is the fundamental advantage of an API: it makes data available in a way that a computer program can understand, so that the program can do more complicated things with it (e.g., draw a map with states colored according to their representatives' party affiliations).
A side effect of this computer-readable format is that it is possible to ask more useful and specific questions of the data. When you go to the above house.gov site it is possible to get a listing of all Representatives, but it is impossible to say "show me all representatives that are Democrats from North Carolina" or "show me all representatives named John." With an API this kind of query is typically very simple. As an example, in the Sunlight Labs API this could be done by going to a URL like http://services.sunlightfoundation.com/api/legislators.get?state=NC&part....
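The two questions quoted above can be sketched as simple filters once the data is structured. This example runs against a small in-memory list rather than calling any real API, and the records and field names are assumptions made up for illustration.

```python
# Hypothetical records; the names are invented and the field names are assumed.
REPRESENTATIVES = [
    {"first_name": "Jane", "last_name": "Example", "state": "NC", "party": "D"},
    {"first_name": "John", "last_name": "Sample", "state": "NC", "party": "R"},
    {"first_name": "John", "last_name": "Placeholder", "state": "OH", "party": "D"},
]


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


# "Show me all representatives that are Democrats from North Carolina."
nc_democrats = find(REPRESENTATIVES, state="NC", party="D")

# "Show me all representatives named John."
johns = find(REPRESENTATIVES, first_name="John")

print([r["last_name"] for r in nc_democrats])
print([r["last_name"] for r in johns])
```

A web API typically exposes the same idea through query parameters in the URL, with each parameter acting as one criterion.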
It is the availability of these APIs that has allowed all sorts of interesting sites, known as "mashups," that combine data from multiple sources. One of the earliest and most popular examples was a site called HousingMaps that combines Craigslist housing data with Google Maps.
A handful of APIs exist to help make government data more accessible, through which it is now possible to make mashups using government data.
A rich sampling of them includes:
- Sunlight Labs API
- Capitol Words API
- FollowTheMoney API
- GovTrack.us API
- MapLight.org API
- NYTimes Campaign Finance API
- OpenSecrets API
- Project Vote Smart API
- Watchdog.net API
All of these can be used to pull the information available from these sites and do new and interesting things with it and even combine it with data from other sites to provide a more in-depth view than any single site or dataset can hope to offer.
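At its core, a mashup is a join between datasets on some shared key. The sketch below combines two invented lists, standing in for responses from two different APIs, by matching on a legislator ID. The IDs, field names, and dollar figures are all hypothetical and do not reflect the schema of any API listed above.

```python
# Hypothetical data standing in for responses from two separate sources.
legislators = [
    {"id": "L001", "name": "Jane Example", "state": "NC"},
    {"id": "L002", "name": "John Sample", "state": "OH"},
]
earmark_totals = [
    {"legislator_id": "L001", "earmarks_usd": 147_000_000},
]


def mash_up(legislators, earmark_totals):
    """Attach each legislator's earmark total, or None when no match exists."""
    totals_by_id = {
        row["legislator_id"]: row["earmarks_usd"] for row in earmark_totals
    }
    return [
        {**leg, "earmarks_usd": totals_by_id.get(leg["id"])}
        for leg in legislators
    ]


combined = mash_up(legislators, earmark_totals)
for row in combined:
    print(row["name"], row["state"], row["earmarks_usd"])
```

In practice the hard part is usually finding a key that both sources share, since different sites often identify the same lawmaker in different ways.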
Earmarks have been making a lot of news lately. John McCain has enjoyed talking about them on the campaign trail, and in debates in particular, especially since the "Bridge to Nowhere" was raised as a campaign issue when the McCain campaign brought up Gov. Sarah Palin's rejection of the earmark as an example of her stand against status quo politics. (The Obama campaign has since countered that her opposition was not as total as the McCain campaign sought to imply.)
For those uninitiated to this new piece of political jargon, an earmark is in essence a line item in a budget bill that sets aside money for a specific project. Typically Senators and Representatives request projects that will have some positive impact on the district or state they represent, although this is not always the case. They are also known as "pet projects" or "pork barrel spending" or the colorful "boondoggle."
From a government information perspective, earmarks are a funny beast. Congresspeople historically rely on them for reelection, sending out mail to their constituents about the 2,000 new jobs brought in by the defense contract landed by a local company or the $7,000,000 renovation to the local hospital. At the same time, lawmakers are spending taxpayer money on these earmarks, so they often don't like being tagged with a line like "spent $4 billion of taxpayer money on pet projects." There have been new disclosure requirements in the last year or so, but the lack of a clear definition makes getting a definitive listing of earmarks that people can agree upon a daunting task.
For these reasons and many others, it is very difficult to get an accurate listing of earmarks from Congress. One organization with a great track record of digging up and recording federal earmarks is Taxpayers for Common Sense, a non-partisan group that gathers the earmarks in the various spending bills and publishes them in Microsoft Excel files.
TCS does excellent work gathering this data, and it has enabled other projects such as EarmarkWatch, a joint project between the Sunlight Foundation and Taxpayers for Common Sense that was started last year as an experiment in giving citizens a way to see what federal money was being spent on and to research interesting connections between lawmakers and the companies they were awarding contracts to. With this data available, EarmarkWatch users have gone the extra mile and helped to visualize earmarks on a Google map.
Another recent project worth mentioning is the Seattle Times' The Favor Factory, a catalog of defense earmarks searchable by state, recipient, or Congressperson, similar to EarmarkWatch.
The public attention earmarks have received is due in part to information on a relatively technical part of the budget process being made more accessible. As a result of this publicity, earmark reform seems likely whichever candidate wins. McCain has made campaign pledges to eliminate earmark spending, and Obama has been forced to talk about them and offer up a moratorium. (The Seattle Times, as part of Favor Factory, has a good synopsis of where the candidates stand.)
The New York Times has just announced an API that makes available the data they have gleaned from the Federal Election Commission's electronic filings for the presidential candidates.
"The initial version of the Campaign Finance API offers overall figures for presidential candidates, as well as state-by-state and ZIP code totals for specific candidates. In addition, the API supports a contributor name search using any of the following parameters: first name, last name and ZIP code."
This allows people with the appropriate technical skills to build mashups and other web services that take a look at donations by individual or by area with relative ease. In essence it is now possible for web developers to create views on this valuable data that previously would have involved digging through millions of FEC electronic filings.
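As one example of the kind of view a developer might build, the sketch below totals contributions to a candidate by ZIP code. It works on a handful of invented records and does not call the Campaign Finance API. The field names, candidates, and amounts are assumptions for illustration and are not the API's actual response format.

```python
from collections import defaultdict

# Hypothetical contribution records; none of this is real FEC or NYT data.
contributions = [
    {"last_name": "Example", "zip": "27514", "candidate": "Candidate A", "amount": 250},
    {"last_name": "Sample", "zip": "27514", "candidate": "Candidate A", "amount": 100},
    {"last_name": "Placeholder", "zip": "43215", "candidate": "Candidate A", "amount": 500},
    {"last_name": "Example", "zip": "27514", "candidate": "Candidate B", "amount": 75},
]


def totals_by_zip(records, candidate):
    """Sum contribution amounts per ZIP code for a single candidate."""
    totals = defaultdict(int)
    for record in records:
        if record["candidate"] == candidate:
            totals[record["zip"]] += record["amount"]
    return dict(totals)


for zip_code, total in sorted(totals_by_zip(contributions, "Candidate A").items()):
    print(zip_code, total)
```

The same grouping step works for states or any other area, as long as each record carries the field being grouped on.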
It should also be possible for researchers with moderate technical knowledge to analyze the individual contributions going to candidates to perform statistical and other analysis on what makes for a very interesting dataset.
The New York Times providing this service is certainly a positive step towards helping people make use of what is one of the richest (pun not intended) datasets the federal government has to offer.
Greetings all, I'm honored to be guest blogging this month here at FGI.
I'm a web developer at Sunlight Labs, which involves the development of sites and projects that aim to enable citizens and journalists to more easily access government data.
Those of us who spend our days wrestling with government data often spend a lot of time talking about the data that should be available but isn't. An issue of equal if not greater importance is how to make the already available data useful to a general audience. Anyone who has dealt with raw data from any government agency knows that simply passing government data along is typically not sufficient.
One example of a project done here at Sunlight that emphasizes making some of the complex outputs of the federal government meaningful to the average citizen is Capitol Words, a site that provides a daily and monthly view of the most commonly used word in the Congressional Record.
The Congressional Record is the official journal of the daily proceedings of Congress. It is mandated by Article I, Section 5 of the Constitution, which underscores how essential it is for the people to know what their legislators are up to. It is even published today in a digital format. Unfortunately, it is far too large to be of any real benefit to the general public.
Capitol Words was born out of a suggestion that it would be interesting to see simply the "word of the day" as a way of getting a sense of what was on Congress' mind. By giving the average citizen a window into what Congress is doing, it is possible that they will become more engaged than they otherwise might have been. Some citizens may even be inspired to dig deeper and look at the Congressional Record.
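The "word of the day" idea can be sketched in a few lines: split the text into words, drop very common ones, and count what remains. This is only an illustration under simple assumptions (a tiny hand-picked stopword list and an invented sample sentence), not a description of how Capitol Words actually processes the Congressional Record.

```python
import re
from collections import Counter

# A deliberately small, hand-picked stopword list (an assumption of this sketch).
STOPWORDS = {
    "the", "of", "and", "to", "a", "in", "that", "is", "for", "on",
    "i", "this", "from", "my", "we", "be", "mr", "speaker",
}


def word_of_the_day(text, stopwords=STOPWORDS):
    """Return (word, count) for the most frequent non-stopword, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return counts.most_common(1)[0] if counts else None


# Invented sample text, not an excerpt from the Congressional Record.
sample = (
    "Mr. Speaker, I rise to support the energy bill. "
    "Energy prices burden families, and this energy policy helps."
)
print(word_of_the_day(sample))
```

Most of the judgment in a real version would go into the stopword list, since procedural words appear in nearly every day's proceedings.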
Simple presentations of government information such as the popular tag cloud, or even just a simple word, can provide access to data that may be freely available, but is still not accessible to the general public. As great as it is to see more and more government data being made available, hopefully people will also develop new and interesting ways to present government information in a manner useful to all citizens.