Three open government access advocates (Sunlight Foundation developer Eric Mill, GovTrack.us founder Josh Tauberer and New York Times developer Derek Willis) have put the United States Code on Github.
- The United States (Code) is on Github, by Alex Howard, O'Reilly Radar (December 6, 2012).
This fall, a trio of open government developers took it upon themselves to do what custodians of the U.S. Code and laws in the Library of Congress could have done years ago: published data and scrapers for legislation in Congress from THOMAS.gov in the public domain. The data at github.com/unitedstates is published using an "unlicense" and updated nightly.
..."It would be fantastic if the relevant bodies published this data themselves and made these datasets and scrapers unnecessary," said Mill, in an email interview. "It would increase the information's accuracy and timeliness, and probably its breadth. It would certainly save us a lot of work!"
Perhaps even more importantly, the project has released its computer code so that others will be able to scrape THOMAS to build their own datasets of legislative data. The computer code also includes a U.S. Code parser, which is significant because none of the various formats in which the government produces the U.S. Code is suitable for easy reuse.
I also think it is fantastic that these developers understand the difference between putting information on the web in various hard-to-use, hard-to-preserve, and often hard-to-parse formats and actually publishing the data so that it can be easily obtained, used, and re-used. As Mill notes, publishing information makes scraping the web unnecessary, and publishing in open formats makes it much simpler to preserve information.
Eric Mill announced today on the openhouseproject mailing list that he and Josh Tauberer (of GovTrack.us) and Derek Willis have completed a milestone in their project to produce a public domain scraper and dataset from THOMAS.gov. Here is the text of his message with links:
I've been working for the last month or two with Josh Tauberer (of GovTrack.us http://govtrack.us/) and Derek Willis on a project to produce a public domain scraper and dataset from THOMAS.gov http://thomas.gov/, the official source for legislative information for the US Congress.
It's a reasonably well documented set of Python scripts, which you can find here: https://github.com/unitedstates/congress
We just hit a great milestone - it gets everything important that THOMAS has on bills, back to the year THOMAS starts (1973). We've published and documented https://github.com/unitedstates/congress/wiki all of this data in bulk, and I've worked it into Sunlight's pipeline, so that searches for bills in Scout https://scout.sunlightfoundation.com/search/federal_bills/freedom%20of%2... use data collected directly from this effort.
The data and code are all hosted on Github on a "unitedstates https://github.com/unitedstates/" organization, which is right now co-owned by me, Josh, and Derek - the intent is to have this all exist in a common space. To the extent that the code needs a license at all, I'm using a public domain "unlicense https://github.com/unitedstates/congress/blob/master/LICENSE" that should at least be sufficient for the US (other suggestions welcome).
There's other great stuff in this organization, too - Josh made an amazing donation of his legislator dataset https://github.com/unitedstates/congress-legislators, and converted it to YAML for easy reuse. I've worked that dataset into Sunlight's products already as well. I've also moved my legal citation extractor https://github.com/unitedstates/citation into this organization -- and my colleague Thom Neale has an in-progress parser for the US Code https://github.com/unitedstates/uscode, to convert it from binary typesetting codes into JSON.
Github's organization structure actually makes possible a very neat commons. I'm hoping this model proves useful, both for us and for the public.
-- Developer | sunlightfoundation.com
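To give a rough sense of what a legal citation extractor like the one mentioned in the email does, here is a minimal sketch in Python. It is not the project's code: the pattern, the function name `find_usc_citations`, and the sample sentence are all made up for illustration, and the pattern only handles simple citations of the form "5 U.S.C. 552" or "44 U.S.C. § 3501".

```python
import re

# Illustrative pattern only (an assumption, not the unitedstates/citation code):
# a title number, the literal "U.S.C.", an optional section sign, a section number.
USC_PATTERN = re.compile(r'(\d+)\s+U\.S\.C\.\s+(?:§\s*)?(\d+[A-Za-z]*)')

def find_usc_citations(text):
    """Return a list of {'title': ..., 'section': ...} dicts found in text."""
    return [
        {"title": match.group(1), "section": match.group(2)}
        for match in USC_PATTERN.finditer(text)
    ]

# Made-up sample sentence for demonstration.
sample = "Requests are made under 5 U.S.C. 552; see also 44 U.S.C. § 3501."
citations = find_usc_citations(sample)
```

A real extractor would need to cope with many more citation styles (ranges, subsections, other reporters), which is part of why a shared, reusable tool is valuable.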
The Library of Congress unveiled a new Web search tool for bills and other Congressional records Wednesday that will eventually replace the 17-year-old Thomas.gov website.
- Congress.gov. Also see: About page.
Congress.gov makes federal United States legislative information freely available to the public. Launched Sept. 19, 2012, this version of the site is an initial beta release of Congress.gov, created as a successor to THOMAS.gov, the current public site for legislative information. The Congress.gov beta site contains legislation from the 107th Congress (2001) to the present, member of Congress profiles from the 93rd Congress (1973) to the present, and selected member profiles from the 80th through the 92nd Congresses (1947 to 1972). Over the next two years, Congress.gov will be adding information and features, eventually incorporating all of the information currently available on THOMAS.gov.
- Smartphone friendly, congressional search site unveiled, by Joseph Marks, NextGov (Sep 19, 2012).
- Congress launches THOMAS successor Congress.gov, by Daniel Schuman, Sunlight Foundation (Sept. 19, 2012)
What's noticeable about this evolving beta website, besides the major improvements in how people can search and understand legislative developments, is what's still missing: public comment on the design process and computer-friendly bulk access to the underlying data.
Here is another story:
- What Congress.gov Means for a Congressional API, by Nick Judd and Miranda Neubauer, TechPresident (September 19, 2012)
"I'm impressed," said Josh Tauberer, whose GovTrack scrapes data from THOMAS to provide it in a machine-readable form for other websites like OpenCongress, in an email. "From its new faceted search to its mobile-friendly HTML, they really hit the technology on the nail. And there's more explanation for people who aren't legislative pros. They may be slowly catching up to GovTrack.
"This new site shows that the LOC actually has the technical chops to implement raw data properly, which was a serious concern of mine before," Tauberer also wrote.
That said, Tauberer pointed out that the new site offers "no new actual information." House leadership has promised to offer access to the underlying data that fuels THOMAS and has repeatedly expressed a commitment to doing it. They just haven't committed to doing it during this Congress. And the lack of action on something that seems to them to be eminently doable has advocates kind of frustrated.
Gayle Osterberg, Director of Communications for the Library of Congress, seemed to indicate in an email that the Library of Congress is ready to cooperate. They just need Congress — meaning the House and Senate both — to give them the go-ahead.
- Congress.gov Beta: An Early Look at a New THOMAS, by Peggy Garvin, InfoToday (September 27, 2012).
The Congress.gov beta is still in the early stages of incorporating existing THOMAS content and implementing the improved search functions that THOMAS users have been waiting for. The Law Library of Congress, which is managing the transition, is anxious to get your feedback and suggestions via its form at http://beta.congress.gov/survey.
- Looking Forward to the THOMAS Beta Website, by Daniel Schuman, Sunlight Foundation (Sept. 14, 2012).
In the near future, Congress is expected to release a major upgrade to its aging legislative information website THOMAS. The long-overdue update is part of a much larger effort to "enhance the effectiveness of mission-critical systems," a response to significant public and internal pressure to improve congressional efficiency and transparency. The launch of "THOMAS Beta" is the first step towards developing what the Library of Congress describes as a completely "modern legislative information system" that will replace THOMAS and Congress' more sophisticated internal legislative tracking website "LIS" in FY 2014. Both THOMAS and LIS will stay online alongside the beta website for several years.
While THOMAS Beta has been shown to stakeholders inside Congress, as far as I am aware there has been no formal engagement process with the public to identify specifications, discuss wireframes, or generally make sure the site meets the public's needs.
The Congressional Research Service has published an update to its handy guide for finding current legislation and regulations:
- Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff, by Jerry W. Mansfield, Congressional Research Service, RL33895 (August 31, 2012). Available from Federation of American Scientists.
For those experienced in legislative and regulatory searching there won't be anything new or surprising here, but it is a handy introduction and reference.
One thing I particularly liked was the comparison on p. 13 of the "Legislative Information System," which provides access to legislative information to Members of Congress and their staff, and THOMAS, which makes information on federal legislation freely available to the public. That's right, one system for Congress and a separate system for us ordinary folk.
Here is a sample:
|Legislative Information System (LIS)|THOMAS|
|---|---|
|Best used for: finding the most complete legislative information|Best used for: working with constituents|
|Links from Bill Summary & Status display to CRS reports|No CRS reports|
|Links to Capitol Hill and selected outside sources of floor and committee schedule information|Minimal links|
|Special advanced search capabilities|Advanced search capabilities only in Bill Summary & Status database|
Again, this won't be news to most of you, but it is a nice summary of what we are missing.
Time to contact your representatives!
- #FreeTHOMAS, by Daniel Schuman, Sunlight Foundation (June 4, 2012)
The better approach is for Congress to publish the data behind THOMAS. Government regularly does this elsewhere, and "bulk data" is responsible for clever new uses of information developed by citizens, journalists, and even the government itself.
In upcoming days, the House is likely to pass legislative language that pays lip service to releasing THOMAS data while putting the idea in a deep freeze. This would be a disaster. But it's not too late. Tell your representative that you want Congress to publish legislative data now.
FGI just signed the letter below, written by the Sunlight Foundation, asking Congress to improve public access to legislative information by directing the Library of Congress to make its THOMAS database accessible in bulk format. If you and/or your organization believe that free access to Congressional information is of critical importance, please consider adding your name to the list of signatories on the letter. Daniel Schuman, Sunlight Foundation's policy counsel and director of the Advisory Committee on Transparency, requests that people sign on by COB on Monday, April 2nd. Interested people may also email Daniel at firstname.lastname@example.org with how they would like to be identified on the letter. Daniel thanks you and so do we!
We are writing to ask you to improve public access to legislative information by directing the Library of Congress to publish the THOMAS database online. Congress created THOMAS with the mission of making federal legislation freely available to the public. While times have changed, and technologies have changed, THOMAS has not kept up.
As a result, millions of Americans access basic information about legislation and congressional actions through online information providers like GovTrack, OpenCongress, and Washington Watch. These free non-governmental websites are forced to rely on brittle programs to harvest information from THOMAS’s complex website. This harvesting is imperfect, expensive, and time consuming. The better approach -- which has been adopted by industry and many in government -- is to publish legislative information "in bulk" in addition to other means.
Bulk access would in essence make the entire legislative database available for download, instead of requiring users to gather information by visiting hundreds or thousands of web pages. It would make it easier for third parties to build innovative new tools, and ensure that Americans have the most accurate information at their fingertips. Congress already expressed its support for bulk access downloads in 2009, but the Library of Congress, which oversees THOMAS, has not acted. In the meantime, GPO, the executive branch, and the House of Representatives are already publishing information online in bulk.
The time has come for action. In this year's legislative branch appropriations bill, we urge you to direct the Library of Congress to implement bulk access to THOMAS within 120 days. The Library should also immediately create an advisory committee on improving public access to legislative information composed of people inside and outside of government. Congress should ensure that THOMAS lives up to its potential of making the legislative branch more open and transparent.
For more information, please contact Daniel Schuman, policy counsel, the Sunlight Foundation, at 202-742-1520 x 273 or email@example.com
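The letter's point about "brittle programs" can be illustrated with a small sketch. Everything here is hypothetical: the HTML snippets are invented stand-ins rather than real THOMAS markup, the bill number is made up, and the JSON record is only a guess at what a bulk record might look like. The sketch shows a scraper that breaks when a page is redesigned, while a reader of structured bulk data does not depend on page layout at all.

```python
import json
import re

# Invented markup for illustration; real THOMAS pages are not reproduced here.
HTML_BEFORE = '<b>H.R. 1234</b> <i>Status:</i> Introduced'
# The same facts after a hypothetical redesign of the page.
HTML_AFTER = '<span class="bill">H.R. 1234</span> <em>Status:</em> Introduced'

# Hypothetical bulk record carrying the same facts as structured data.
BULK_RECORD = '{"bill": "H.R. 1234", "status": "Introduced"}'

def scrape_status(html):
    """Brittle: relies on the exact tags surrounding the bill number and status."""
    match = re.search(r'<b>(.+?)</b> <i>Status:</i> (\w+)', html)
    return match.groups() if match else None

def read_bulk_status(record):
    """Reads named fields from structured data, independent of any page layout."""
    data = json.loads(record)
    return data["bill"], data["status"]

before = scrape_status(HTML_BEFORE)       # works with the old layout
after = scrape_status(HTML_AFTER)         # silently finds nothing after the redesign
bulk = read_bulk_status(BULK_RECORD)      # unaffected by how pages are styled
```

In practice scrapers are more careful than a single regular expression, but the underlying dependence on page structure is the same, which is why harvesting is described as imperfect and time consuming.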
Josh Tauberer has announced changes to his wonderful GovTrack website and service.
- We’ve made a few "tweaks" to GovTrack, by Josh Tauberer, GovTrack blog (March 19, 2012).
...there are some things missing from GovTrack 2.0 that were in the old site, and if you need them you can still find them for now at http://legacy.govtrack.us, which continues to run the old site. I apologize for discontinuing some features, such as information on amendments, but with the site’s tiny budget I’m just not able to keep everything running at once...
GovTrack helps you find the status of U.S. federal legislation, voting records for the Senate and House of Representatives, information on Members of Congress, and congressional district maps.
Much of the information shown on GovTrack is assembled in an automated way from official government websites, primarily THOMAS, which is the official website for the status of legislation run by the Library of Congress.
Hat Tip to InfoDocket!
Time once again for a selection of news and new resources that we hope will be of interest to the FGI community. The following posts are from INFOdocket.com (@infodocket), where we compile and post new items daily. The oldest item in this roundup was posted on January 26, 2012.
From a blog post at Sunlight Foundation:
Yesterday, Rep. Bill Foster introduced a bill that would improve public access to legislative information. Specifically, H.R. 6289 calls for:
* Bulk access to THOMAS legislative summary and status data,
* The creation of an advisory committee that would issue recommendations on improving services provided by THOMAS, and
* The Library of Congress to work towards adding bulk access to the full text of legislation.
- Rep. Foster Introduces Bill To Improve THOMAS, by Daniel Schuman, Sunlight Foundation (09/30/10).