thomas

The U.S. Code: freed from THOMAS

Three open government access advocates (Sunlight Foundation developer Eric Mill, GovTrack.us founder Josh Tauberer and New York Times developer Derek Willis) have put the United States Code on Github.

  • github.com/unitedstates
  • The United States (Code) is on Github, by Alex Howard, O'Reilly Radar (December 6, 2012).

    This fall, a trio of open government developers took it upon themselves to do what custodians of the U.S. Code and laws in the Library of Congress could have done years ago: published data and scrapers for legislation in Congress from THOMAS.gov in the public domain. The data at github.com/unitedstates is published using an "unlicense" and updated nightly.

    ..."It would be fantastic if the relevant bodies published this data themselves and made these datasets and scrapers unnecessary," said Mill, in an email interview. "It would increase the information's accuracy and timeliness, and probably its breadth. It would certainly save us a lot of work!"

Perhaps even more importantly, the project has released its computer code so that others will be able to scrape THOMAS to build their own datasets of legislative data. The computer code also includes a U.S. Code parser, which is significant because none of the various formats in which the government produces the U.S. Code is suitable for easy reuse.

I also think it is fantastic that these developers understand the difference between putting information on the web in various hard-to-use, hard-to-preserve, and often hard-to-parse formats and actually publishing the data so that it can be easily obtained, used, and re-used. As Mill notes, publishing information makes scraping the web unnecessary, and publishing in open formats makes it much simpler to preserve information.

THOMAS bulk download!

Eric Mill announced today on the openhouseproject mailing list that he and Josh Tauberer (of GovTrack.us) and Derek Willis have completed a milestone in their project to produce a public domain scraper and dataset from THOMAS.gov. Here is the text of his message with links:

Hi all,

I've been working for the last month or two with Josh Tauberer (of GovTrack.us, http://govtrack.us/) and Derek Willis on a project to produce a public domain scraper and dataset from THOMAS.gov (http://thomas.gov/), the official source for legislative information for the US Congress.

It's a reasonably well documented set of Python scripts, which you can find here: https://github.com/unitedstates/congress

We just hit a great milestone - it gets everything important that THOMAS has on bills, back to the year THOMAS starts (1973). We've published and documented (https://github.com/unitedstates/congress/wiki) all of this data in bulk, and I've worked it into Sunlight's pipeline, so that searches for bills in Scout (https://scout.sunlightfoundation.com/search/federal_bills/freedom%20of%2...) use data collected directly from this effort.

The data and code are all hosted on Github on a "unitedstates" organization (https://github.com/unitedstates/), which is right now co-owned by me, Josh, and Derek - the intent is to have this all exist in a common space. To the extent that the code needs a license at all, I'm using a public domain "unlicense" (https://github.com/unitedstates/congress/blob/master/LICENSE) that should at least be sufficient for the US (other suggestions welcome).

There's other great stuff in this organization, too - Josh made an amazing donation of his legislator dataset (https://github.com/unitedstates/congress-legislators), and converted it to YAML for easy reuse. I've worked that dataset into Sunlight's products already as well. I've also moved my legal citation extractor (https://github.com/unitedstates/citation) into this organization -- and my colleague Thom Neale has an in-progress parser for the US Code (https://github.com/unitedstates/uscode), to convert it from binary typesetting codes into JSON.

Github's organization structure actually makes possible a very neat commons. I'm hoping this model proves useful, both for us and for the public.

-- Eric

-- Developer | sunlightfoundation.com
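
To give a sense of what a legal citation extractor like the one Mill mentions does, here is a minimal Python sketch. It is not the code from the unitedstates/citation project: the pattern USC_PATTERN and the function extract_usc_citations are hypothetical names invented for this illustration, and the regular expression only handles simple U.S. Code citations of the form "5 U.S.C. 552" or "42 U.S.C. § 1983(a)(1)".

```python
import re

# Hypothetical pattern for illustration only. It matches simple citations
# such as "5 U.S.C. 552" or "42 U.S.C. § 1983(a)(1)"; real-world citations
# come in many more variants than this handles.
USC_PATTERN = re.compile(
    r"(?P<title>\d+)\s+U\.?S\.?C\.?\s+"          # title number and "U.S.C."
    r"(?:§+\s*)?"                                # optional section sign
    r"(?P<section>\d+[a-z]?(?:-\d+)?)"           # section, e.g. 552 or 552a
    r"(?P<subsections>(?:\([a-zA-Z0-9]+\))*)"    # optional (a)(1)... chain
)


def extract_usc_citations(text):
    """Return one dict per U.S. Code citation found in text."""
    citations = []
    for match in USC_PATTERN.finditer(text):
        citations.append({
            "title": match.group("title"),
            "section": match.group("section"),
            # Split "(a)(1)" into ["a", "1"].
            "subsections": re.findall(
                r"\(([a-zA-Z0-9]+)\)", match.group("subsections")
            ),
        })
    return citations


if __name__ == "__main__":
    sample = "FOIA is codified at 5 U.S.C. 552; see also 42 U.S.C. § 1983(a)(1)."
    for citation in extract_usc_citations(sample):
        print(citation)
```

Turning free-text citations into structured fields like these is what makes it possible to link a bill or report to the part of the U.S. Code it touches, which is one reason a shared, public domain extractor is useful alongside the bulk data.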

Congress.gov, the new THOMAS, launched in beta

The Library of Congress unveiled a new Web search tool for bills and other Congressional records Wednesday that will eventually replace the 17-year-old Thomas.gov website.

  • Congress.gov. Also see: About page.

    Congress.gov makes federal United States legislative information freely available to the public. Launched Sept. 19, 2012, this version of the site is an initial beta release of Congress.gov, created as a successor to THOMAS.gov, the current public site for legislative information. The Congress.gov beta site contains legislation from the 107th Congress (2001) to the present, member of Congress profiles from the 93rd Congress (1973) to the present, and selected member profiles from the 80th through the 92nd Congresses (1947 to 1972). Over the next two years, Congress.gov will be adding information and features, eventually incorporating all of the information currently available on THOMAS.gov.

  • Smartphone friendly, congressional search site unveiled, By Joseph Marks, NextGov (Sep 19, 2012).
     
  • Congress launches THOMAS successor Congress.gov, by Daniel Schuman, Sunlight Foundation (Sept. 19, 2012)

    What's noticeable about this evolving beta website, besides the major improvements in how people can search and understand legislative developments, is what's still missing: public comment on the design process and computer-friendly bulk access to the underlying data.

Update:
Here is another story:

  • What Congress.gov Means for a Congressional API, by Nick Judd and Miranda Neubauer, TechPresident (September 19, 2012)

    "I'm impressed," said Josh Tauberer, whose GovTrack scrapes data from THOMAS to provide it in a machine-readable form for other websites like OpenCongress, in an email. "From its new faceted search to its mobile-friendly HTML, they really hit the technology on the nail. And there's more explanation for people who aren't legislative pros. They may be slowly catching up to GovTrack.

    "This new site shows that the LOC actually has the technical chops to implement raw data properly, which was a serious concern of mine before," Tauberer also wrote.

    That said, Tauberer pointed out that the new site offers "no new actual information." House leadership has promised to offer access to the underlying data that fuels THOMAS and has repeatedly expressed a commitment to doing it. They just haven't committed to doing it during this Congress. And the lack of action on something that seems to them to be eminently doable has advocates kind of frustrated.

    Gayle Osterberg, Director of Communications for the Library of Congress, seemed to indicate in an email that the Library of Congress is ready to cooperate. They just need Congress — meaning the House and Senate both — to give them the go-ahead.

Another update:

  • Congress.gov Beta: An Early Look at a New THOMAS, by Peggy Garvin, InfoToday, (September 27, 2012).

    The Congress.gov beta is still in the early stages of incorporating existing THOMAS content and implementing the improved search functions that THOMAS users have been waiting for. The Law Library of Congress, which is managing the transition, is anxious to get your feedback and suggestions via its form at http://beta.congress.gov/survey.

Sunlight on Thomas (Beta) and LIS and the future of Legislative Information

Looking Forward to the THOMAS Beta Website, by Daniel Schuman, Sunlight Foundation (Sept. 14, 2012).

In the near future, Congress is expected to release a major upgrade to its aging legislative information website THOMAS. The long-overdue update is part of a much larger effort to "enhance the effectiveness of mission-critical systems," a response to significant public and internal pressure to improve congressional efficiency and transparency. The launch of "THOMAS Beta" is the first step towards developing what the Library of Congress describes as a completely "modern legislative information system" that will replace THOMAS and Congress' more sophisticated internal legislative tracking website "LIS" in FY 2014. Both THOMAS and LIS will stay online alongside the beta website for several years.

While THOMAS Beta has been shown to stakeholders inside Congress, as far as I am aware there has been no formal engagement process with the public to identify specifications, discuss wireframes, or generally make sure the site meets the public's needs.

Comparing LIS and Thomas

The Congressional Research Service has published an update to its handy guide for finding current legislation and regulations:

For those experienced in legislative and regulatory searching there won't be anything new or surprising here, but it is a handy introduction and reference.

One thing I particularly liked was the comparison on p. 13 of the "Legislative Information System," which provides access to legislative information to Members of Congress and their staff, and THOMAS, which makes information on federal legislation freely available to the public. That's right, one system for Congress and a separate system for us ordinary folk.

Here is a sample:

LIS | THOMAS
Best used for: Finding the most complete legislative information | Best used for: Working with constituents
Links from Bill Summary & Status display to CRS reports | No CRS reports
Links to Capitol Hill and selected outside sources of floor and committee schedule information | Minimal links
Special advanced search capabilities | Advanced search capabilities only in Bill Summary & Status database

Again, this won't be news to most of you, but it is a nice summary of what we are missing.

#FreeTHOMAS

Time to contact your representatives!

  • #FreeTHOMAS, by Daniel Schuman, Sunlight Foundation (June 4, 2012)

    The better approach is for Congress to publish the data behind THOMAS. Government regularly does this elsewhere, and "bulk data" is responsible for clever new uses of information developed by citizens, journalists, and even the government itself.

    In upcoming days, the House is likely to pass legislative language that pays lip service to releasing THOMAS data while putting the idea in a deep freeze. This would be a disaster. But it's not too late. Tell your representative that you want Congress to publish legislative data now.

Help improve public access to Congressional/legislative information #FDLP

FGI just signed the letter below, written by the Sunlight Foundation, asking Congress to improve public access to legislative information by directing the Library of Congress to make its THOMAS database accessible in bulk format. If you and/or your organization believe that free access to Congressional information is of critical importance, please consider adding your name to the list of signatories on the letter. Daniel Schuman, Sunlight Foundation's policy counsel and director of the Advisory Committee on Transparency, requests that people sign on by COB on Monday, April 2nd. Interested people may also email Daniel (dschuman@sunlightfoundation.com) with how they would like to be identified on the letter. Daniel thanks you and so do we!


Dear Congressman/Senator:

We are writing to ask you to improve public access to legislative information by directing the Library of Congress to publish the THOMAS database online. Congress created THOMAS with the mission of making federal legislation freely available to the public. While times have changed, and technologies have changed, THOMAS has not kept up.

As a result, millions of Americans access basic information about legislation and congressional actions through online information providers like GovTrack, OpenCongress, and Washington Watch. These free non-governmental websites are forced to rely on brittle programs to harvest information from THOMAS’s complex website. This harvesting is imperfect, expensive, and time consuming. The better approach -- which has been adopted by industry and many in government -- is to publish legislative information "in bulk" in addition to other means.

Bulk access would in essence make the entire legislative database available for download, instead of requiring users to gather information by visiting hundreds or thousands of web pages. It would make it easier for third parties to build innovative new tools, and ensure that Americans have the most accurate information at their fingertips. Congress already expressed its support for bulk access downloads in 2009, but the Library of Congress, which oversees THOMAS, has not acted. In the meantime, GPO, the executive branch, and the House of Representatives are already publishing information online in bulk.

The time has come for action. In this year's legislative branch appropriations bill, we urge you to direct the Library of Congress to implement bulk access to THOMAS within 120 days. The Library should also immediately create an advisory committee on improving public access to legislative information composed of people inside and outside of government. Congress should ensure that THOMAS lives up to its potential of making the legislative branch more open and transparent.

For more information, please contact Daniel Schuman, policy counsel, the Sunlight Foundation, at 202-742-1520 x 273 or dschuman@sunlightfoundation.com

GovTrack 2.0

Josh Tauberer has announced changes to his wonderful GovTrack website and service.

  • We’ve made a few "tweaks" to GovTrack, by Josh Tauberer, GovTrack blog, (March 19, 2012).

    ...there are some things missing from GovTrack 2.0 that were in the old site, and if you need them you can still find them for now at http://legacy.govtrack.us, which continues to run the old site. I apologize for discontinuing some features, such as information on amendments, but with the site’s tiny budget I’m just not able to keep everything running at once...

  • GovTrack.

    GovTrack helps you find the status of U.S. federal legislation, voting records for the Senate and House of Representatives, information on Members of Congress, and congressional district maps.

    Much of the information shown on GovTrack is assembled in an automated way from official government websites, primarily the website THOMAS, which is the official website for the status of legislation run by the Library of Congress.

Hat Tip to InfoDocket!

A Roundup of Recent Government Info News and New Resources

Time once again for a selection of news and new resources that we hope will be of interest to the FGI community. The following posts are from INFOdocket.com (@infodocket), where we compile and post new items daily. The oldest item in this roundup was posted on January 26, 2012.

1. President Requests $231,953,777 for Institute of Museum and Library Services (IMLS)

2. MEDLINE/PubMed: List of Serials Indexed for Online Users, 2012 Now Available in XML

3. South Dakota: State Archives Going Digital

4. Recently Launched iOS App: United Nations News Reader from the UN News Centre

5. Full Text of Prepared Testimony: Librarian of Congress, Public Printer, & Others Testify at House Appropriations Committee Hearing (re: FY 2013 Budget)

6. Montana: “New State Librarian Leads Digitization”

7. Government Information: A New Issue of the FDLP Connection Newsletter is Now Online (Vol. 2, Issue 2)

8. New Reference Resource: PACrimeStats.Info (Pennsylvania Crime Data)

9. EPA Releases New Interactive Tool with Information About Water Pollution Across the U.S.

10. FEMA Grant Helps Restore New Orleans’ Katrina-Damaged Archives

11. Listen Online: National Park Service Releases Historic Audio Recordings Made by Thomas Edison’s Recording Engineer

12. New Feature: The World Factbook Now Allows Users to Listen to the National Anthems of Most Countries

13. U.S. Congress: THOMAS Adds Direct Links to House Committee Hearings

14. New Document from NIH: Public Access Policy Implications

15. New Database: See Who’s Donating to Super PACs

16. LOCPix: New iOS App Provides Access to Digitized Photos from the Library of Congress

17. New Interactive Reference Resource: State Transportation Facts and Figures

18. U.S. Congress: Financial Contributions: MapLight Launches New Company Pages

19. Let’s Fly! FAA Launches Mobile Web App

20. New Search Tool from the IRS: Exempt Organizations Select Check

Bulk access to THOMAS and Legislation

From a blog post at Sunlight Foundation:

Yesterday, Rep. Bill Foster introduced a bill that would improve public access to legislative information. Specifically, H.R. 6289 calls for:

* Bulk access to THOMAS legislative summary and status data,
* The creation of an advisory committee that would issue recommendations on improving services provided by THOMAS, and
* The Library of Congress to work towards adding bulk access to the full text of legislation.

Rep. Foster Introduces Bill To Improve THOMAS, By Daniel Schuman, Sunlight Foundation (09/30/10).
