
Tag Archives: bulk data

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Bill summaries in XML

GPO and the Library of Congress are now making summaries of House bills available in XML format for bulk data download from GPO’s Federal Digital System (FDsys).

FOR IMMEDIATE RELEASE: February 4, 2014 No. 14-03


WASHINGTON – The U.S. Government Printing Office (GPO) has partnered with the Library of Congress (LOC) to make House of Representatives bill summaries available in XML format for bulk data download from GPO’s Federal Digital System (FDsys). Bill summaries are prepared by the LOC’s Congressional Research Service and describe the most significant provisions of a piece of legislation. They also detail the effects the legislative text may have on current law and Federal programs. The bill summaries are part of the FDsys Bulk Data repository, starting with the 113th Congress. Making House bill summaries available in XML permits third-party providers to reuse and repurpose the data for mobile web applications, data mashups, and other analytical tools, which contributes to openness and transparency in Government. This project commenced at the direction of the House Appropriations Committee and supports the task force on bulk data established by the House.

“GPO and the LOC have been important partners in working towards House Leadership’s goal of increasing transparency by making more data available in bulk data formats,” said Karen Haas, Clerk of the House. “The successful completion of the bill summaries project marks another positive step in that direction.”

GPO already makes House bills as well as the Federal Register, the Code of Federal Regulations, and other documents from the executive branch available in XML format for bulk data download.

Link to House bill summaries on FDsys: http://www.gpo.gov/fdsys/bulkdata/BILLSUM
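To illustrate the kind of reuse the release describes, here is a minimal Python sketch that pulls bill identifiers, titles, and summary text out of an XML document using only the standard library. The element and attribute names in the inline sample (`billSummaries`, `item`, `congress`, `measure-type`, `measure-number`, `summary/text`) are hypothetical placeholders, not the actual BILLSUM schema; check the schema documentation published alongside the FDsys bulk data before adapting it. In practice you would read a downloaded file (for example with `ET.parse`) rather than an inline string.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample: these element and attribute names are illustrative
# assumptions, NOT the documented BILLSUM schema.
SAMPLE_XML = """<billSummaries>
  <item congress="113" measure-type="hr" measure-number="1">
    <title>Example Act</title>
    <summary>
      <action-date>2013-01-03</action-date>
      <text>Summary text for an example bill.</text>
    </summary>
  </item>
  <item congress="113" measure-type="hr" measure-number="2">
    <title>Another Example Act</title>
    <summary>
      <action-date>2013-01-04</action-date>
      <text>Another summary.</text>
    </summary>
  </item>
</billSummaries>"""


def extract_summaries(xml_text: str) -> list:
    """Return a list of (bill_id, title, summary_text) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    # Walk every <item> element (hypothetical name) anywhere in the tree.
    for item in root.iter("item"):
        # Build an identifier such as "113-hr1" from the assumed attributes.
        bill_id = "{}-{}{}".format(
            item.get("congress"), item.get("measure-type"), item.get("measure-number")
        )
        title = item.findtext("title", default="").strip()
        # Relative ElementPath lookup: <summary><text>...</text></summary>.
        text = item.findtext("summary/text", default="").strip()
        results.append((bill_id, title, text))
    return results


if __name__ == "__main__":
    for bill_id, title, text in extract_summaries(SAMPLE_XML):
        print(bill_id, "|", title, "|", text)
```

Flattening each record into a simple tuple is one way to feed the data into the mashups and analytical tools the release mentions; a real pipeline would map fields according to the published schema.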

THOMAS bulk download!

Eric Mill announced today on the openhouseproject mailing list that he and Josh Tauberer (of GovTrack.us) and Derek Willis have completed a milestone in their project to produce a public domain scraper and dataset from THOMAS.gov. Here is the text of his message with links:

Hi all,

I’ve been working for the last month or two with Josh Tauberer (of GovTrack.us, http://govtrack.us/) and Derek Willis on a project to produce a public domain scraper and dataset from THOMAS.gov (http://thomas.gov/), the official source for legislative information for the US Congress.

It’s a reasonably well documented set of Python scripts, which you can find here: https://github.com/unitedstates/congress

We just hit a great milestone – it gets everything important that THOMAS has on bills, back to the year THOMAS starts (1973). We’ve published and documented all of this data in bulk (https://github.com/unitedstates/congress/wiki), and I’ve worked it into Sunlight’s pipeline, so that searches for bills in Scout (https://scout.sunlightfoundation.com/search/federal_bills/freedom%20of%20information) use data collected directly from this effort.

The data and code are all hosted on Github on a “unitedstates” organization (https://github.com/unitedstates/), which is right now co-owned by me, Josh, and Derek – the intent is to have this all exist in a common space. To the extent that the code needs a license at all, I’m using a public domain “unlicense” (https://github.com/unitedstates/congress/blob/master/LICENSE) that should at least be sufficient for the US (other suggestions welcome).

There’s other great stuff in this organization, too – Josh made an amazing donation of his legislator dataset (https://github.com/unitedstates/congress-legislators), and converted it to YAML for easy reuse. I’ve worked that dataset into Sunlight’s products already as well. I’ve also moved my legal citation extractor (https://github.com/unitedstates/citation) into this organization, and my colleague Thom Neale has an in-progress parser for the US Code (https://github.com/unitedstates/uscode), to convert it from binary typesetting codes into JSON.

Github’s organization structure actually makes possible a very neat commons. I’m hoping this model proves useful, both for us and for the public.

— Eric

— Developer | sunlightfoundation.com

Josh Tauberer Gets Congress Into the 21st Century

This article reports on the importance of a bill that will enable Congress to provide bulk access to its legislative data. It also profiles one of the heroes of open access to Congressional data, Josh Tauberer. As the Post says, Josh has prodded Congress, and the result may be the “raw material for an Angie’s List or a Yelp for Congress, a way for modern users to evaluate lawmakers with the same kind of crowdsourced help that they use to evaluate lunch.”

This is a lot like how Carl Malamud got the SEC to put the EDGAR database online. (SEC’S EDGAR On Net, What Happened And Why, TAP-INFO, 30 Nov 1993).

Congressional data may soon be easier to use online, by David A. Fahrenthold, Washington Post (June 8, 2012).

Online, searching for a bill in Congress feels a little like time travel: Go looking for legislation, and you wind up in the Internet of 1995.

At Congress’s ’90s-vintage archive site, there’s no way to compare bills side by side. No tool to measure the success rate of a bill’s sponsor. And there’s certainly no way to leave a comment. Congress makes it hard for outside sites to do any of this, either, by refusing to give out bulk data on its bills in a user-friendly form.

On Friday, that might start to change.

Update on bulk access to legislative information

Rep. Crenshaw backs down, loses control over bulk data issue, by Josh Tauberer, GovTrack.us (June 7, 2012).

The government data that makes GovTrack go has been the center of what looks like a failed political power play over the last week. Rep. Crenshaw, whose appropriations subcommittee issued a draft report last week that nearly halted access to “bulk data downloads,” now “agree[s] to free legislative information” according to a statement written jointly with House leaders yesterday.


Time to contact your representatives!

  • #FreeTHOMAS, by Daniel Schuman, Sunlight Foundation (June 4, 2012)

    The better approach is for Congress to publish the data behind THOMAS. Government regularly does this elsewhere, and “bulk data” is responsible for clever new uses of information developed by citizens, journalists, and even the government itself.

    In upcoming days, the House is likely to pass legislative language that pays lip service to releasing THOMAS data while putting the idea in a deep freeze. This would be a disaster. But it’s not too late. Tell your representative that you want Congress to publish legislative data now.