Data
Big Data, Environmental Data
Submitted by jajacobs on Tue, 2008-09-09 08:04. The journal Nature has a special issue about "Big Data" with articles by Clifford Lynch, Cory Doctorow, and others. The whole issue is worth reading and is freely available online for a short time.
Coping with floods of data is now one of science's biggest challenges. In this Nature Special, we assess the need to complement smart science with smart searching; look at what the next Google will be; talk to the pioneering biologists who are trying to use wiki-type web pages to manage and interpret data; and recall that the first mass data crunchers were not computers, but the remarkable women of Harvard's Observatory.
In the area of government information, David Goldston, the former chief of staff of the House Committee on Science, writes about environmental data.
- Big data: Data wrangling by David Goldston, Nature 455, no. 7209 (September 3, 2008): 15.
He notes that there is no set of environmental indicators that is regularly updated -- something akin to economic statistics -- and that a report by the Heinz Center on the State of the Nation's Ecosystems (www.heinzcenter.org/ecosystems) is chock-full of lists of subject and geographical areas for which few if any data exist.
He calls attention to the Data Quality Act, which "has been anathema to environmental groups, which have seen it as a way to stymie regulation. And it has been primarily invoked by corporations questioning studies that raise alarms about their products." (The act is less than half a page in a public law of more than seven hundred pages: Public Law 106-554 Sec. 515; Statutes at Large volume 114, pages 2763A-153 to 2763A-154, available online as plain text and as pdf.)
He also says that, "Even when instrumentation is regularly funded, as some kinds of satellites are, money is often lacking to maintain the data or to make them sufficiently accessible or digestible."
District of Columbia provides live data feeds
Submitted by jajacobs on Fri, 2008-07-04 08:22. I had not seen this before, but it looks like a model for open government. The District of Columbia provides free access to "city operational data" (e.g., Demographics, Health Care, Environment, Human Services, Education, Economic Development, Public Safety) in a variety of formats including RSS (Atom) feeds, XML, CSV, and ESRI Shapefiles. The feeds are drawn from more than 150 data sets, ranging from the all-important crime reports to pothole complaints.
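For anyone who wants to poke at feeds like these, here is a minimal sketch in Python of reading an Atom feed and counting its entries by category. The sample feed, its entry titles, and its category names are made up for illustration; the District's actual feed URLs and field layout are not shown here and may well differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; not real District of Columbia data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical city service requests</title>
  <entry>
    <title>Pothole complaint</title>
    <updated>2008-07-01T09:30:00Z</updated>
    <category term="Public Works"/>
  </entry>
  <entry>
    <title>Crime report</title>
    <updated>2008-07-02T14:05:00Z</updated>
    <category term="Public Safety"/>
  </entry>
  <entry>
    <title>Streetlight outage</title>
    <updated>2008-07-03T21:45:00Z</updated>
    <category term="Public Works"/>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return a list of dicts, one per Atom <entry> in the feed text."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        category = entry.find("atom:category", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            # An entry may have no <category>; fall back to an empty string.
            "category": category.get("term", "") if category is not None else "",
        })
    return entries


def count_by_category(entries):
    """Tally how many entries fall under each category term."""
    return Counter(e["category"] for e in entries)


if __name__ == "__main__":
    parsed = parse_entries(SAMPLE_FEED)
    for term, count in count_by_category(parsed).most_common():
        print(term, count)
```

To try it against a live feed, you would swap SAMPLE_FEED for the text you download from the feed's URL. The same tally would work on the CSV versions with Python's csv module.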
Ruggles report on preservation and use of economic data liberated!
Submitted by jrjacobs on Sat, 2008-05-10 13:31. A few weeks back, we posted a story about an Atlantic article from November 1967 called "The National Data Center and Personal Privacy," which discussed the idea of a National Data Center, the precursor to Total Information Awareness. It was such a hot topic of the day that Congress held a hearing on computers and the invasion of privacy of US citizens (The computer and invasion of privacy. Hearings, Eighty-ninth Congress, second session. July 26, 27, and 28, 1966. by United States. Congress. House. Committee on Government Operations. Special Subcommittee on Invasion of Privacy.)
I started reading the hearing, and found that Yale Economics Professor Richard Ruggles (NYT obituary from 2001) had also testified at that hearing. So I started poking around about Ruggles, looking in WorldCat and Google Scholar. I found quite a few citations to a document entitled Report of the Committee on the Preservation and Use of Economic Data, submitted to the Social Science Research Council in 1965.
But for such a well-cited document that spawned a Congressional hearing and much worry in the mainstream press about computers and privacy, there were only 3 libraries in the whole country that held the report. Imagine that!
Well, I decided to liberate the report, so -- after much finagling! -- got a copy, scanned it, and uploaded it to the Internet Archive. Score one for the digital public domain!!
I hope to see more libraries listed as having a copy in WorldCat in the near future. And if you've got any fugitive documents lying around your hard drive, send them to us here at admin AT freegovinfo DOT info. We'll make sure they get up on the open Web safe and secure in the Internet Archive!!
NetSquared $100K mashup competition
Submitted by James Staub on Wed, 2008-03-19 06:26. TechSoup's NetSquared mashup challenge pays out a total of $100K this year to the best of 122 submitted projects.
I'm going to have way too much fun browsing through the project proposals. Casting my votes, though, is already proving difficult... Using user downtime? Massive mashup calendars? Warnings for hidden corporate abuses every time I make an online purchase?
Ohboyohboyohboyohboy.
And of course, many of these envisioned projects would not even be imaginable were it not for widely available and mostly reliable government information data sources.
Scientists fear committee's dissolution will result in lost data
Submitted by jajacobs on Sun, 2008-01-13 07:22. Scientists oppose move to restrict satellite data, by Les Blumenthal, McClatchy Newspapers, The Tacoma, WA News Tribune, January 13, 2008
There is a little-noticed but influential government committee known as the Civil Applications Committee, which, under the jurisdiction of the U.S. Geological Survey, reviews civilian requests for classified reconnaissance information that can be useful to scientists studying volcanoes, forest fires, earthquakes and landslides, climate change, hurricanes, flooding and pollution. Now the Bush administration plans to abolish the committee and create an office within the Department of Homeland Security to review such requests.
Rep. Norm Dicks, who is chairman of the House appropriations subcommittee with control over the Geological Survey and is the senior member of the House Homeland Security Committee, said in a letter to administration officials:
"We believe the elimination of the civilian orientation of the Civil Applications Committee represents explicit harm in the near term to USGS and other civilian federal agencies, and it represents a potentially serious harm over the longer term to the constitutional protections U.S. citizens expect and deserve."
FEC data available as a widget and API!
Submitted by jajacobs on Fri, 2007-08-24 10:44. No, the FEC isn't doing this; MAPLight.org is. But the FEC is providing the data in an open format with detailed documentation, which makes this all possible (see Files by Election Cycle at the FEC site).
MAPLight.org is providing access to Federal Election Commission (FEC) data in two ways: through an API (application programming interface) that makes it easy for any Web developer to build their own site or software program that displays or shares up-to-date campaign contributions from the FEC (www.maplight.org/widgets/apis), and through widgets (www.maplight.org/widgets) that allow anyone to track presidential fundraising on their own blogs, social media sites, and personal Web sites.
Both services, the widgets and the API, are free and open source, so anyone can use or modify them as they see fit.
Here is an example of a widget (but you can customize for your own site, of course!).
The MAPLight.org presidential widget is the first of several widgets that the organization will release. By September 15, MAPLight.org will release a widget for U.S. Congress, showing total campaign contributions for each candidate for Congress. By September 30, MAPLight.org will release its "Money and Votes" widget, revealing correlations between campaign contributions and votes for any bill in U.S. Congress. To be notified when MAPLight.org releases these widgets, visit www.maplight.org/participate/signup. MAPLight.org is a nonprofit, nonpartisan organization based in Berkeley, California. Its search engine at MAPLight.org illuminates the connection between money and politics (MAP) via an unprecedented database of campaign contributions and legislative outcomes.
Data Sharing between Agencies: FDA/DOD and VA/DOD
Submitted by aewest on Tue, 2007-08-07 12:55. Two stories today on agencies sharing data:
1. FDA, Defense Department Share Data to Enhance Medical Product Safety Reviews
http://www.fda.gov/bbs/topics/NEWS/2007/NEW01675.html
2. DOD and VA open a new medical data spigot
http://govhealthit.com/article103423-08-03-07-Web
In both cases there are clear advantages to sharing the data. In the case of the FDA, they can get access to much larger pools of results on clinical trials and actual use of drugs and medical devices; in the case of the DOD/VA share, doctors will be able to get a better picture of their patients' overall health and care since the VA and DOD populations overlap substantially. One obvious advantage would be the ability to prevent bad drug interactions because doctors would know everything prescribed to their patients.
Differences in the two sharing projects are that the first will be designed as a shared structure from the ground up while the DOD/VA project will work with pre-existing systems. Initially, the VA/DOD systems will not be fully compatible across software, but in time the Bidirectional Health Information Exchange (BHIE) program will evolve into Clinical Data Repository/Health Data Repository (CHDR) which will allow direct input/querying/reporting of health data.
I think we can assume that breaches of patient data will occur, especially as the data is restructured and/or designed from the beginning to facilitate interoperability. After all, one of the agencies above is the VA. So, are the benefits (well-described data is also more easily published and potentially more easily located) worth the risk of leaked patient data?
Many, many people take multiple medications every day, some of which interact and some of which have detrimental effects that only become apparent after usage in groups far larger than those included in clinical trials. At the same time, data is lost on a regular basis by many agencies (see GAO's Personal Information: Data Breaches...). Yet evidence of actual harm from data breaches is limited (although GAO notes that absence of evidence doesn't equal evidence of absence). The GAO report on Personal Information says:
For example, more than 570 data breaches were reported in the news media from January 2005 through December 2006, according to lists maintained by private groups that track reports of breaches. ... The extent to which data breaches have resulted in identity theft is not well known, largely because of the difficulty of determining the source of the data used to commit identity theft. However, available data and interviews with researchers, law enforcement officials, and industry representatives indicated that most breaches have not resulted in detected incidents of identity theft, particularly the unauthorized creation of new accounts. For example, in reviewing the 24 largest breaches reported in the media from January 2000 through June 2005, GAO found that 3 included evidence of resulting fraud on existing accounts and 1 included evidence of unauthorized creation of new accounts. For 18 of the breaches, no clear evidence had been uncovered linking them to identity theft; and for the remaining 2, there was not sufficient information to make a determination.


