Hot off the presses from the National Academies is this prepublication version of the report Frontiers in Massive Data Analysis. This is a really nice survey of the state of the art and of the current issues involved in "big data." Government information librarians owe it to themselves to become well-versed in this area, as more and more researchers across many disciplines become interested in government information as a corpus for large-scale analysis (I'm already getting questions about corpus research!).
For those of you who will be in Washington, DC next week, please consider attending the 2013 Legislative Data and Transparency Conference (RSVP required). There will be several interesting panels with House and external stakeholders like the Sunlight Foundation and the Cornell Legal Information Institute -- including a panel on electronic archiving and one on "missing data" and what to do about it ("missing" meaning not effectively online and digital, etc.).
The 2013 Legislative Data and Transparency Conference will take place on Wednesday, May 22, 2013, from 9 a.m. to 5 p.m. in the Capitol Visitor Center Auditorium. The conference brings together legislative branch agencies with data users and transparency advocates to discuss the use and future of legislative data. Topics include:
--Electronic legislative archiving
--XML and metadata standards
--Updates on beta.congress.gov
A story in The Scholarly Kitchen quotes a leaked draft of a new federal government data policy, saying:
...every department and agency is directed to inventory all of its funded datasets and put them all into Data.gov to the extent practicable. This is basically a fundamental change from voluntary to mandatory inclusion and from "a few of your best" to "everything you have."
Read the complete article here:
- Leaked Data Policy Raises Monster STM Data Issues, by David Wojick, The Scholarly Kitchen (Jan 17, 2013)
...there is almost no STM research data in Data.gov, just a few bits and pieces...
All this may now change because the draft data policy takes a new approach to feeding Data.gov.
The Bureau of Labor Statistics (BLS) has two new ("beta") tools for finding and visualizing statistical data:
The Inter-university Consortium for Political and Social Research (ICPSR), the large social-science data archive at the University of Michigan, is hosting a series of webcasts from October 1 through 3 featuring election data held in ICPSR's archives. The webcasts will show how to use those data for analysis and teaching.
- Welcome to the 2012 ICPSR Data Fair!, ICPSR.
The event is designed for the social sciences data community at large including researchers, librarians, teaching faculty, students, and policymakers from around the world who are interested in the use of social science data.
The first day will provide an orientation to ICPSR's services, including a tutorial on navigating our newly redesigned Web site. Other topics will include the American National Election Studies, minority voting behavior, and using election data in classroom instruction.
The event is open to everyone, and will be conducted using GoToWebinar technology; you can watch each presentation from your computer without the need to download any software.
A schedule of sessions is available, including links to the sessions themselves.
The Census Bureau provides the Research Data Products page with links to new tools that make data more accessible and understandable. Bureau researchers also create new data products from existing data collections.
There are some very interesting services here! Check out the innovative "synthetic data" projects: the Synthetic Survey of Income and Program Participation (a beta version of synthetic microdata on individuals) and the Synthetic Longitudinal Business Database (a beta version of synthetic microdata on all U.S. establishments), as well as more traditional offerings like the Small Area Income and Poverty Estimates Interactive Map Tool and the Quarterly Workforce Indicators, and much more!
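The idea behind these synthetic microdata products can be sketched in a few lines: fit a statistical model to the confidential records, then release draws from the model, so that aggregate patterns survive while no real respondent's record is disclosed. Here is a minimal Python illustration of that general technique; the columns, values, and simple multivariate-normal model are hypothetical, and this is in no way the Census Bureau's actual methodology:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for confidential microdata: income and weekly hours for
# 1,000 people (entirely made up for this illustration).
real = rng.multivariate_normal(
    mean=[52000, 38], cov=[[9e7, 12000], [12000, 25]], size=1000
)

# Fit a simple parametric model to the real records...
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# ...then release draws from the model instead of the records themselves.
synthetic = rng.multivariate_normal(mean, cov, size=1000)

# Aggregate patterns (means, covariances) are approximately preserved,
# but no synthetic row corresponds to any real person.
print(real.mean(axis=0), synthetic.mean(axis=0))
```

Real disclosure-avoidance programs use far more sophisticated models and validation against the confidential data, but the trade-off is the same: analytic usefulness of the aggregates versus protection of individual records.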
- Research Data Products
- Demographic - People and Households
- Economic - Businesses
- Longitudinal Employer-Household Dynamics - Workforce
The federal government's data portal data.gov has a space for American cities to make their data available: cities.data.gov. Data from four cities, Chicago, New York, San Francisco, and Seattle, are available so far.
Showcasing the applications and opportunities for harnessing the power of open data across the nation. City officials and developers working together to help improve the information available to city residents. Data in Cities.Data.Gov is not federal data.
- We Want You: City Data Edition, by Nate Berg, The Atlantic Cities (Aug 02, 2012).
The new clearinghouse features thousands of openly accessible data streams, including information on building permits filed in these cities, a regularly updated feed of Seattle Fire Department 911 dispatches, budget documents and tons of maps of things like parks, film locations and building footprints.
Chicago has 1,826 data feeds on the site, New York has 1,087, Seattle has 711, and San Francisco has 310. The federal government has made 6,560 of their own available.
Finding raw data, or the statistics generated from those data, can be a daunting task. There is no "Books in Print" for data. Two recent developments should help.
- OpenMetadata.org Community Site Launched, by Christine Connors, Information Today (June 18, 2012).
A new web portal was announced at the recent IASSIST conference in Washington D.C. Designed to make working with metadata easier, OpenMetadata.org (OM) is the product of Metadata Technology North America and Integrated Data Management Services, which created the site to "facilitat[e] access to standards based innovative technologies for the management of socio-economic, scientific, and other statistical data." Though the site is still in its initial deployment, the goal is for it to become a go-to resource for discovery, access, and tools for using statistical metadata.
The site currently focuses on two metadata standards: the Data Documentation Initiative (DDI) and the Statistical Data and Metadata Exchange (SDMX) standard.
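To give a concrete feel for what this kind of metadata looks like in practice, here is a short Python sketch that reads a simplified, DDI-style codebook fragment using only the standard library. The fragment is hypothetical and not schema-valid DDI (real DDI documents are namespaced and much richer); it only illustrates the kind of study-level and variable-level description these standards carry:

```python
import xml.etree.ElementTree as ET

# Simplified, DDI-style codebook fragment (hypothetical; real DDI
# documents are namespaced and far more detailed).
codebook_xml = """
<codeBook>
  <stdyDscr>
    <titl>Hypothetical Election Survey, 2012</titl>
  </stdyDscr>
  <dataDscr>
    <var name="V001"><labl>Respondent age</labl></var>
    <var name="V002"><labl>Voted in 2012 general election</labl></var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(codebook_xml)

# Pull out the study title and a variable-name -> label dictionary.
title = root.findtext("stdyDscr/titl")
variables = {v.get("name"): v.findtext("labl") for v in root.iter("var")}

print(title)      # Hypothetical Election Survey, 2012
print(variables)
```

Because the structure is standardized, the same small amount of code can build searchable indexes of studies and variables across many archives, which is exactly what portals like the one described here aim to enable.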
DataCatalogs.org aims to be the most comprehensive list of open data catalogs in the world. It is curated by a group of leading open data experts from around the world - including representatives from local, regional and national governments, international organisations such as the World Bank, and numerous NGOs.
See particularly: the OpenMetadata Survey Catalog, a portal aggregating information on surveys from data producers and archives around the globe. The catalog enables you to perform complex searches across studies and variables and browse through comprehensive metadata.
There are at least two ways to look at this story from National Journal's technology newsletter.
- Data, Data Everywhere, By Adam Mazmanian, Tech Daily Dose (May 16, 2012).
It's not clear why access to 600 gazillion terabytes (or thereabouts) of free, machine-readable data covering traffic accidents, copper smelting, phytoplankton cell counts and other fascinating, everyday topics have only inspired, at last count, 85 mobile apps.
One is that government hasn't found the right incentives to attract development of applications that make use of the wealth of government data in datasets that are more easily available than ever. This explanation is probably what drove the administration to host a "data pep rally... designed to stimulate interest in translating raw data into simple, navigable apps that consumers can use on mobile devices" today.
Another is that the whole idea of relying on the private sector to make information freely usable and useful (see, for example, The Federal Government Must Reimagine Its Role As An Information Provider) is not sufficient. This free-market approach to government information suggests limiting the role of government to providing raw data to developers, and assumes that the market will turn that raw data into useful information products.
There is, I believe, reason to be concerned about the free-market approach to government information.
One reason is that reducing the role of government will not give us better or more complete access to information; it will diminish our access. We can see that already with the Census Bureau's cancellation of the Statistical Abstract (see The demise of the Statistical Abstract and other critical Census titles). In this model, the government stops producing useful information packages and the private sector does its best to fill the gap, charging a lot of money to do so. That has a lot of bad side effects, though. For one thing, it puts a cost barrier between the information and its users. For another, to stay with the Statistical Abstract example, it is not even clear that the private sector can do more than imitate the product the government produced. (See all the tables in the StatAb that contain "unpublished" data from government agencies. For example, in section 2, "Births, Deaths, Marriages, and Divorces," I count 12 tables with unpublished data; in section 4, "Education," I count 32 tables with unpublished data. [Counts from the 2012 Statistical Abstract.])
But there is another alternative. We could recognize that the government does have an important role in packaging raw data into meaningful packages of statistical tables, reports, views, and end-user-ready information. This makes sense for two reasons. First, it builds on the idea that information gathered and created by the government is public information and should be easily, freely, and publicly usable by the public. That means that the government, which knows the information it gathered and created best, should create the first package or product or view of that information. This is still, mostly, the default way governments behave for lots of government information: everything from press releases of current economic statistics, to amazingly useful reports like the Special Studies (P-23) series from the Census Bureau, to complex web sites like that of the Bureau of Labor Statistics. Second, these government-produced information products will do more than any "pep rally" to attract others (private sector, public sector, and individual users) to dig into the raw data, analyze it, and develop apps.
Building on our post from a couple of days ago: outgoing U.S. Census Bureau director Robert Groves just posted "A Future Without Key Social and Economic Statistics for the Country" on the Census Bureau Director's blog, in which he simply *blasts* the US House decision to pass the Webster-Lankford amendment to H.R. 5326, appropriations for the Departments of Commerce and Justice, Science, and related agencies for the fiscal year ending September 30, 2013.
"This bill devastates the nation's statistical information about the status of the economy and the larger society. Modern societies need current, detailed social and economic statistics. The US is losing them."
[HT to Alesia McManus at the "Save the Statistical Abstract" Facebook group]