Hot off the presses from the National Academies is this prepublication version of a report Frontiers in Massive Data Analysis. This is a really nice survey of much of the state of the art and current issues involved in “big data.” Govt information librarians owe it to themselves to become well-versed as more and more researchers across many disciplines will become interested in govt information as a corpus to do larger analysis (I’m already getting questions about corpus research!).
From Facebook to Google searches to bookmarking a webpage in our browsers, today’s society has become one with an enormous amount of data. Some internet-based companies such as Yahoo! are even storing exabytes (10 to the 18 bytes) of data. Like these companies and the rest of the world, scientific communities are also generating large amounts of data-—mostly terabytes and in some cases near petabytes—from experiments, observations, and numerical simulation. However, the scientific community, along with defense enterprise, has been a leader in generating and using large data sets for many years. The issue that arises with this new type of large data is how to handle it—this includes sharing the data, enabling data security, working with different data formats and structures, dealing with the highly distributed data sources, and more.
Frontiers in Massive Data Analysis presents the Committee on the Analysis of Massive Data’s work to make sense of the current state of data analysis for mining of massive sets of data, to identify gaps in the current practice and to develop methods to fill these gaps. The committee thus examines the frontiers of research that is enabling the analysis of massive data which includes data representation and methods for including humans in the data-analysis loop. The report includes the committee’s recommendations, details concerning types of data that build into massive data, and information on the seven computational giants of massive data analysis.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.