Tag Archives: Data
- Introducing the ProPublica Data Store, by Scott Klein and Ryann Grochowski Jones, ProPublica (Feb. 26, 2014).
- The ProPublica Data Store.
The store includes links to open government data and to raw, as-is datasets ProPublica has obtained from government sources. Datasets that ProPublica has built by scraping material from web sites and Acrobat documents, or has cleaned or merged from different sources in a way that’s never been done before, are available for purchase. All of these datasets come from a growing collection of the data ProPublica has used in its reporting.
For datasets that are the result of significant expenditures of ProPublica’s time and effort, ProPublica charges a reasonable one-time fee: in most cases, it’s $200 for journalists and $2,000 for academic researchers.
Today the Public Library of Science (PLOS) revised its already-strong data policy. As of March 3, 2014, authors in all PLOS journals will be required to provide a Data Availability Statement. Additionally, all underlying data must be publicly available in one of three places: in the body of the manuscript, in the supporting information, or in a stable, public repository that provides an accession number or Digital Object Identifier (DOI). Kudos to PLOS for furthering public access to research data in the “pursuit of scientific advances.”
Access to research results, immediately and without restriction, has always been at the heart of PLOS’ mission and the wider Open Access movement. However, without similar access to the data underlying the findings, the article can be of limited use. For this reason, PLOS has always required that authors make their data available to other academic researchers who wish to replicate, reanalyze, or build upon the findings published in our journals.
In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.
The National Climatic Data Center (NCDC) is a gold mine of weather and climate data. Land-based, marine, model, radar, weather balloon, satellite, and paleoclimatic are just a few of the types of datasets available. Want to learn more? Attend this webinar (which is actually the first in a 3-part series of webinars!) on Wednesday, February 26, 2014 at 2pm Eastern / 11am Pacific.
The first webinar in a 3-part series, “NCDC-The World’s Largest Climate Data Archive,” will be presented on Wednesday, February 26th at 2pm EST. Register today! An overview of the 3 NOAA data centers can be found in the webinar series announcement.
- Title: NCDC-The World’s Largest Climate Data Archive
- Date: Wednesday, February 26th
- Start time: 2pm EST
- Duration: 60 minutes
- Greg Hammer, Meteorologist, NCDC
- Scott Stephens, Meteorologist, NCDC
- Stuart Hinson, Meteorologist, NCDC
- Mara Sprain, MALS Librarian, NCDC
- Susan Osborne, Technical Writer and Communications Specialist, NCDC
Summary: NOAA’s National Climatic Data Center (NCDC) maintains the world’s largest climate data archive and provides climatological services and data to every sector of the United States economy and to users worldwide. Records in the archive range from paleoclimatic data, to centuries-old journals, to data less than an hour old. The Center’s mission is to preserve these data and make them available to the public, business, industry, government, and researchers.
Data come to NCDC from not only land-based stations but also from ships, buoys, weather balloons, radars, satellites, and even sophisticated weather and climate models. With these data, NCDC develops national and global datasets. The datasets are used to maximize the use of our climatic and natural resources while also minimizing the risks caused by climate variability and weather extremes. NCDC has a statutory mission to describe the climate of the United States, and it acts as the “Nation’s Scorekeeper” regarding the trends and anomalies of weather and climate. NCDC’s climate data have been used in a variety of applications including agriculture, air quality, construction, education, energy, engineering, forestry, health, insurance, landscape design, livestock management, manufacturing, national security, recreation and tourism, retailing, transportation, and water resources management.
Participation is free, but registration is required. After you register, an e-mail confirmation will include instructions for joining the webinar.
We recently became aware of a new(ish) app from the Sunlight Foundation. It is the Congress App and is available for both iOS and Android. We think anyone who is interested in keeping tabs on Congress and who owns a smartphone ought to download this app.
I (Daniel) have the Android version, which is divided into these sections:
- People (Representatives and Senators)
- The Floor
Because of the way that Congress itself chooses to disseminate information to the public, bill information and vote information can be delayed. Even so, it is much easier to have the latest Congressional votes at your fingertips than to dig to find them.
People is great. It was easy for me to add my Congressional delegation to a tracking list. For each Member of Congress you can do the following:
- Call their office
- Visit their website
- View their voting record
- See their sponsored bills
- View committees they are a part of
- See news from across the internet mentioning your Member of Congress.
As a full-time information activist and an on-and-off political junkie and social justice person, I find this app incredibly helpful. I was also able to put it to immediate use.
In what could be a whole other post, the American Library Association (ALA) Washington Office is reporting that the secret negotiations on the Trans Pacific Partnership (TPP) hold this bad news for the Public Domain:
If you use the public domain — which we all do — we’re all going to get stiffed, because there are proposals to lengthen the Berne-mandated terms from life + 50 years, to life + 70, or even life + 100 years.
There’s other bad news for copyright, including bad news for creators. There’s disturbing news on other fronts regarding the TPP, so I urge you to read the whole article.
I read ALA’s blog post right after installing the Congress App. So I used it to visit the websites of my two Senators and House member and send quick e-mails urging them to reject “fast tracking” the TPP and telling them I found ANY further extension of copyright terms unacceptable. I hope you’ll take the same message to your Congress people. You don’t have to use Sunlight’s app, but it does make it easier.
Moving back to the app itself, I wanted to remind you that free apps like these are only possible because Congressional information is publicly available. If Congress decided to go back to paper or to license its digital data to only one vendor, we couldn’t have things like this.
Hot off the presses from the National Academies is this prepublication version of a report, Frontiers in Massive Data Analysis. This is a really nice survey of much of the state of the art and current issues involved in “big data.” Govt information librarians owe it to themselves to become well-versed, as more and more researchers across many disciplines will become interested in govt information as a corpus for larger analysis (I’m already getting questions about corpus research!).
From Facebook to Google searches to bookmarking a webpage in our browsers, today’s society has become one with an enormous amount of data. Some internet-based companies such as Yahoo! are even storing exabytes (10^18 bytes) of data. Like these companies and the rest of the world, scientific communities are also generating large amounts of data (mostly terabytes and in some cases near petabytes) from experiments, observations, and numerical simulation. However, the scientific community, along with defense enterprise, has been a leader in generating and using large data sets for many years. The issue that arises with this new type of large data is how to handle it: this includes sharing the data, enabling data security, working with different data formats and structures, dealing with the highly distributed data sources, and more.
Frontiers in Massive Data Analysis presents the Committee on the Analysis of Massive Data’s work to make sense of the current state of data analysis for mining of massive sets of data, to identify gaps in the current practice and to develop methods to fill these gaps. The committee thus examines the frontiers of research that is enabling the analysis of massive data which includes data representation and methods for including humans in the data-analysis loop. The report includes the committee’s recommendations, details concerning types of data that build into massive data, and information on the seven computational giants of massive data analysis.
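To put the data sizes mentioned above in perspective, here is a minimal Python sketch of how terabytes, petabytes, and exabytes relate. It assumes decimal (SI) units, matching the "10 to the 18 bytes" definition of an exabyte quoted above; the `UNITS` table and the `convert` helper are illustrative names of my own, not anything from the report.

```python
# Decimal (SI) byte units; an assumption that matches the
# "exabyte = 10^18 bytes" definition quoted in the post.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(value, from_unit, to_unit):
    """Convert a data size between decimal byte units (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# How many terabytes and petabytes fit in one exabyte?
print(convert(1, "EB", "TB"))  # 1000000.0
print(convert(1, "EB", "PB"))  # 1000.0
```

In other words, a single exabyte-scale archive is about a million times larger than a terabyte-scale scientific dataset, which is roughly the gap the passage describes.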