Crowdsourcing

NARA and NOAA join Old Weather Project to crowdsource transcription of historic naval ship weather logs

According to today's press release from NOAA, the National Archives (NARA) and NOAA are teaming up and joining the Old Weather Project hosted at Zooniverse.org to crowdsource the transcription of historic ships' logs in order to extract critical environmental data. The Old Weather Project began over 2 years ago with British Royal Navy log books -- 16,400 volunteers have transcribed 1.6 million weather observations so far! Transcribed data produced by Old Weather volunteers will be integrated into existing large-scale data sets, such as the International Comprehensive Ocean Atmosphere Data Set (ICOADS). Human volunteers are so important in this case because Optical Character Recognition (OCR) technologies cannot currently recognize hand-written text.

Before there were satellites, weather data transmitters, or computer databases, there were the ship’s logs of Arctic sea voyages, where sailors dutifully recorded weather observations. Now, a new crowdsourcing effort could soon make the weather data from these ship logs, some more than 150 years old, available to climate scientists worldwide.

NOAA, National Archives and Records Administration, Zooniverse — a citizen science web portal — and other partners are seeking volunteers to transcribe a newly digitized set of ship logs dating to 1850. The ship logs, preserved by NARA, are from U.S. Navy, Coast Guard and Revenue Cutter voyages in the Arctic between 1850 and the World War II era.

[Clip]

Organizers hope to enlist thousands of volunteers to transcribe scanned copies of logbook pages via the Old Weather project. Information recorded in these logbooks will also appeal to a wide array of scientists and professionals from other fields, including historians and genealogists, as well as current members and veterans of the U.S. Navy and Coast Guard.

[HT to Gary Price at InfoDocket for calling our attention to this project!]

EPA wants your Documerica Photos!

This is from last year, in case you missed it. (I did.):

  • Documerica Returns!, EPA blog (May 2nd, 2011).

    Almost 40 years ago, EPA’s Documerica project captured thousands of images of environmental problems and everyday life. Now it’s your turn!

    On Earth Day 2011, EPA put out a global call for current photos of life and our environment, PLUS a challenge to photograph the ‘now’ of places in Documerica. Your photo could be exhibited around the U.S. in 2012!

    Join In!
    Sign up and submit photos through Flickr!

See also:

EPA wants your environment pictures, issues public photo challenge, by Michael Cooney, Network World (01/06/12).

Searching More Than 24K Pages of Email Messages From Sarah Palin Administration

The archived email messages were released earlier today and are now beginning to roll out into searchable databases and/or PDF files.

Scanned pages are being added to databases as they become available. Many news organizations are asking the public for assistance in reviewing all of the pages. Yet another example of crowdsourcing government records.

Here are three of several sources:

1. NY Times
Search NY Times: Palin E-Mail Search
http://projects.nytimes.com/palin-emails/date/2008-08-01

2. MSNBC/Mother Jones/ProPublica
Search: MSNBC/Mother Jones/ProPublica
http://palinemail.msnbc.msn.com/

Updates at @openchannelblog and #palinemail

MSNBC Live Blog With Additional Information as it Becomes Available. Also, info about documents being withheld.
http://openchannel.msnbc.msn.com/_news/2011/06/10/6825771-heres-your-liv...

Background from Bill Dedman at MSNBC
http://msnbc.msn.com/id/43281157/ns/politics-more_politics/

3. Washington Post
PDF Files of Raw Email Messages (#1)
http://www.washingtonpost.com/wp-srv/special/politics/palin-emails/pdf/D...

Additional Material
http://www.washingtonpost.com/blogs/sarah-palin-emails/post/for-micro-up...

Google Map Maker links

The Scout Report has good links to CNET, Wired, and Wall Street Journal articles about Google's Map Maker project, as well as links to the American Memory Project and the David Rumsey Map Collection:

  • With the release of Google Map Maker, users can contribute their own spatial knowledge, by Max Grinnell, The Scout Report (2011-04-22).

    In the previous millennium, those folks who wanted a high-quality map of their area might have had to go purchase or borrow an actual physical map. In recent years, online mapping tools and resources have sprouted like mushrooms after a hard rain. With all of that in mind, it is not so surprising that on Tuesday Google announced that it is allowing users to contribute changes to their very popular maps. This tool is called Google Map Maker...

Crowd-sourcing transcription of historical texts

University College London has a treasure trove in the papers of the Enlightenment philosopher Jeremy Bentham. In the last 50 years, it has published 27 volumes of his writings, fewer than half of the 70 or so volumes ultimately expected. In an attempt to spur this project along, it is crowd-sourcing the transcription of the historical documents, according to the NY Times.

The story also mentions another interesting crowd-sourcing project at George Mason University to reconstitute the papers of the early War Department (1784-1800), which were destroyed by a fire on November 8, 1800. Sharon Leon, a historian at George Mason University and Director of Public Projects at the Center for History and New Media -- developers of one of my favorite Web tools, Zotero! -- recently received a grant from the National Endowment for the Humanities to design a free plug-in that any archive or library could use to open transcription to the public.

Crowd-sourcing is clearly becoming an invaluable tool for expanding the reach of scholarship. Last week, I mentioned the Old Weather project, which is crowd-sourcing the transcription of weather observations made by UK Royal Navy ships around the time of World War I in order to assist with climate model projections and improve a database of weather extremes. Old Weather is part of the Zooniverse, a collection of crowd-sourcing projects that support scientific research.

Some, like Daniel Stowell, the director and editor of the Papers of Abraham Lincoln in Springfield, IL, point out that hiring nonacademic transcribers is not a panacea and could in fact produce so many errors that correcting them makes crowd-sourcing more expensive and more time-consuming.

But, as Ms. Leon points out, “We’re not looking for perfect. We’re looking for progressive improvement, which is a completely different goal from someone who is creating a letter-press edition.”

Dare I point out that the FDLP has been crowd-sourcing US government documents since 1813?! As Thomas Jefferson wrote in a 1791 letter, “Let us save what remains, not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.”

Sunlight Foundation's Transparency Corps Recruits People Amazon Turk Style

The Sunlight Foundation recently announced the creation of the Transparency Corps. Modeled after Amazon’s Mechanical Turk, the Transparency Corps aims to make it easy to harness small contributions from enthusiastic volunteers to advance efforts to improve government transparency.

From the June 30, 2009 Sunlight Foundation press release:

“Inspired by Amazon’s Mechanical Turk, Sunlight created Transparency Corps as a new way for people to volunteer to make government transparency a reality,” said Ellen Miller, executive director and co-founder of the Sunlight Foundation. “Now, when people ask ‘how can I help?’ Sunlight and future partners can provide micro-tasks that when aggregated, help solve research and data analysis problems when computers alone cannot properly scrutinize government information.”

Right now there are two projects:

Each time you complete a task, you get points. Those points add up and are how you move up the transparency leader board. I joined up to see what a task would look like. For the earmarks task I was presented with a PDF of a letter requesting funding for a local project and a form to the right of the letter to be filled in with data such as the quantity requested, title of the project and other requester information. You can see an example of one of the letters on Scribd.

I am curious to see how big they can grow their corps and what projects they target over the next year. I love that they are grabbing structured data. This particular task is part transcription and part encoding and reminds me of some of the work being done over on Freebase.com. For an example of one of the datasets they are building, take a look at their U.S. National Register of Historic Places base or the Government Commons.
