White House claims copyright of photos on Flickr
Submitted by jajacobs on Mon, 2010-02-08 18:21.
White House Makes Full Copyright Claim on Photos, by Kathy Gill, The Moderate Voice (Feb 6th, 2010).
The U.S. government policy on photographs and copyright is pretty straightforward: photos produced by federal employees as part of their job responsibilities are "not subject to copyright in the United States and there are no U.S. copyright restrictions on reproduction, derivative works, distribution, performance, or display of the work."
Why, then, is the Obama White House asserting that no one but "news organizations" can use its Flickr photos? Why is it asserting that manipulation is prohibited? Why is it asserting that photos may not be used in "commercial or political materials, advertisements, emails, products, promotions that in any way suggests approval or endorsement of the President, the First Family, or the White House"?
Lots of coverage of Google + NSA = "do no evil"?
Submitted by jajacobs on Fri, 2010-02-05 09:22.
The recent "alliance" between the National Security Agency (one of the most secret and secretive members of the U.S. intelligence community) and Google has brought up more questions than answers. Here are some recent stories:
- Google to enlist NSA to help it ward off cyberattacks, By Ellen Nakashima, Washington Post (February 4, 2010).
"The world's largest Internet search company and the world's most powerful electronic surveillance organization are teaming up in the name of cybersecurity."
- Google Asks Spy Agency for Help With Inquiry Into Cyberattacks, By JOHN MARKOFF, New York Times (February 4, 2010).
'By turning to the N.S.A., which has no statutory authority to investigate domestic criminal acts, instead of the Department of Homeland Security, which does have such authority, Google is clearly seeking to avoid having its search engine, e-mail and other Web services regulated as part of the nation’s "critical infrastructure."'
- 'Don't Be Evil,' Meet 'Spy on Everyone': How the NSA Deal Could Kill Google, By Noah Shachtman, Wired (February 4, 2010).
"The company pinkie-swears that its agreement with the NSA won’t violate the company's privacy policies or compromise user data. Those promises are a little hard to believe, given the NSA's track record of getting private enterprises to cooperate, and Google’s willingness to take this first step."
- Google, NSA ‘alliance’ has privacy advocate alarmed, By Stephen C. Webster, Raw Story (February 4th, 2010).
- EPIC Seeks Records on Google-NSA Relationship, Electronic Privacy Information Center (February 4, 2010).
See also: Privacy: "I have nothing to hide".
iConference presentation on the future of govt information
Submitted by jrjacobs on Fri, 2010-02-05 09:02.
[UPDATE: I added the slides for Tom Bruce's talk]
Shinjoung and I submitted a panel on the future of govt information for iConference 2010 in Champaign, IL. We had a good far-reaching discussion with Tom Bruce (Cornell Legal Information Institute), Daniel Schuman (Sunlight Foundation) and Cindy Etkin (GPO). Below are my slides and notes. I've also attached the notes and abstract as PDFs. As Tom tweeted, "World's problems: solved."
If the other panelists agree, I'll post their notes/slides as well. This is of course an ongoing conversation so please feel free to leave comments, questions, rants etc.
--that is all!
3:45 - 5:15 pm Thursday, February 4, 2010
Roundtable 4 : : Technology Room
"Gone today, Here tomorrow: assuring access to government information in the digital age." ShinJoung Yeo, University of Illinois; and James R. Jacobs, Stanford University
Panelists:
- Shinjoung Yeo, Moderator
- James Jacobs, Stanford University Library
- Thomas Bruce (Legal Information Institute, Cornell University)
- Daniel Schuman (Sunlight Foundation policy director)
- Cindy Etkin (Govt Printing Office)
[SLIDE 1: govt documents]
Right up front, I'm a librarian and a collaborator in the LOCKSS distributed digital preservation project (Lots of Copies Keep Stuff Safe). I've been in academia/education my whole life as a student, teacher, librarian and technologist. I've been a government information/FDLP librarian since 2002 and currently am serving a 3 year term on the Depository Library Council, the body which informs and advises the Govt Printing Office regarding issues of the Federal Depository Library Program (which Cindy talked about). So my mindset/perspective/bias is from one who assists in the scholarly communication process, one who believes that libraries have a place in the digital information landscape, and one who believes strongly in the idea that access to govt information is a fundamental right. As Ralph Nader has said, “There can be no daily democracy without daily citizenship.” And there can be no citizenship without access to government information.
[SLIDE 2: mmm documents]
With that in mind, I'd like to talk about the underlying historical ideals of the FDLP, discuss how those ideals have been under fire from both within and without the library community and argue that those ideals applied to today's information landscape give us the best chance at access to and long-term preservation and assurance of govt information.
[SLIDE 3: FDLP logo]
The Federal Depository Library Program (FDLP) has been around since 1813 in one form or another. The underlying purpose of the FDLP is to give the public free access to government information. Depository libraries have long safeguarded the public's right to know by cooperating with the Govt Printing Office (GPO) and receiving its publications for free; organizing, maintaining, and preserving those publications; assisting users in accessing that information in a geographically dispersed system; and, most importantly, assuring that govt information is freely available and tamper-proof -- think Napster for govt information. Taken together, the collections of the 1238 depository libraries make up the historic corpus of govt information available for free to every citizen. Jessamyn West of librarian.net recently called the FDLP the longest-running open source project. I would add that it's the longest-running government-run, public-centric open-source project to support the democratic ideal.
[SLIDE CHUCK QUOTE]
Over the last 20-30 years, developments in publishing and Internet technologies have affected the way government information is produced, disseminated, controlled, and preserved. These changes have affected the policies and procedures of the GPO and, in turn, have affected the depository library program. Despite the often-heard promises that Web technologies will bring more information to more people more quickly and easily, the actual effects have been decidedly mixed. The highly visible, short-term successes of rapid dissemination of single titles directly to citizens (e.g., the large number of downloads of the 9/11 report) mask the loss of a secure infrastructure (GPO's Federal Digital System (FDsys) notwithstanding) for long-term preservation of and access to government information. More and more agencies publish content on their own Web sites rather than using the GPO conduit (producing what librarians call "fugitive documents"), and very few agencies publish to any standards or have policies in place that deal with archiving and preservation. As Chuck Humphrey, a data librarian friend of mine, once said, “there seems to be an inverse relationship between convenience of dissemination and preservation standards.”
In addition to this lack of a secure infrastructure, the growing din of the call for digitization of historic govt publications (most recently the Ithaka/ARL report "Documents for a Digital Democracy: A Model for the Federal Depository Library Program in the 21st Century"), while no doubt a boon for access today, is somewhat of a red herring that makes library administrators believe that they will soon be able to dispose of their physical collections and use that space for today's or tomorrow's buzzword. This call for digitization may instead have the deleterious effect of damaging the long-term preservation of govt publications.
Lastly, the growing trend toward privatization of govt information has actually caused a decrease in public access despite its digital nature. This is not a new trend. Herbert Schiller noted this in 1986 in his book "Information and the Crisis Economy." Speaking of machine readable formats, he wrote that, "Library information capability is greatly enhanced. Yet this benefit is accompanied by the abandonment of libraries' historical free access policy. User charges are introduced. The public character of the library is weakening as its commercial connection deepens. No less important, the composition and character of its holdings change as the clientele shifts from general public to the ability-to-pay user."
[SLIDE: GAO contract]
We've seen over the last 30 years a disturbing rise in Federal Agencies entering into contracts with private companies whereby public domain govt documents are digitized and then taken out of the commons via licensing agreements. See for example, the Government Accountability Office (GAO)'s deal with Thomson-West whereby Thomson-West digitized the GAO's 20,597 legislative histories of most public laws from 1915-1995 and in return received exclusive license to sell access to the content. GAO received nothing in return but an account on Thomson's service while the public received nothing at all.
Rapid technological change and the misplaced assumption that "it's all in google" have caused some in the FDLP community to question the need for the FDLP and some others to drop out of the program altogether. I believe that the inherent nature of digital information actually increases the need for a distributed network of dedicated, legislatively authorized libraries. It would be prudent to draw upon the existing infrastructure of FDLP libraries and the almost 200 years of cumulative experience of these institutions in assuring preservation of and access to government information. We must reinforce FDLP’s traditional mission of selection, collection, free access, and preservation of government information in the digital era in order to assure free access to this information into the foreseeable future. Some in the depository community, like my library, are doing just that by participating in the LOCKSS-USDOCS network, harvesting digital govt information -- for example, harvesting openCRS that Daniel mentioned along with other sites that post CRS reports -- and, yes, digitizing parts of their collections. But we need more libraries, not fewer.
[SLIDE: FDLP ecosystem]
Nobody knows for sure how to preserve digital content for the long-term. This means to me that a loosely coupled, independently administered, distributed ecosystem is the best way to assure long-term preservation -- many organizations with many funding models and a distributed technical infrastructure(s) have a better shot at preservation than 1 or 2 organizations -- especially if one of those organizations has a tenuous budget, or is a private corporation etc.
Imagine if you will 2 future govt information systems: on the one hand, the system where there are one or two digital collections (say for example GPO's Federal Digital System (FDsys) and Portico, the dark archive currently housing digital journals); and on the other hand, one with many digital collections in FDLP libraries. How would each of these deal with or react to different stress situations or threat models (e.g., reduced budgets, increased demand for privatization, increased demand for censorship or control or removal of information, media/hardware/software/network failure, natural disaster, organizational failure etc.)? It's easy to see that a highly replicated, distributed FDLP model of preservation would deal with these situations much better than a centralized model. A web is much stronger than a silo.
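The comparison above can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration only, not a model from the talk: it assumes each collection is lost independently and with the same probability over some period, and the 5% figure and the function name are hypothetical. Real failures are often correlated (shared funding, shared software), which is one more argument for independently administered collections.

```python
def prob_all_copies_lost(p_single_loss: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent failures."""
    if not 0.0 <= p_single_loss <= 1.0:
        raise ValueError("p_single_loss must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_single_loss ** copies


# Hypothetical 5% chance that any one collection is lost in a given period.
for n in (1, 2, 10, 50):
    chance = prob_all_copies_lost(0.05, n)
    print(f"{n:>3} copies -> chance of total loss: {chance:.3g}")
```

Under these assumptions the chance of losing everything shrinks geometrically with each added independent collection, which is the intuition behind "lots of copies keep stuff safe."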
[SLIDE: Federal Register XML]
Law.gov, Carl Malamud’s proposal for a registry and repository of all legal information, is, from what I've seen and heard and read, a compelling proposal for a significant piece of the federal (and state) legal information ecosystem. What we ought to be doing is a) figuring out how to make law.gov a reality; b) figuring out how to expand it beyond legal materials to include ALL federal information -- information from all 3 branches of government, federal agencies as well as the regional and local offices of those agencies, data and statistics, the entire Congressional/legislative process including the funding that goes into that process to grease the skids so to speak, and making sure public information stays in public control; and c) MOST IMPORTANTLY from my perspective as a librarian, figuring out how to preserve that ecosystem for the long term so that the public can inform itself not just today or tomorrow but 100 years from now. Now the 4 of us on this panel are just 4 players with dogs in this fight. But if we agree on the goals, then we ought to work together to proceed toward them and mobilize our communities and the public to support this endeavor.
It's going to take the government (and not just GPO) being serious about transparency and funding the necessary changes in its own federal information distribution system to include open format standards with no DRM, bulk data channels, indexing, description, collection and authentication of information resources, multiple digital preservation strategies to not only assure preservation but also to insure against tampering and deletion of vital information (which, as I've stated earlier, the FDLP historically has done very well!). It's also going to take libraries being serious about and applying the ideals of the FDLP to build a distributed digital infrastructure that takes into account access to as well as preservation of digital govt information.
I agree with Tom and am absolutely convinced that the changes in the information ecosystem that are needed should not be left to the market because the information market leans heavily toward monopoly, proprietary standards, licensing restrictions, lack of access, "rights management" and the like.
If an evolving ecosystem that is free, open, standards-based, authenticated, and privacy-protecting is built and sustained correctly then citizens, libraries, non-profit watchdogs, hackers, activists, AND government will thrive.
[SLIDE 7: THANKS! lockss, archive-it]
Digital changes a lot of things about information, but it doesn't change the need to fund it, collect it, share it, preserve it, and give access to it. As my friend and colleague Jim Jacobs recently stated, "lots of collections keep stuff safe!"
Data.gov.uk versus Data.gov
Submitted by jajacobs on Fri, 2010-02-05 08:22.
Here is a point-by-point comparison of the big new data dissemination initiatives by the U.S. and the U.K.:
- Data.gov.uk versus Data.gov. Flowing Data (Feb 4, 2010).
While Data.gov.uk was just recently launched publicly, it has many advantages over Data.gov. It's easier to use and geared towards developers, who, let's face it, are the only ones who are going to do more with the data than open it up in Excel. Data.gov has some catching up to do. Both still have a long way to go. Both are good steps in the right direction.
Hat tip to Kevin Taglang at Benton Foundation!
NARA on Flickr
Submitted by jajacobs on Thu, 2010-02-04 11:33.
The U.S. National Archives joins the Commons!, Flickr blog (February 1, 2010).
Please welcome the U.S. National Archives to The Commons, the world’s public photography archives on Flickr to which you can contribute information and knowledge.
With over 3,000 images in 49 sets uploaded already, perusing these important archival images should keep you entertained for a long time. Their four collections encompass important Americana, ranging from the famous Mathew Brady Civil War images to historical and iconic images of American history.
Quadrennial Defense Review (QDR) 2010 released
Submitted by jrjacobs on Mon, 2010-02-01 17:49.
Today the Department of Defense (DoD) released its once-every-four-years report to Congress on the military's defense planning, called the Quadrennial Defense Review 2010 (see DoD press release).
Significant highlights of the report include the consideration of the significance of climate change on national security; the greening of the Department of Defense, including efforts to make the military more environmentally friendly, to anticipate and prepare for environmentally driven crises and disasters, and to achieve energy security; and efforts to convert the nontactical vehicle fleet away from gasoline-dependence, and a Navy plan to deploy a carrier strike group running on biofuels and nuclear power by 2016.
For more analysis of what's inside the QDR, please see the following articles:
- Growing Pentagon Focus on Energy and Climate. Andrew C. Revkin. NY Times Dot Earth blog.
- What's inside the Quadrenial Defense Review. Robert Farley. Tapped: the group blog of the American Prospect
All of the strategic defense reviews are available at DoD Strategic Defense reviews, including the Quadrennial Defense Review (QDR), Nuclear Posture Review (NPR), Ballistic Missile Defense Review (BMDR) and the Space Posture Review (SPR).
Objective civil discourse
Submitted by moritz on Mon, 2010-02-01 16:47.
Recently, when I have spoken about "data as evidence" in several academic settings, there has been a recurring question. Essentially it concerns the fact that dishonest people acting in bad faith will generate false, badly formed, or misleading data and propose it as evidence in support of predetermined (i.e. prejudiced / pre-judged) positions. To this day, parties or groups that base themselves in "values" or "beliefs" that are assumed a priori – i.e. values that are non-negotiable – in fact, not subject to discussion -- dominate our political landscape. One has only to watch the response of some of the Republican Congressional caucus to President Obama’s discussion there this past week to see clear evidence of this. A fundamental tenet for these believers is that compromise with any other set of beliefs represents moral "relativism" – which is equivalent to amorality (if not immorality).
I believe that much of the trouble we experience in contemporary civil discourse can be traced to a confusion, conscious or otherwise, of the distinctions between "Church" (institutionalization of religious belief) and "State" (government based on trust in a diverse and tolerant community). From the time of European settlement of this continent we have had problems in separating Church and State [See LoC for an excellent summary history] AND, concomitantly, in maintaining the distinction between empirical knowledge as a basis for public policies and commitments to "truth" based in belief. The former can be understood as "objective and invariant" (as discussed previously), the latter as subjective and highly variable -- the phrase used by John Searle of UC Berkeley, "first person ontology," is well applicable.
With objective, scientifically based knowledge, we have the opportunity of arriving – through investigation and discourse -- at common agreements (within some bounds of reasonable, relative probability). With respect to contending "truths" based in belief, we have the very strong possibility of violence and conflict -- consider Northern Ireland or South Asia. It is wrong and misguided to characterize the separation of Church and State as somehow inimical to one system of belief or another.
Separation of Church and State is fundamental to a diverse and inclusive society and protects religious freedom and the right of individual conscience. Without separation – and religious tolerance (as clearly expressed in the Bill of Rights) – a change in political power may result in murder. We are all too familiar – elsewhere in the world -- with the consequences of confusing government and religion.
And so we must return to the problem of objectivity and pragmatism in civil discourse. Today we are faced with a range of a priori values – beliefs that are considered "true" and above debate. On the right, the most fundamental of these a priori tenets is that "government is bad" [The Reagan/Thatcher formulation being: "Government is not the solution to our problem; government is the problem."] coupled with the corollary that raising funds to support government (taxing) is bad. Aside from the fact that this is fundamentally subversive (!) of the common welfare – it is also impractical and nonsensical. But I would also argue that on the left, there are similar a priori values – i.e. that government is good and corporations are bad.
All forms of human organization are subject to corruption and abuse – certainly this is true of government at all levels – but is absolutely true of corporate governance and is also true of private sector non-profit governance. I believe that the most stable and sustainable principle for our American system of democracy is justice based in the common value of fairness, and this value demands commitment to tolerant civil discourse embodying both rationality and science. It will be protected by an ongoing commitment to transparency and accountability in governance of all sectors: for profit, not for profit and public. (In recent years we have all seen flagrant examples of abuse in all three sectors. Journalism and publishing under first amendment protections together with free, open and effective access to data and information have been essential to the process of transparency and accountability.)
The previously mentioned GRI [SEE: http://www.globalreporting.org/ ] -- and similar initiatives working for transparency, accountability and rigorous standards of evidence -- present a clear alternative to organizational business-as-usual.
(As an aside, I will here note that expressions of anger – verbal or physical - as a part of political discourse – for example shouts of "You lie!" -- are signs of impotence, sure evidence of the abandonment of civil discourse, of the rational intention of serving the common welfare.)
As custodians of knowledge, as teachers and as advocates, librarians have always been primary defenders of fair and equitable access to knowledge for the common good. The World Wide Web is a technical fulfillment of the most basic ethos of librarianship. For the first time in human history, we have the technological means of sharing knowledge worldwide. But the existence of a global network does not assure that all people will have access, nor does it assure that what flows across the network will be effectively useful in informing public discourse for the largest number of people.
We librarians have an obligation, in all our interactions, to support the broadest possible access by all – freely, openly and effectively. We must maintain critical sensitivity to the practical usefulness of resources provided over global networks, teach critical and evaluative skills, and assist wherever possible in interpreting and refining available resources.
Depository spotlight 2/2010: University of Maryland's Thurgood Marshall Law Library
Submitted by jrjacobs on Mon, 2010-02-01 11:24.
This month's depository spotlight shines on University of Maryland's Thurgood Marshall Law Library. Congratulations to Bill Sleeman, Jeff Elliott and the rest of the staff at TMLL! The spotlight highlights 2 solid long-standing digital projects from TMLL:
- Historical Publications of the US Commission on Civil Rights
- Congressional Research Service (CRS) reports focusing on various aspects of law and foreign policy (on which I rely heavily both as a trusted information source and as a harvesting source for my CRS harvesting project)
For those projects as well as their everyday work to support their community, TMLL deserves the spotlight!
But I also found another aspect of their work very interesting and worthy of highlighting. This aspect was mentioned in the post to the FDLP-l listserv announcing the spotlight:
Do you ever wonder how your library can contribute online content to the depository community when you do not have a large staff, extensive resources, or state-of-the-art digitization facilities? Read about the variety of projects that the Thurgood Marshall Law Library at the University of Maryland School of Law manages. Despite being geared towards the Thurgood Marshall Law Library's own specific user group, every library can profit from their focused and high quality endeavors.
Many libraries are creating unique digital research collections that support both their own local user base and the larger public's information needs. Depository collections offer a vast and rich base from which to build these digital collections. Whether you work in a library that supports 900 or 90,000 information seekers, depository libraries can and DO assist in the larger collaborative work of giving access (digital or otherwise) to historic and current government documents. Whether your library is hosting 10 digital documents locally or involved in a collaborative digital project in partnership with GPO and/or a federal agency, please consider listing your collection in the FDLP Registry of U.S. Government Publication Digitization Projects.
Congratulations once again to the staff at the Thurgood Marshall Law Library!
National Agricultural Library’s Special Collections Online
Submitted by jajacobs on Mon, 2010-02-01 09:48.
The American Historical Association's AHA Today blog has a nice post today about the wealth of information in the National Agricultural Library’s Special Collections online.
- The Special Collections of the National Agricultural Library, by Elisabeth Grant, AHA Today (February 01, 2010).
It includes Rare Books, Nursery and Seed Trade Catalogs, The Thomas Jefferson Correspondence Collection, the USDA Pomological [the science of fruit breeding and production] Watercolor Collection, and more.
See also the National Agricultural Library Digital Repository (NALDR) which "provides access to primarily historical USDA publications either digitized by NAL or through NAL’s partnerships with other institutions."
January 2010 Lost Docs Report and Appeal
Submitted by dcornwall on Sun, 2010-01-31 09:23.
With the January 2010 Lost Docs Report and Appeal, we have come to the last of our "saved receipts" with which we first seeded the blog. This means that starting February 1, 2010, every single posting to the Lost Docs Blog will be a receipt submitted during that month or during the last week of the preceding month. That means that if everyone who sent in a lost document report to GPO also sent it to lostdocs@freegovinfo.info, we would have an accurate report of the volume of document reports provided to GPO. We hope you will help make this happen.
Now on to the January 2010 Lost Docs Report and Appeal
REPORT
Thanks to the continued generosity of documents librarians, we posted 85 reports of fugitive documents submitted to GPO. About two thirds of these items were reported during December 2009/January 2010.
Of these 85 reported items, 11 items have been cataloged by GPO. You can view this list by visiting lostdocs.freegovinfo.info/category/found/ and looking at the postings with January 2010 dates. We are appreciative of these new records.
In our view, three of the items reported to GPO and posted to the blog in January were either out of scope for the Catalog of Government Publications (CGP) or were already in the catalog. You can view these items by visiting lostdocs.freegovinfo.info/category/false/ and looking for items with January 2010 dates.
There were two items added to the "E-Version Needs Cataloging" category. You can view these items by visiting http://lostdocs.freegovinfo.info/category/catalog-eversion and looking for items with January 2010 dates. If your library has either of these documents, please consider adding an 856 field to the record(s) so your patrons will be able to link to the electronic version(s) through your catalog.
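For anyone who has not added one before, here is a minimal sketch of what such an 856 field might look like in a mnemonic text layout. It assumes MARC 21 conventions as I understand them (first indicator 4 for HTTP access, second indicator 1 because the link points to an electronic version of the cataloged item, subfield $u for the URL and $z for a public note). The helper name and the example URL are hypothetical, and your own cataloging system may expect a different entry format.

```python
def make_856_field(url: str, note: str = "") -> str:
    """Format a MARC 21 field 856 as a mnemonic text line.

    Indicators "41": 4 = access via HTTP, 1 = version of the resource.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    field = f"=856  41$u{url}"
    if note:
        # $z carries a note intended for public display.
        field += f"$z{note}"
    return field


# Hypothetical URL, for illustration only.
print(make_856_field("http://www.example.gov/report.pdf", "Electronic version"))
```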
APPEAL
If you like the concept of a public listing of fugitive documents reported to GPO, there are a number of easy ways to help us:
- If you report a fugitive document to GPO, send your e-mailed receipt to lostdocs@freegovinfo.info. We welcome any item reported to GPO in the past month. It is best if you can send us the receipt the same day you get it from GPO. Some e-mail programs will support auto-forwarding. If so, please consider autoforwarding items where the subject contains "lostdocs submission."
- Visit the blog at lostdocs.freegovinfo.info and comment on the listed items. Comments can include: Did your library receive the item? Did you find it in the CGP? Do you think the item is out of scope for the CGP? Did you report the item as well? And so on.
- Post the blog link to your website or share it on Facebook, Twitter, or other social media.
- Subscribe to the blog feed at lostdocs.freegovinfo.info/feed/ or, better yet, incorporate the feed into your website or blog.
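If your mail program has no built-in forwarding rule, the subject test mentioned in the first item of this list is simple to script. The following is a rough sketch using only Python's standard library; the sample message is invented for illustration, and actually sending the forwarded mail (for example through your institution's mail server) is left out because that setup varies.

```python
from email import message_from_string

FORWARD_TO = "lostdocs@freegovinfo.info"
SUBJECT_MARKER = "lostdocs submission"


def should_forward(raw_message: str) -> bool:
    """Return True if the subject contains the marker, ignoring case."""
    subject = message_from_string(raw_message).get("Subject", "")
    return SUBJECT_MARKER in subject.lower()


# Hypothetical receipt, for illustration only.
sample = (
    "From: sender@example.gov\n"
    "Subject: LostDocs Submission received\n"
    "\n"
    "Thank you for your report.\n"
)
if should_forward(sample):
    print(f"Would forward to {FORWARD_TO}")
```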
OMB removes datasets from data.gov
Submitted by jajacobs on Fri, 2010-01-29 08:16.
White House bars agencies from posting some statistics, by Aliya Sternstein, NextGov (01/27/2010).
According to this article, datasets posted to data.gov by the Nuclear Regulatory Commission, the Peace Corps, the Agriculture Department's Food Safety and Inspection Service, the Interior Department's Bureau of Reclamation, and the Social Security Administration have been removed by the Office of Management and Budget "because they raised privacy, security or other concerns."
The article is based on work done by OpenTheGovernment.org which is tracking agency participation with the Open Government Directive here.
What do we mean by "effective" access to data? (Part II)
Submitted by moritz on Tue, 2010-01-26 21:50.
In my last post, I described the possibility of a systematic approach to data validation. A key feature of such an approach must be its availability to all who are responsible for data – and of special importance, its capacity to support efficient and timely use by creators or managers of data. Bill Michener (UNM), leader of one of the currently funded DataNet projects, has published a chart describing the problem of “information entropy” [SEE: WK Michener “Meta-information concepts for ecological data management,” Ecological Informatics 1 (2006): 4 ]. Within recent memory, I have heard an ecologist say that were it not possible to generate minimally necessary metadata “in 8 minutes,” he would not do it. Leaving aside -- for now -- the possibility of applying sticks and/or carrots (i.e. law and regulations, norms and incentives), it seems clear that a goal of applications development should be simplicity and ease of use.
[ Within the realm of ecology, a good set of guidelines for making data effectively available was recently published. These guidelines are well worth reviewing and make specific reference to the importance of using "scripted" statistical applications (i.e. applications that generate records of the full sequence of transformations performed on any given data). This recommendation complements the broader notion -- mentioned in my last post -- of using work flow mechanisms like Kepler to document the full process and context of a scientific investigation. SEE “Emerging Technologies: Some Simple Guidelines for Effective Data Management” Bulletin of the Ecological Society of America, April 2009, 205-214. http://www.nceas.ucsb.edu/files/computing/EffectiveDataMgmt.pdf ]
As a sidebar, it is worth noting that virtually all data are “dynamic” in the sense that they may be and are extended, revised, reduced etc. For purposes of publication – or for purposes of consistent citation and coherent argument in public discourse – it is essential that the referent instance of data or “version” of a data set be exactly specified and preserved. (This is analogous to the practice of "time-stamping" the citation of a Wikipedia article...)
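One possible way to specify a version exactly is to cite a fingerprint of the data alongside the retrieval date. The Python sketch below is an assumption-laden illustration, not an established citation practice: the data set title, the sample records, and the citation layout are hypothetical, and SHA-256 over a canonical JSON rendering is simply one reasonable choice of fingerprint.

```python
import hashlib
import json

def fingerprint(records):
    """Return a SHA-256 hex digest of a canonical JSON rendering of records."""
    # sort_keys and fixed separators make the rendering independent of
    # key order and whitespace, so equal content gives an equal digest.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def cite(title, records, retrieved):
    """Build a citation string that pins down one exact version (hypothetical layout)."""
    return f"{title}, retrieved {retrieved}, sha256:{fingerprint(records)[:12]}"

# Two hypothetical versions of the same data set; one value was revised.
v1 = [{"site": "A", "count": 12}, {"site": "B", "count": 7}]
v2 = [{"site": "A", "count": 12}, {"site": "B", "count": 8}]

print(cite("Hypothetical survey counts", v1, "2010-01-26"))
print("same version?", fingerprint(v1) == fingerprint(v2))
```

A reader holding a copy of the data can recompute the fingerprint and confirm that it matches the cited version, which a date alone cannot guarantee.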
Lest we be distracted by the brightest lights of technology, we should acknowledge that we now have available to us, on our desktops, powerful visualization tools. The development of Geographic Information Systems (GIS) has made it possible to present any and all forms of geo-referenced data as maps. Digital imaging and animation tools give us tremendous expressive power – which can greatly increase the persuasive, polemical effects of any data. (For just two instances among many possible, have a look at presentations at the TED meetings [SEE: http://www.ted.com/ ] or have a look at Many Eyes [SEE: http://manyeyes.alphaworks.ibm.com/manyeyes/ ].) But, these tools notwithstanding, there is always a fundamental obligation to provide for full, rigorous and public validation of data. That is, data must be fit for confident use.
+++++++++++++++
Unanticipated uses of resources are one of the most interesting aspects of resource sharing on the Web. (At the American Museum of Natural History, we made a major investment in developing a comprehensive presentation of the American Museum Congo Expedition (1909-1915). Our site included 3-D presentation of stereopticon slides, and one of the first documented uses of the site was by a teacher in Amarillo, Texas, who was teaching Joseph Conrad; we received a picture of her entire class wearing our 3-D glasses.) It seems highly unlikely to me that we can anticipate, or even should try to anticipate, all such uses.
In the early 1980s, I taught Boolean searching to students at the University of Washington, and I routinely advised against attempts to be overly precise in search formulation. My advice was – and is – to allow the user to be the last term in the search argument.
An important corollary to this concept is the notion that metadata creation is a process, not an event – and by “process” I mean an iterative, learning process. Clearly some minimally adequate set of descriptive metadata is essential for discovery of data, but our applications must also support continuing development of metadata. Social, collaborative tools are ideal for this purpose. (I will not pursue this point here, but I believe that a combination of open social tagging and tagging by “qualified” users -- perhaps using applications that can invoke well-formed ontologies -- holds our best hope for comprehensive metadata development.)
Sign the public domain manifesto!
Submitted by jrjacobs on Tue, 2010-01-26 16:48.
The folks at Communia, the European Thematic Network on the digital public domain, have laid out a clear, concise, easy-to-understand Public Domain Manifesto calling for the preservation and strengthening of the public domain and calling on cultural heritage organizations (including libraries!) to ensure that works in the Public Domain are available to all of society. Please read the manifesto and consider signing on.
On a side note, this isn't the first manifesto on the block. Also check out the Charter for Innovation, Creativity and Access to Knowledge and Columbia Professor of Law Eben Moglen's dotCommunist Manifesto (coming out of the Free Software movement). These three together show that a significant number of people around the world think the public domain is too important to cultures to let go by the wayside.
General Recommendations
- The term of copyright protection should be reduced.
- Any change to the scope of copyright protection (including any new definition of protectable subject-matter or expansion of exclusive rights) needs to take into account the effects on the Public Domain.
- When material is deemed to fall in the structural Public Domain in its country of origin, the material should be recognized as part of the structural Public Domain in all other countries of the world.
- Any false or misleading attempt to misappropriate Public Domain material must be legally punished.
- No other intellectual property right must be used to reconstitute exclusivity over Public Domain material.
- There must be a practical and effective path to make available 'orphan works' and published works that are no longer commercially available (such as out-of-print works) for re-use by society.
- Cultural heritage institutions should take upon themselves a special role in the effective labeling and preserving of Public Domain works.
- There must be no legal obstacles that prevent the voluntary sharing of works or the dedication of works to the Public Domain.
- Personal non-commercial uses of protected works must generally be made possible, for which alternative modes of remuneration for the author must be explored.
[Thanks BoingBoing!]
What do we mean by “effective” access to data?
Submitted by moritz on Mon, 2010-01-25 16:22.
As previously discussed, “free” and “open” dissemination of data are primary values and fundamental premises for democracy. Data buried behind money walls, or impeded or denied to users by any of a variety of obstacles or “modalities of constraint” (Lawrence Lessig’s phrase), cannot be “effective”. But even when freely and/or openly available, data can be essentially useless.
So what do we mean by “effective”? One possible definition of “statistics” is: “technology for extracting meaning from data in the context of uncertainty”. In the scientific context – and I have been arguing that all data are or should be treated as “scientific” – if data are to be considered valid, they must be subject to a series of tests respecting the means by which meaning is extracted...
By my estimation, these tests in logical order are:
-- Are the data well defined and logically valid within some reasoned context (for example, a scientific investigation – or as evidentiary support for some proposition)?
-- Is the methodology for collecting the data well formed (this may include selection of appropriate equipment, apparatus, recording devices, software)?
-- Is the prescribed methodology competently executed? Are the captured data integral and is their integrity well specified?
-- To what transformations have primary data been subject?
-- Can each stage of transformation be justified in terms of logic, method, competence and integrity?
-- Can the lineages and provenances of original data be traced back from a data set in hand?
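The last of these tests, tracing lineage back from a data set in hand, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the data set names, the record layout, and the trace_lineage function are all hypothetical, and it assumes each derived data set names exactly one parent (real provenance is often a graph with several sources per data set).

```python
# Hypothetical provenance records: each data set names the data set it was
# derived from (None for an original) and the transformation applied.
PROVENANCE = {
    "raw_counts": {"parent": None, "transformation": "field collection"},
    "cleaned_counts": {"parent": "raw_counts", "transformation": "removed duplicates"},
    "annual_summary": {"parent": "cleaned_counts", "transformation": "aggregated by year"},
}

def trace_lineage(name, provenance):
    """Return the chain of data set names from `name` back to its original."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"circular provenance at {name!r}")
        if name not in provenance:
            # A missing record means the lineage cannot be traced: the test fails.
            raise KeyError(f"no provenance record for {name!r}")
        seen.add(name)
        chain.append(name)
        name = provenance[name]["parent"]
    return chain

for step in trace_lineage("annual_summary", PROVENANCE):
    print(step, "<-", PROVENANCE[step]["transformation"])
```

The failure case is the instructive one: a data set whose chain breaks before reaching an original is one whose provenance cannot be established from the records available.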
The Science Commons [SEE: “Protocol for Implementing Open Access Data” http://www.sciencecommons.org/projects/publishing/open-access-data-protocol/] envisions a time when “in 20 years, a complex semantic query across tens of thousands of data records across the web might return a result which itself populates a new database” and, later in the protocol, imagines a compilation involving 40,000 data sets. Just the prospect of proper citation for the future “meta-analyst” researcher suggests an overwhelming burden.
So, even assuming that individual data sets can be validated in terms of the tests I mention above, how are we to manage this problem of confidence/assurance of validity in this prospectively super-data-rich environment?
(Before proceeding to this question, let’s parenthetically ask how these tests are being performed today. I believe that they are accomplished through a less than completely rigorous series of “certifications” – most basically, various aspects of the peer review process assure that the suggested tests are satisfied. Within most scientific contexts, research groups or teams of scientists develop research directions and focus on promising problems. The logic of investigation, methodology and competence are scrutinized by team members, academic committees, institutional colleagues (hiring, promotion, and tenure processes), by panels of reviewers – grant review groups, independent review boards, editorial boards -- and ultimately by the scientific community at large after publication. Reviews and citation are the ultimate validations of scientific research. In government, data are to some extent or other "certified" by the body or agency responsible.)
If we assume a future in which tens of thousands of data sets are available for review and use, how can any scientist proceed with confidence? (My best assumption, at this point, is that such work will proceed with a presumption of confidence – perhaps little else?)
Jumping ahead, even in a world where confidence in the validity of data can be assured, how can we best assure that valid data are effectively useful?
A year ago in Science a group of bio-medical researchers raised the problem of adequate contextualization of data [SEE: I Sim, et al. “Keeping Raw Data in Context” [letter] Science v 323 6 Feb 2009, p713]. Specifically, they suggested:
“a logical model of clinical study characteristics in which all the data elements are standardized to controlled vocabularies and common ontologies to facilitate cross-study comparison and synthesis.”
While their focus was on clinical studies in the bio-medical realm, the logic of their argument extends to all data. We already have tools available to us that can specify scientific work flows to a very precise degree. [SEE for example: https://kepler-project.org/ ] It seems entirely possible to me that such tools can be used – in combination with well-formed ontologies built by consensus within disciplinary communities – to systematize the descriptions of scientific investigation and data transformation and, moreover, in combination with socially collaborative applications, to support a systematic process of peer review and evaluation of such work flows.
OK -- so WHAT ABOUT GOVERNMENT INFORMATION??? We’re just government document librarians or just plain citizens trying to make well-informed decisions about policy? Stay tuned…
Big week for open access to government information
Submitted by jajacobs on Mon, 2010-01-25 08:52.
You almost certainly have seen at least one story in the past week about "Open Government" and the release of new data. Reporters have slowly been picking up on a massive release of information spurred by President Obama's Open Government Directive. (See: New 'high value' data posted to data.gov.)
Below are a few announcements and stories that you may find of interest.
But in addition to all the data released this week, there is a new policy that may affect the usability of government information in the future. In the December 8, 2009 memo (Open Government Directive [pdf] Memorandum For The Heads Of Executive Departments And Agencies, M-10-06, Peter R. Orszag Director, Office of Management and Budget) that implemented the President's Open Government Initiative, OMB specifically mandates open file formats.
To increase accountability, promote informed participation by the public, and create economic opportunity, each agency shall take prompt steps to expand access to information by making it available online in open formats.
And, OMB defines open formats as:
An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.
This is big news for two reasons. First, it should lead the government away from proprietary formats, which are hard to preserve, hard to re-use, and typically either require proprietary software or operate only on specific platforms, or both. Think: documents in ODF format rather than Microsoft Word. Second, the directive mandates formats "without restrictions [on] re-use." Think: no DRM (and no licensing restrictions!).
As the ODF Alliance noted back in December when the OMB memo was released, much of government information is still released in "documents" which are not ideal for re-use of information even when the document formats are open. But, this is still an important, essential step:
Like it or not, government bureaucracies are still very document-centric and there is a lot of government “data” stored in documents, the challenge being how to provide easy access to this data.
...With today's announcement, the Obama Administration has taken an important step on open government data and acknowledged the role open formats play in this regard. For document-centric governments, an open document format remains essential to delivering on this promise.
-- Obama Administration To Require Government Agencies to Make Information Available in Open Formats. ODF Alliance, December 08, 2009.
Open formats will help libraries that want to preserve digital government information by making it easier and less costly to do so.
Here are some of the announcements about releases of new government data:
- Open Government Initiative White House.
- Another Milestone In Making Government More Accessible and Accountable. White House.
- U.S. Government, OSTP, Open New Troves of Data to the Public
- Justice Department Announces Release of New Information Online as Part of President’s Open Government Initiative
- How "Open Gov" Datasets Affect Parents and Consumers. White House.
- Open Government Initiative. White House.

