There’s a very interesting article in Wired about a data mining tool developed to discover instances of whitewashing (i.e., editing in one’s own self-interest, presumably inappropriately) of Wikipedia entries. As has been noted before, Wikipedia has no authority control over its entries and is therefore particularly subject to self-serving or highly partisan edits. Now a clever grad student has developed a tool to identify those instances based on the version tracking built into wikis. While it doesn’t necessarily identify a particular person, just knowing that, as described in the article, someone at Diebold HQ removed negative information about Diebold voting machines is adequate, because it puts the burden on Diebold to prove they weren’t the ones who made the changes. In short, it provides accountability by making use of the Wikipedia equivalent of the historical record.
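The core idea is simple: MediaWiki records anonymous edits under the editor’s IP address, so edits can be matched against IP blocks registered to organizations. Here’s a minimal sketch of that matching step in Python; the revision data and the organizational IP block are both hypothetical, and the real Wikipedia Scanner works against a full database dump and WHOIS-style IP allocation records rather than hand-entered values.

```python
import ipaddress

# Hypothetical sample of anonymous-edit metadata. In MediaWiki, edits made
# without logging in are attributed to the editor's IP address in the
# revision history.
revisions = [
    {"ip": "192.0.2.17", "page": "Diebold", "comment": "removed criticism section"},
    {"ip": "198.51.100.4", "page": "Diebold", "comment": "fixed typo"},
]

# Hypothetical IP block registered to an organization (real tools would
# look this up in public IP-allocation data).
org_network = ipaddress.ip_network("192.0.2.0/24")

def edits_from_org(revisions, network):
    """Return revisions whose anonymous editor's IP falls inside the network."""
    return [r for r in revisions if ipaddress.ip_address(r["ip"]) in network]

flagged = edits_from_org(revisions, org_network)
for r in flagged:
    print(r["page"], "-", r["comment"])
```

This only shows you that an edit came from a machine on an organization’s network, not which person made it, which is exactly the limitation (and the leverage) described above.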
I mention this story because I think that this kind of activity is going to be increasingly important in determining what constitutes a real and/or official government publication. Traditionally, you held a government accountable by getting official documentation of its activities and holding onto it for comparison with other official documentation. However, government information published electronically has made this a lot harder because of the changeable nature of digital files. A longstanding concern of government information librarians with respect to electronic government information has been how to know when changes have been made, what the changes consisted of, and who made them.
In this respect, the surging popularity of web 2.0-style tools may be a great boon for government information. These tools — wikis, online collaborative software like Google Documents or Zoho, and so on — derive their value from their ability to be shared. Government agency personnel are no different from anyone else: they’ve got work to do, limited patience for messing around with how to do it, and a desire to take the path of least resistance. So, for government employees, i.e. the folks creating government information, there’s just as much reason to use these kinds of software as there is for me right now writing this post.
And that means that neither the historical record nor legal accountability is necessarily lost. It will, however, entail expanding the definition of preserving the historical record to include methods of acting on databases (creating data mining software to run against them) in addition to collecting objects (finding that last copy of a Serial Set volume), along with whatever other activities become necessary as technology evolves.
As with everything, the possibilities are not limitless. The Wikipedia Scanner was developed in cooperation with Wikipedia and required a full download of the entire database. Allowing that level of access is an option that individual agencies could turn on or off, and certainly some agencies would never allow it for their publications. However, the agencies unlikely to play well with others in this scenario probably already don’t provide much access to their information. For agencies that would be amenable to this kind of data mining, the benefit would be not just automated archiving (which is what the version tracking amounts to), but no-cost-to-the-agency management of those archives, since they’d be letting others do it for them.