Tag Archives: semantic web
How researchers enhanced Data.gov using semantic technology, by Sean Gallagher, GCN, May 18, 2010.
A three-person team at Rensselaer Polytechnic Institute, however, has demonstrated how one approach can make greater use of the massive sets of data available on Data.gov, using the power of the semantic web. The conversion project has shown how quickly and inexpensively visualization and mash-up applications can be built from government data when it’s put into a web-friendly form.
Just finished reading an article from the upcoming Sunday edition of the New York Times, “If You Liked This, Sure to Love That,” which talks about the public contest Netflix is running to improve the accuracy of the software that recommends movies to its users. Here is a section from the story that describes the problem and the prize:
“THE “NAPOLEON DYNAMITE” problem is driving Len Bertoni crazy. Bertoni is a 51-year-old “semiretired” computer scientist who lives an hour outside Pittsburgh. In the spring of 2007, his sister-in-law e-mailed him an intriguing bit of news: Netflix, the Web-based DVD-rental company, was holding a contest to try to improve Cinematch, its “recommendation engine.” The prize: $1 million. Cinematch is the bit of software embedded in the Netflix Web site that analyzes each customer’s movie-viewing habits and recommends other movies that the customer might enjoy. (Did you like the legal thriller “The Firm”? Well, maybe you’d like “Michael Clayton.” Or perhaps “A Few Good Men.”) The Netflix Prize goes to anyone who can make Cinematch’s predictions 10 percent more accurate.”
Deeper in the story is this tidbit —
“IT USED TO BE THAT if you wanted to buy a book, rent a movie or shop for some music, you had to rely on flesh-and-blood judgment — yours, or that of someone you trusted. You’d go to your local store and look for new stuff, or you might just wander the aisles in what librarians call a stack search, to see if anything jumped out at you. You might check out newspaper reviews or consult your friends; if you were lucky, your local video store employed one of those young cinéastes who could size you up in a glance and suggest something suitable.”
And then this —
“Cinematch has, in fact, become a video-store roboclerk: its suggestions now drive a surprising 60 percent of Netflix’s rentals. It also often steers a customer’s attention away from big-grossing hits toward smaller, independent movies. Traditional video stores depend on hits; just-out-of-the-theaters blockbusters account for 80 percent of what they rent. At Netflix, by contrast, 70 percent of what it sends out is from the backlist — older movies or small, independent ones. A good recommendation system, in other words, does not merely help people find new stuff. As Netflix has discovered, it also spurs them to consume more stuff.”
The implications for government information library service seem, to me, profound. Automated trust? Where could we go with this when it comes to the sense of trust that informs the best part of librarianship and the community of users that relies on our institutions? I know some libraries are using aspects of this kind of recommendation automation, but I love the notion of using social software tools to help people find more stuff in which they might be interested. I know there is a huge gap between selecting movies and TV shows based on likes and dislikes and what we do as government information librarians when we explain complex policy and legal connections, large or small. But just as Jim points out about the inherent necessity of collaboration embedded in librarian practice and theory (and we will continue to agree to disagree about the centrality of possession in that mix), I can only dream of a government information search algorithm that picks its way among the complex of relationships embedded in government information.
If you like this regulation on natural gas, then you might want to consider this one.
I know this happens at a very “structural” level in the Federal Register and Code of Federal Regulations, which link regulations through citations and through their foundational public laws. And I know that, in a very real sense, subject headings and other authority records do this in a kind of 19th-century linear way. But it seems to me that I spend most of my flesh-and-blood library time explaining these connections to our users rather than seeing any evidence that they grasp them intuitively.
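To make the "if you like this regulation" idea a little more concrete, here is a minimal sketch, not a description of how any real system works. It assumes each regulation can be reduced to the set of authorities it cites, and it ranks other regulations by how much their citation sets overlap (Jaccard similarity). The regulation names, the citation labels, and the `related` function are all hypothetical, made up purely for illustration.

```python
# Hypothetical data: each regulation maps to the set of authorities it cites.
# None of these labels refer to real regulations or public laws.
CITATIONS = {
    "gas_pipeline_safety": {"law-A", "part-1", "part-2"},
    "gas_leak_reporting": {"law-A", "part-2"},
    "offshore_drilling": {"law-B", "part-9"},
    "gas_storage": {"law-A", "part-1", "law-C"},
}


def jaccard(a, b):
    """Share of citations two regulations have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(reg_id, citations, top_n=3):
    """Return up to top_n (regulation, score) pairs, most similar first."""
    target = citations[reg_id]
    scored = [
        (other, jaccard(target, cited))
        for other, cited in citations.items()
        if other != reg_id
    ]
    # Drop regulations with no shared citations at all.
    scored = [(other, score) for other, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    for name, score in related("gas_pipeline_safety", CITATIONS):
        print(f"{name}: {score:.2f}")
```

A nice property of something this simple is that it is the opposite of a black box: every suggestion can be explained by pointing at the citations the two regulations share, which is close to what a librarian does at the reference desk.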
How can we make our library intuition more transparent? In some ways we are flesh-and-blood algorithms, which gets to another part of the story that delights me, where the contest participants sharpen their mathematical tools:
“As the teams have grown better at predicting human preferences, the more incomprehensible their computer programs have become, even to their creators. Each team has lined up a gantlet of scores of algorithms, each one analyzing a slightly different correlation between movies and users. The upshot is that while the teams are producing ever-more-accurate recommendations, they cannot precisely explain how they’re doing this. Chris Volinsky admits that his team’s program has become a black box, its internal logic unknowable.”
Which has always been the challenge of teaching student librarians about the art of reference work — there is this black box quality to how we know what we know and where to search for relevant information.
See you on Day 60.