Much Government Information Still Not Searchable on Google, etc.
Submitted by jajacobs on Thu, 2008-12-11 08:55.
Firms Push for a More Searchable Federal Web, by Peter Whoriskey, Washington Post, December 11, 2008; D01.
Here is another article about the problem that commercial search engines have in indexing government information.
Google chief executive Eric Schmidt says that the "vast majority" of U.S. government information is still not searchable or findable. J.L. Needham, Google's manager of public-sector content partnerships, estimates that 1,000 federal government Web sites are inaccessible to search engine crawlers.
A person using one of the search engines, for example, can't find Environmental Protection Agency enforcement actions against a given company, can't discover the picture of a specific ancient Egyptian artifact at the Smithsonian and can't search by name for the details of a Vietnam War casualty.
And for many Web users, if an online item can't be found with a Web search engine, then for all practical purposes it doesn't exist.
What's the problem? Often, it is simply a matter of agency budgets. "[I]nformation technology officials in the federal bureaucracy said that the transition may require significant manpower and that the costs could be large." One official said that "With limited resources as always, it's a little bit hard."
tag cloud of the google book deal
Submitted by jrjacobs on Wed, 2008-11-12 23:36.
I've been so busy lately (research time of the term, helping to organize the first ever Stanford Open Source (un)conference...) that I've neglected to comment on the Google Book Search Copyright Settlement. The library blogosphere is abuzz, so I won't add anything else, just point to some people I trust who've tracked the Deal much more closely than I have (like Peter Brantley, Karen Coyle, John Wilkins, and James Grimmelmann; Siva Vaidhyanathan has links to them all plus some of his own commentary!). But I DID want to show the tag cloud of the agreement. Notice anything?
Google map tool shows where to vote
Submitted by jrjacobs on Fri, 2008-10-24 19:03.
Here's a handy little Google tool to help you find out where to vote. Go to the 2008 US Voter Info Google map, put in your address and voila! You have your voting station and can easily get driving/walking directions. The sidebar also includes information on voter registration, contact information for local voting officials and a link to the Google 2008 election site to track what's happening on election day. This map tool was developed by state and local election officials from Iowa, Kansas, Maryland, Minnesota, Missouri, Montana, North Carolina, North Dakota, Ohio, and Los Angeles County and the Voting Information Project in conjunction with the League of Women Voters.
[shout out to the UC Berkeley Library govblog from whom I got the tip!]
Google and the Search for Federal Government Information
Submitted by jajacobs on Mon, 2008-09-01 07:57.
A couple of weeks ago, Bonnie Klein, of the Defense Technical Information Center, submitted a comment here with a link to an article she wrote about the effectiveness of using Google and other commercial search tools to find government information. I recommend it highly:
- Google and the Search for Federal Government Information, by Bonnie Klein, Against The Grain, v.20, no. 2 April 2008.
In it, Klein notes that "Google and other search engines are commercial enterprises, not public utilities." She addresses in particular the fact that government information gets no priority in ranking of search results: "Business operations and revenue-generating advertising partnerships, not altruism, factor into page ranking."
The article examines legal, technical, commercial, and copyright issues, and includes many useful citations.
For example, she quotes Donna Bogatin of ZDNet, who observes that "By requiring that Web pages have inbound links from third-party Web sites, the PageRank based algorithm may result in automatic exclusion of the most relevant pages for a given query simply because no other Websites have linked to them." (Google Search Page Rank Excludes Relevant Websites, by Donna Bogatin. ZDNet, January 26, 2007).
This is a good reminder of how government web sites that make it difficult to link to documents ("Documents that exist within databases on GPO Access cannot be bookmarked") automatically lower their PageRank.
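To make the inbound-link point concrete, here is a minimal sketch of the textbook PageRank iteration. It is not Google's production ranking, and the four-page "web" and its page names are hypothetical. The page that nothing links to (standing in for a report inside a database that cannot be bookmarked) ends up with the lowest possible score, no matter how relevant it is.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank evenly over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to "database_report".
links = {
    "news_site": ["agency_home", "blog"],
    "blog": ["agency_home"],
    "agency_home": ["news_site"],
    "database_report": ["agency_home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:16s} {score:.4f}")
```

In this sketch the unlinked report never rises above the baseline share that every page receives, which is the effect Bogatin describes.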
Thanks, and a tip of the hat to Bonnie for this useful article!
See also: Hiding in Plain Sight: Why Important Government Information Cannot Be Found Through Commercial Search Engines, Center for Democracy and Technology.
Google VP for search quality talks about searching
Submitted by jajacobs on Sat, 2008-06-07 07:00.
An interesting interview with Udi Manber, vice president in charge of search quality at Google:
- At Google, a search guru's dream comes true, by Stephen Shankland, CNet, June 5, 2008.
Manber says that even as recently as the early 1990s searching "was done by professionals in various limited domains. There was legal search, there was medical search, there was chemical search, and some limited news search. And it was done by a searcher--professional people....The idea that people will do the search themselves--that it'll democratize the whole thing and you don't have to go to a professional--that's the revolution."
He also says that Google "tunes" search results based on where you are physically in the world:
The other difference is it depends on location. If you do the same search from a different country, you get different results, even if it's the same language. We will tune the results by the country in which you're searching. It's by language and location.
Pentagon Audit of Iraq Spending
Submitted by jajacobs on Fri, 2008-05-23 08:56.
I always find it odd when news reports cite government documents without giving a link or good reference to them. It seems to me that this is something news web sites should do regularly. These reports are not always that easy to track down. Case in point: today's New York Times has a story about a Pentagon report:
- Iraq Spending Ignored Rules, Pentagon Says, By James Glanz, New York Times, May 23, 2008
A Pentagon audit of $8.2 billion in American taxpayer money spent by the United States Army on contractors in Iraq has found that almost none of the payments followed federal rules and that in some cases, contracts worth millions of dollars were paid for despite little or no record of what, if anything, was received.
Using Google to search on the title of the report plus "site:.gov" yields nothing this morning, although the report is available from two different government web sites.
The report is available at the site of the House Committee on Oversight and Government Reform, Committee Holds Hearing on Accountability Lapses in Multiple Funds for Iraq, Wednesday, May 21, 2008, along with other statements and documents.
- Internal Controls Over Payments Made in Iraq, Kuwait and Egypt, U.S. Defense Department, Office of the Inspector General, Report No. D-2008-098, (May 22, 2008).
It is also available at www.dodig.mil/audit/reports with this url: www.dodig.mil/audit/reports/fy08/08-098.pdf. The same Google search with "site:.mil" substituted for "site:.gov" finds the title in a May 22 "what's new" story on the home page www.dodig.osd.mil of the Office of the Inspector General.
This is the second report I have looked for this week that is available as a PDF document on a government web site but that Google has (evidently) not indexed in full text. I do not know if this reflects a Google policy or just a delay in indexing.
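For anyone who wants to repeat this kind of check, here is a small sketch that builds the two searches described above. The `site:` operator is the one used in the post; putting the title in quotation marks as an exact phrase is my assumption, and the helper name `site_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def site_search_url(phrase, site):
    """Build a Google search URL for an exact phrase limited to one domain."""
    query = f'"{phrase}" site:{site}'
    return "https://www.google.com/search?" + urlencode({"q": query})


title = "Internal Controls Over Payments Made in Iraq, Kuwait and Egypt"

gov_url = site_search_url(title, ".gov")  # the search that found nothing
mil_url = site_search_url(title, ".mil")  # the search that found the "what's new" story

print(gov_url)
print(mil_url)
```

Running the same phrase against both domains makes it easy to see whether a document is missing from the index or is simply hosted somewhere other than where you expected.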
Creating Gov Doc "Libraries" in Google Books
Submitted by blakeley on Tue, 2008-03-04 23:49.
Digitized government documents in Google Books have been written about quite a lot here at FGI, and I'd like to revisit the topic with a different focus.
I was searching for Civil War era government documents for a History Professor, and I realized that we did not own one of the documents he sought. Before suggesting that he interlibrary loan a copy of this document, I decided to search online for a full-text digitized version. Alas, it did not exist in the digital realm, but I did find some other digitized gov docs pertaining to his research needs in Google Books. We were both elated, he because I had found what he needed, and I because so many documents I found digitized on Google Books were the same documents we had lost to mold and water damage from Hurricane Rita!
Out of curiosity, I did a Google Book search for other types of government publications and found these gems:
Trial of the Conspirators, for the Assassination of President Lincoln
Illustrations of the Gross Morbid Anatomy of the Brain in the Insane (isn't that a Cypress Hill song? Never mind...) by the Government Hospital for the Insane.
How it Feels to be the Husband of a Suffragette (not published by the Government Printing Office, but it is a book housed in the National American Woman Suffrage Association Collection in the Library of Congress).
Official Records of the Union and Confederate Navies in the War of the Rebellion
Most of these documents were scanned at large research universities or depositories, but the quality is not always decent and can sometimes border on the illegible. I was quite amused when I discovered a staff person's hand digitized on this document's cover:

However, there are bigger snafus than a digitized librarian's hand. For example, despite government documents being in the public domain, Google Books treats most post-1922 (i.e. post-copyright law) government documents as copyrighted material by only allowing a limited view! For more details, please read James Jacobs' post on this issue.
Despite all these issues (which have yet to be resolved), I decided to take advantage of the access to full-text, pre-1922 government documents and create a McNeese Gov Docs "Library" account in Google Books for my depository. The account also allows you to subscribe to updates of its holdings via an RSS feed. I put a link to the library account and the RSS feed on my depository's homepage and our "Gov Guides" wiki. I'll add more of these interesting and old documents as I come across them, especially those pertaining to Louisiana or documents that were lost to Hurricane Rita.
Here are some tips for finding gov docs in Google Books: Use Advanced Search, and in the Publisher field, type in Govt OR GPO OR "Government Printing Office". You can also search by agency (e.g., "Department of the Interior") by typing the name of the agency in the Author field.
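If you would rather type a query than fill in the Advanced Search form, the same tips can be sketched as query strings. This assumes the `inpublisher:` and `inauthor:` keywords documented for Google Books searches behave like the Publisher and Author fields, and that they combine with OR as shown; the helper names are hypothetical.

```python
def publisher_query(publishers):
    """Join publisher names into an OR query, quoting multi-word names."""
    terms = []
    for name in publishers:
        value = f'"{name}"' if " " in name else name
        terms.append(f"inpublisher:{value}")
    return " OR ".join(terms)


def agency_query(agency):
    """Search for an agency as the author of a document."""
    return f'inauthor:"{agency}"'


gpo_query = publisher_query(["Govt", "GPO", "Government Printing Office"])
interior_query = agency_query("Department of the Interior")

print(gpo_query)
print(interior_query)
```

Paste either string into the Google Books search box and compare the results with what the Advanced Search form returns; if they differ, trust the form.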
Have fun exploring and building your own digital collections, but please let me know if you find some really cool gov docs, ok?
The internet, Google, libraries
Submitted by Susannaleers on Sat, 2008-02-23 07:19.
Thanks to my fellow blogger for the link to Fister's article. I recently came across an article (linked through that splendid online publication brought to us by the Chronicle of Higher Education, Arts and Letters Daily) that offers more food for thought. Titled Better than Free, this thoughtful piece by Kevin Kelly of Wired discusses how we might be able to add value to the vast amount of free information now available - value that people will be willing to pay for. He says that "The internet is a copy machine....When copies are super abundant, they become worthless. When copies are super abundant, stuff which can't be copied becomes scarce and valuable. When copies are free, you need to sell things which can not be copied. Well, what can't be copied?" He discusses eight "generative values" that are better than free:
immediacy, personalization, interpretation, authenticity, accessibility, embodiment, patronage and findability. All of these are relevant to librarians as we adjust our skill sets to provide information available on the internet.
Fister on Privacy, Facebook, Google, Libraries
Submitted by jajacobs on Fri, 2008-02-22 10:42.
This is a very useful and thoughtful piece that starts with musings on Facebook and privacy issues and addresses much larger issues that affect libraries and library users and academic publishing. This is a must read.
- Face Value, By Barbara Fister, Inside Higher Ed (Feb. 18, 2008).
Sample:
Libraries have always taken privacy seriously - not because it's valuable in itself, but because it's a necessary condition for the freedom to read whatever you want without risk of penalty. When the PATRIOT Act was passed, librarians checked to make sure their databases erased the connection between a book and its borrower as soon as the book was returned. That erasure, however, makes it harder to offer the kind of personalization, such as recommendations based on previous book choices, that the public increasingly expects from online systems. After all, it's what they get from Amazon.
...[W]e've barely begun to examine the unintended consequences of the Faustian bargain we strike when we share content through privately-owned digital domains of the public sphere.
Joe Esposito pointed to this article in a posting to the liblicense-l mailing list and he says:
As I was reading this, I reflected on an ongoing conversation with a friend of mine, a former Congressional staffer, about the growing political need for Google to be declared a regulated public utility, like the AT&T of yesteryear. Too much power in the hands of too few: it's morally wrong, and socially dangerous.
I would just add to this that, when we rely on the government to be the only official repository of all government information, we are putting too much power in the hands of too few. We are allowing the government to be the only entity that controls access to that information and the privacy or lack of privacy of all readers of that information. The solution to that is to build collections of digital government information in libraries. We have barely begun to understand the Faustian bargain we strike when we share content through a single government-controlled digital repository.
Fister is a librarian at Gustavus Adolphus College. Her blog is barbara fister's place.
Most fed data is un-Googleable
Submitted by jajacobs on Thu, 2007-12-27 12:00.
As we've noted here before (Is your search engine finding the government information you need?), the problem of relying on commercial search engines to find government information is that a lot of government information on the web goes un-indexed by those search engines.
- Most fed data is un-Googleable, by Jason Miller, FCW (December 17, 2007). "After five years, a major E-Gov Act provision goes unmet because of search problems."
Sen. Joseph Lieberman (I-Conn.), chairman of the Homeland Security and Governmental Affairs Committee, says "There are more than 2,000 federal government Web sites not included in commercial search engine results. Is it accidental, or is there a policy, or it is just laziness? I would like to know why" and "Agencies do not let commercial search engines index their sites."
I wonder if that is true? I wonder if there is any document librarian who can answer that question or point to which sites are not indexed?
It is probably closer to the truth to say, along with John Needham, Google's manager for public-sector content partnerships, that government "databases" are being missed by web crawlers and that "Agencies are concerned more about how information is presented than if users are finding it." In other words, agencies would probably like to have their information indexed, but haven't figured out how to do so, or don't have the budgets to do what is necessary. It probably isn't "laziness" but lack of funds and other resources; it probably is sometimes "accidental" in that some may not know what to do. It is probably sometimes even "policy" -- but probably less often.
But one big problem is that we don't really know the scope of the problem or the cause. FDLP librarians should be pushing GPO, researchers, and library schools to research these issues so we have answers.
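One small, checkable piece of the "policy" question is whether a site's robots.txt file tells crawlers to stay out. The sketch below uses Python's standard `urllib.robotparser` on a made-up robots.txt; the host `agency.example.gov` and its paths are hypothetical. It says nothing about the other causes discussed above, such as databases that crawlers cannot reach at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of /database/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /database/
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


database_ok = crawler_may_fetch(
    ROBOTS_TXT, "Googlebot", "https://agency.example.gov/database/report?id=42"
)
press_ok = crawler_may_fetch(
    ROBOTS_TXT, "Googlebot", "https://agency.example.gov/press/release.html"
)

print("database page crawlable:", database_ok)
print("press release crawlable:", press_ok)
```

A survey that ran a check like this across federal sites would at least separate deliberate exclusion from the accidental and budget-driven kinds.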
Is your search engine finding the government information you need?
Submitted by jajacobs on Tue, 2007-12-11 11:34.
The Center for Democracy and Technology (CDT) and OMB Watch have released a report, Hiding in Plain Sight: Why Important Government Information Cannot Be Found Through Commercial Search Engines (Dec. 11, 2007), that highlights "a critical gap in online access to vital government information." The press release for the report says:
"It is unclear if agencies know there is a roadblock between the public and their information and have not taken the adequate steps to correct the problem, or if the agencies simply do not realize that their important information is not being found and indexed by search engines," said Sean Moulton, Director of Federal Information Policy for OMB Watch. "In today's Internet age, either answer is unacceptable."
The report uses several search examples that Americans might expect to result in access to trustworthy government information. Instead, the results overlook a vast amount of useful government information.
The report is also included in the testimony of Ari Schwartz, Deputy Director, Center for Democracy and Technology at the December 11, 2007 Hearing of U.S. Senate Committee on Homeland Security and Governmental Affairs on E-Government 2.0: Improving Innovation, Collaboration, and Access. See: CRS Reports, E-government, Thomas, indexing of the government web, and more!
Research Libraries Question Google Book Scanning Restrictions
Submitted by jajacobs on Sun, 2007-10-21 14:08.
Google's book-scanning project and restrictions that Microsoft places on books it scans in a similar project continue to attract attention, praise... and controversy. This article in the International Herald Tribune outlines some of the key problems of commercializing information in libraries and of libraries outsourcing one of their key functions.
- Research libraries close their books to Google and Microsoft, by Katie Hafner, International Herald Tribune, October 19, 2007.
Hafner notes that "Several major research libraries have rebuffed offers from Google and Microsoft to scan their books into computer databases, saying they were put off by restrictions these companies wanted to place on the new digital collections."
One particular example demonstrates how Google's business plan simply does not allow for adequate scholarly access and use. Tom Garnett, director of the Biodiversity Heritage Library, a group of 10 prominent natural history and botanical libraries, tells the story.
Garnett said the most striking example of this came when he asked the Google representatives about a theoretical example.
"We asked, 'Suppose we allowed you to digitize all our literature, and there was an ant researcher who wanted to peel off 10,000 pages of ant literature and load it on his own server and perform advanced analysis to correlate it with climatological data over the last 100 years, using software he had developed to study trends in species research,'" Garnett recalled.
He said the Google executives told him this would not be possible. "They said, 'We'd be sympathetic but it doesn't fit in with our model.'" Smith [Adam Smith, project management director of Google Book Search] ... said this was not the case. "It's certainly something we would work with libraries to do," he said.
The Open Content Alliance (OCA) offers an alternative to the Google project, but Hafner says that Microsoft, after joining the Open Content Alliance in 2005, "added a restriction that prohibits a book it has digitized from being included in commercial search engines other than Microsoft's". This was news to me and I was not able to confirm that.
Paul Duguid, an adjunct professor at the School of Information at the University of California at Berkeley and author of The social life of information, says, "There are two opposed pathways being mapped out. One is shaped by commercial concerns, the other by a commitment to openness, and which one will win is not clear." And Doron Weber, a program director at the Sloan Foundation, which has made several grants to libraries for digitization, says, "You don't want any for-profit company having control of the world's knowledge."
[The article was online on Saturday morning October 20, but I have been unable to find it on the IHT web site since then. A copy is available here. The article is in LexisNexis and can be found by doing an "easy search" on "Major U.S. and World Publications" on the phrase "research libraries have rebuffed offers from Google" (including the quotation marks).]
[UPDATE: the article is now available on the NYT website:
http://www.nytimes.com/2007/10/22/technology/22library.html ]
See also: On Google's Monetization of Libraries, By Rory Litwin, Library Juice 7:26 (December 17, 2004).
Questioning the power of Google
Submitted by jajacobs on Sun, 2007-10-21 10:55.
Google. Who's looking at you?, by John Arlidge, The Sunday Times, October 21, 2007. "It wants to know everything about you. It wants to be your best friend -- or your Big Brother. Are your secrets safe with Google?"
Google's overall goal is to have a record of every e-mail we have ever written, every contact whose details we have recorded, every file we have created, every picture we have taken and saved, every appointment we have made, every website we have visited, every search query we have typed into its home page, every ad we have clicked on, and everything we have bought online. It wants to know and record where we have been and, thanks to our search history of airlines, car-hire firms and MapQuest, where we are going in the future and when.
This would not just make Google the largest, most powerful super-computer ever; it would make it the most powerful institution in history. Small wonder that the London-based human-rights group Privacy International has condemned its plans as "hostile to privacy", and EU ministers called Google's vision "Orwellian". Even John Battelle, one of the net's leading evangelists, who co-founded the technology bible Wired magazine, and wrote The Search, the definitive study of Google's rise, now says: "I've found myself more and more wary of Google, out of some primal, lizard-brain fear of giving too much control of my data to one source."
(see also: Google: "We don't know enough about you"... yet.)
The Googlization of Everything, "Drop the fight"? or Start a Revolution?
Submitted by jajacobs on Fri, 2007-10-19 10:13.
Siva Vaidhyanathan is writing his next book, The Googlization of Everything: How One Company is Disrupting Culture, Commerce, and Community--and Why We Should Worry, and will be posting snippets of text on the blog, The Googlization of Everything, and asking readers for comments. He says of the book:
The book will answer three key questions: What does the world look like through the lens of Google?; How is Google's ubiquity affecting the production and dissemination of knowledge?; and how has the corporation altered the rules and practices that govern other companies, institutions, and states?
Vaidhyanathan is an associate professor of media studies and law at the University of Virginia, a fellow at the Institute for the Future of the Book, and the author of Copyrights and Copywrongs and The Anarchist in the Library.
"Drop the fight"?
To those of us at FGI, the increasing reliance by libraries on Google is something that needs close scrutiny. I was dismayed to read recently that an associate dean for public services and collection development said that, because Google does the "search function" "better than so far any library can do," therefore "..what would be in our best interests is to drop the fight, to let Google take over that..." (Susan Gibbons quoted in Young Librarians, Talkin' 'Bout Their Generation by Scott Carlson, Chronicle of Higher Education, October 19, 2007). To say that finding information can be done only one way, and that the (apparently) winner-take-all popularity-contest and keyword-in-text approaches used by Google are so good that libraries should "drop the fight," is just plain short-sighted and an abdication of our responsibility (IMHO). For a different point of view, see On Google's Monetization of Libraries, By Rory Litwin, Library Juice 7:26 (December 17, 2004).
New Feature: Google Books/Fed Docs
Submitted by dcornwall on Wed, 2007-08-15 12:17.
FGI is pleased to announce a new occasional series that will examine how Google Books treats US Federal Documents. These posts will have titles that begin with "Google Books/Fed Docs".
We're very pleased to have a guest researcher putting up these postings. Please give a warm FGI welcome to Julia Tryon, Government Documents Librarian of the Phillips Memorial Library at Providence College. Julia has started to gather statistics and other information about the tens of thousands of government documents that have been scanned by the Google Books projects. We posted her first govdoc-l message on this subject.
Julia has agreed to start blogging on this subject for FGI. Unlike our "Blogger of the Month" series, Julia will post whenever she finds something interesting at the intersection of US Federal Documents and Google Books until she feels that she's exhausted the subject.
Take it away Julia!


