Gary has a nice summary of how Chronicling America Has a New Look! (INFOdocket, by Gary D. Price, May 29, 2011).
Chronicling America is a Website providing access to information about historic newspapers and select digitized newspaper pages, and is produced by the National Digital Newspaper Program (NDNP). NDNP, a partnership between the National Endowment for the Humanities (NEH) and the Library of Congress (LC), is a long-term effort to develop an Internet-based, searchable database of U.S. newspapers with descriptive information and select digitization of historic pages. Supported by NEH, this rich digital resource will be developed and permanently maintained at the Library of Congress. An NEH award program will fund the contribution of content from, eventually, all U.S. states and territories.
Agencies identify 78 services for cloud transition, By Joseph Marks, NextGov (05/26/2011).
Federal agencies have identified 78 computer systems they plan to migrate to the cloud within a year, according to the Office of Management and Budget.
...Computer clouds essentially are large banks of computer servers that can operate much closer to full capacity than standard servers by rapidly repacking data as one customer surges in usage and another one dips. Data storage in the cloud is operated like electricity grids or other utilities, with customers paying only for what they use.
A handful of low-risk government services, such as websites that don't take in sensitive public information, are already in privately owned cloud space.
Agencies Have Identified 78 Systems Migrating to the Cloud Within One Year (10 page PDF listing and describing services by agency).
Preserving History at the U.S. National Library of Medicine, by Kristi Davenport, Peter Gabriele, Stephen Greenberg, Holly Herro, Christie Moffatt, Paul Theerman, and Jeffrey S. Reznick, History News Network (May 23, 2011).
In this landmark year for the U.S. National Library of Medicine (NLM)--its 175th anniversary--its staff have been facing down nature through a project that looks to the next 175 years and beyond. Engaging in the emerging field of forensic conservation--a cross-over application of forensic science and state-of-the-art analytical technologies--staff are seeking to protect and save for future generations one of the most important historical documents of the twentieth century: the first summary of the genetic code, created by the American biochemist and 1968 Nobel Laureate, Dr. Marshall Nirenberg (1927-2010), whose papers the Library makes publicly available through Profiles in Science, the NLM's premier digital manuscript project that celebrates twentieth-century leaders in biomedical research and public health.
Written in multiple blue-ballpoint pen inks on several sheets of 8 1/2" x 11" paper taped together with pressure-sensitive tape, the Nirenberg genetic code chart records the author's deciphering of the genetic code contained in DNA.
State Agency Databases Report 6/29/2011
There's been new activity for the State Agency Databases Across the Fifty States project on the ALA GODORT wiki. I've decided to start making occasional reports when it seems like there is enough activity to justify a report.
Databases removed from project pages due to dead and apparently unrecoverable links:
- Who's Who in Arizona GIS
- WRA/WRITE Project Database
- Brownfields Search Utility
- Brownfields SiteMart
- Bike project database
For details on the above, click on the "history" tab of the state page and click on a previous version.
If you know of a new link for one of the above items, please let me know.
Databases ADDED to project pages:
- Doctor Search (Arizona Medical Board)
- ADWR Image Records Database (Water Resources)
- AZURITE License Application Query Utility
- Project volunteers who have not updated their pages since January 2010 are being contacted about their continued participation in the project.
- Project pages without volunteers are in the process of being link checked.
If you have questions or comments about this project, feel free to contact me at email@example.com
A new report on link rot on the blog of the social bookmarking service Pinboard:
- Remembrance of Links Past, by Maciej Ceglowski, Pinboard Blog (May 26, 2011).
Using a random sample of 300 URLs stored in Pinboard for each year 1997-2011 (based on the year the bookmark was created), Ceglowski says "you can expect to lose about a quarter of them every seven years."
Ceglowski also shows year by year lists and the raw data.
In my quick look: of the 47 .gov web sites checked over the entire period, 7 were not found (status code 404) and 7 returned internal server errors (status code 500), for a link rot rate of 29.8% (14/47).
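The kind of quick check described above is easy to script. The sketch below, assuming a placeholder URL list rather than Ceglowski's actual sample, fetches each URL, records its HTTP status, and computes a link rot rate; it also encodes his rule of thumb that about a quarter of links die every seven years.

```python
# Sketch of a link rot check: fetch each URL, record its HTTP status,
# and compute the fraction that are dead. The URL list is a placeholder.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def check_links(urls, timeout=10):
    """Return a dict mapping each URL to its status code (None on network failure)."""
    statuses = {}
    for url in urls:
        try:
            statuses[url] = urlopen(url, timeout=timeout).status
        except HTTPError as e:
            statuses[url] = e.code   # e.g. 404 Not Found, 500 Internal Server Error
        except URLError:
            statuses[url] = None     # DNS failure, refused connection, etc.
    return statuses

def rot_rate(statuses):
    """Fraction of checked URLs that are dead (4xx, 5xx, or unreachable)."""
    dead = sum(1 for code in statuses.values() if code is None or code >= 400)
    return dead / len(statuses)

# Ceglowski's rule of thumb: ~25% of links die every seven years,
# so the expected surviving fraction after t years is 0.75 ** (t / 7).
def surviving_fraction(years):
    return 0.75 ** (years / 7)
```

With 7 not-found and 7 server-error results among 47 sites, `rot_rate` reproduces the 14/47 (29.8%) figure above.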
Hat tip to INFOdocket!
While working on the State Agency Databases Across the Fifty States project updating Arizona databases, I ran across this resource:
I don't think this is a deliberate attempt on Arizona's part to discourage access to records, but it does illustrate that there is much more to providing access to public records than simply putting them online. Ideally, an agency considers the formats that citizens are likely to use and chooses technologies that demand little of users. Failing that, it can post the data in an easy-to-manipulate format (spreadsheets, XML, PDF, etc.) that other people and organizations can use to provide better access.
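Posting records in easy-to-manipulate formats takes very little machinery. A minimal sketch, with invented records and field names, exporting the same data as both CSV and JSON so that third parties can reuse it:

```python
# Minimal sketch: the same (made-up) public records exported in two
# machine-readable formats that other people and organizations can reuse.
import csv
import io
import json

records = [
    {"record_id": "2011-001", "agency": "Water Resources", "title": "Well permit"},
    {"record_id": "2011-002", "agency": "Medical Board", "title": "License renewal"},
]

def to_csv(rows):
    """Render rows as CSV text, suitable for spreadsheets."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Render rows as JSON text, suitable for programmatic reuse."""
    return json.dumps(rows, indent=2)
```

The point is not the specific formats but that structured exports like these let others build better access on top of the agency's data.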
The United Nations and the Inter-Parliamentary Union (IPU) recently released the World e-Parliament Report 2010. The Report, prepared by the Global Centre for ICT in Parliament, is based on the results of the Global Survey of ICT in Parliaments, conducted by the Centre between July and November 2009, to which 134 parliamentary assemblies responded.
The 134 parliaments were surveyed on a number of issues, including whether or not they make the work of their parliamentary research services available to the public. Daniel Schuman, Policy Counsel for the Sunlight Foundation, contacted one of the report's authors, and asked for the underlying data on which countries make their CRS-like reports publicly available. Although they could not share that specific data, they told Daniel how many countries made those reports available.
Answers to Daniel's questions were provided in an email from the Global Centre for ICT in Parliament, for which Daniel had permission to make public. Here are the highlights:
With regard to parliamentary chambers within members of the G-20:
- Parliamentary chambers in 16 of the 20 members of the G-20 responded to the 2009 survey. Because the European Union is a member of the G-20, the European Parliament is included in this group of 16. The names of all parliaments and chambers that participated in the 2009 survey can be found on page 5 of the Report.
- Parliamentary chambers in 4 of the G-20 members did not participate in the survey.
- Parliamentary chambers in 13 of the 16 G-20 members who responded to the survey reported that they did have subject matter experts on public policy issues who provide research and analysis for members and committees.
- Parliamentary chambers in 3 of the G-20 members reported that they did not have subject matter experts on public policy issues who provide research and analysis for members and committees.
- Parliamentary chambers in 11 of the 13 G-20 members who reported that they did have subject matter experts on public policy issues also reported that they make the results of that research and analysis available to the public. This represents 85% of the G-20 members whose chambers have subject matter experts (11/13).
- A number of the parliaments among the 13 who have subject matter experts are bi-cameral. These 13 therefore include a total of 19 separate chambers. Of these, 16 (84%) have subject matter experts whose work is made available to the public (16/19).
- NOTE: the Global Centre for ICT in Parliament has assured all participants of the confidentiality of their responses to the survey and that the names of individual chambers have not been provided in this correspondence.
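The percentages in the list above follow directly from the stated counts; a quick arithmetic check, using only figures given in the correspondence:

```python
# Recomputing the survey percentages above from the raw counts.
responded = 16        # G-20 members whose chambers responded to the survey
with_experts = 13     # responding members with subject-matter experts
publishing = 11       # of those, members whose chambers publish the research

member_share = publishing / with_experts        # 11/13 -> ~85%

chambers = 19             # separate chambers within the 13 members
chambers_publishing = 16  # chambers whose experts' work is public
chamber_share = chambers_publishing / chambers  # 16/19 -> ~84%

print(round(member_share * 100))   # 85
print(round(chamber_share * 100))  # 84
```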
Thought this might be interesting and helpful. Thanks Daniel for sharing this information.
A new Web site, the Geospatial Data Preservation Resource Center, aims to help those responsible for producing and managing geospatial data learn about the latest approaches and tools available to facilitate long-term geospatial data preservation and access. The Web site provides descriptions and links for a variety of relevant resources, including education and training modules, useful tools and software, information on policies and standards for preserving geospatial data, and examples of successful preservation and associated benefits. This first release of the Web site, which CIESIN will be enhancing over the next year, was developed as an element of the National Digital Information Infrastructure and Preservation Program (NDIIPP) of the Library of Congress.
The Geospatial Data Preservation Resource Center is accessible at http://geopreservation.org/
CIESIN, the Center for International Earth Science Information Network, is a unit of the Earth Institute at Columbia University, based at the Lamont campus in Palisades, New York.
Eleven Words in Pentagon Papers to Remain Classified, by Steven Aftergood, Secrecy News (May 26th, 2011).
The Pentagon Papers that were leaked by Daniel Ellsberg four decades ago have been formally declassified and will be released in their entirety next month -- except for eleven words that remain classified.
David S. Ferriero, the Archivist of the United States, announced the surprising exception to the upcoming release of the Papers at a meeting of the Public Interest Declassification Board on May 26.
The Real Pentagon Papers, by A. J. Daverede, National Declassification Center blog (May 26, 2011)
The conditions under which the copies of the Report [leaked by Daniel Ellsberg] were made and distributed, coupled with the speed with which the copies were distributed and the urgency to publish the material, meant that the newspaper and magazine releases of the Papers covered only a very small portion of the 7,000 page Report.
The copies of the Report that were leaked to Congress ultimately had better luck in publication. Senator Mike Gravel (D, Alaska) made his copy of the Report available to the publishing house of Beacon Press, located in Boston. The Beacon Press editions, published in 1971 in both hard and soft cover versions, were the definitive account of the Report until now. However, Beacon Press had its own copy problems that led to words, paragraphs, and even full pages of the Report being deleted, possibly due to the quality problems in the copy received from Senator Gravel.
...NARA's June release of the Report of the OSD Vietnam Task Force will present the American public with the first real look at this historic document. It is true that 11 words are redacted from one page of a 7,000 page report. However, this is very much a complete release of the Report.
Sen. Ron Wyden (D-Oregon) says that the government applies a broad legal interpretation of certain provisions of the "P.A.T.R.I.O.T Act" and has classified that interpretation so that it cannot be publicly assessed or challenged.
- There’s a Secret Patriot Act, Senator Says, By Spencer Ackerman, Wired (May 25, 2011).
Wyden says he "can't answer" any specific questions about how the government thinks it can use the Patriot Act. That would risk revealing classified information -- something Wyden considers an abuse of government secrecy. He believes the techniques themselves should stay secret, but that the legal rationale for using them under the Patriot Act ought to be disclosed.
- The Secret PATRIOT Act and the End of Limited Government in America, by E.D. Kain, Forbes (May 26, 2011).
Apologists for the PATRIOT Act have claimed that the innocent have nothing to fear from the government’s broadened powers.
At issue is the so-called "business-records provision" of the Act (Section 215), which empowers the FBI to get businesses, including libraries, to turn over records it deems relevant to a security investigation.
Sen. Wyden Decries “Secret Law” on PATRIOT Act, by Steven Aftergood, Secrecy News (May 25th, 2011)
"We can have honest and legitimate disagreements about exactly how broad intelligence collection authorities ought to be, and members of the public do not expect to know all of the details about how those authorities are used," Sen. Wyden said. "But I hope each Senator would agree that the law itself should not be kept secret and that the government should always be open and honest with the American people about what the law means."
But the Senate moved toward cloture on reauthorization of the PATRIOT Act provisions and the Wyden amendment, which was co-sponsored by several Senate colleagues, was not permitted to be offered or to be voted upon.
Roy updates us on the status of the Wayback machine with an example from the White House:
- Back to the Wayback Machine, Roy Tennant, Library Journal (May 18th, 2011).
But that means that any claims to be "archiving the web" should be taken with a grain of salt. Maybe say "archiving the parts of the web that matter" or "ignoring what doesn’t matter so much".
And, don't forget Archive-It, the web archiving service from Internet Archive.
Through a user-friendly web interface, Archive-It partners can catalog, manage, and browse their archived collections using web archiving tools developed at the Internet Archive. Collections are hosted at the Internet Archive data center and are accessible to the public, including full-text search.
You just never know when and where U.S. baby name info might be needed.
Now, iOS users who need the info can access SSA material using a new app.
In the past couple of days the Social Security Administration has introduced a new iPhone app (it should also work on iPad and iPod Touch devices) that provides access to 130 years of SSA info that you can browse or search.
A "surprise me" button is available and you can also shake the iOS device to get info about a random name.
According to the iTunes App Store, here's what the free app includes:
+ "Browse or search over 130 years of baby name data — over 45,000 unique names!"
+ "Save your own favorite names list and share them on Facebook, Twitter, or via an email."
+ "View baby name popularity trends"
+ "A Trivia game to test your baby name knowledge"
+ "View government-approved baby and family related services and information"
Direct to App Store:
Following up on Google cancelling its newspaper digitization project, here are two more stories:
- Google archive decision 'astonishing' to Ottawa originator, by Vito Pilieci, The Ottawa Citizen (May 24, 2011).
Google’s decision to end support for its newspaper archival services is distressing news for the Ottawa businessman who sold Google the technology to digitize records.
"It’s disappointing, especially when you consider what I thought that this would do," said Bob Huggins, former chief executive officer and co-founder of PaperOfRecord.com, which Google bought in 2008 for an undisclosed sum.
...Huggins suggested that Google should partner with public sector institutions, such as the Library of Congress in the United States, to continue the newspaper digitization effort. The information could be stored by the library for safekeeping and made available online for everyone to read.
"They need to give their head a shake here and realize they have some public responsibility," added Huggins. "For a company that said they wanted to organize all of the world’s data, what happened to that mandate?"
While Google has shifted its focus away from digitizing historical newspapers, the company is trying to work hand-in-hand with newspapers to help them charge for content on their websites.
- Demise of Google Newspaper Archive Shows Need for National Digital Library Policy, by Irvin Muchnick, Beyond Chron, The San Francisco Alternative Daily (May 25, 2011).
...the collapse of the project reinforces the limits of self-appointed public utilities. Apparently, the newspaper archive wasn't getting enough eyeballs to make the project profitable.
The Library of Congress will address these issues:
- Determine which aspects of current metadata encoding standards should be retained and evolved into a format for the future. We will consider MARC 21, in which billions of records are presently encoded, as well as other initiatives.
- Experiment with Semantic Web and linked data technologies to see what benefits to the bibliographic framework they offer our community and how our current models need to be adjusted to take fuller advantage of these benefits.
- Foster maximum re-use of library metadata in the broader Web search environment, so that end users may be exposed to more quality metadata and/or use it in innovative ways.
- Enable users to navigate relationships among entities--such as persons, places, organizations, and concepts--to search more precisely in library catalogs and in the broader Internet. We will explore the use of promising data models such as Functional Requirements for Bibliographic Records (FRBR) in navigating relationships, whether those are actively encoded by librarians or made discernible by the Semantic Web.
- Explore approaches to displaying metadata beyond current MARC-based systems.
- Identify the risks of action and inaction, including an assessment of the pace of change acceptable to the broader community: will we take incremental steps or take bolder, faster action?
- Plan for bringing existing metadata into new bibliographic systems within the broader Library of Congress technical infrastructure--a critical consideration given the size and value of our legacy databases.
OMB to cut two transparency programs, suspend others, By Joseph Marks, NextGov (05/24/2011).
Two transparency initiatives and numerous improvements to open government programs will be scrapped as a result of a 75 percent funding cut lawmakers agreed to in April to avert a federal shutdown, officials said Tuesday.
...The government's most visible and popular transparency initiatives will stay in operation, such as USAspending.gov, which tracks government spending on contracts, and the IT Dashboard, which details spending on information technology projects. But planned improvements will be put on hold, according to Kundra's letter.
Kundra on e-gov cuts: no project unaffected, by Daniel Schuman, Sunlight Foundation (May 24, 2011).
Data quality may suffer....New data will be harder to come by.
Recently, President Obama issued an executive order directing the streamlining of federal websites. Last week, OMB Watch sent a letter to the Office of Management and Budget with recommendations for its guidance on implementing the order, including the suggestion that "customer service doesn't always look like filling out a form or receiving a payment. Providing information is a major government service...."
Informing and engaging the public is a critical government service for many agencies, and improving those services should properly be considered within the scope of the order;
Successfully soliciting meaningful customer feedback requires embracing the principles of participation and collaboration embodied in President Obama’s memorandum on transparency and open government; and
Agencies should be mindful that, although they may use customer service considerations to improve their interactions with regulated entities, their true customers are always the American people and not the regulated community.
-- Letter, (May 13, 2011) to Jeffrey Zients Office of Management and Budget, Re: Executive Order 13571 Streamlining Service Delivery and Improving Customer Service, from Sean Moulton Director and Gavin Baker, OMB Watch.
The letter goes on to note that "regulated entities are not the 'Customer,'" saying: "the order should not be seen as permission to develop an overly familiar relationship with regulated entities or place too much emphasis on the stated needs of their regulated communities. We urge OMB to include guidance that reminds agencies that the public is the primary customer and cautions agencies from overly identifying the regulated entities as customers."
Ric Davis, former acting Superintendent of Documents of the Govt Printing Office -- and FGI guest blogger :-) -- has just been named Chief Technology Officer (CTO) of GPO (PDF). Congratulations to Mr. Davis. I'm sure his prior work with the FDLP, FDsys, and GPO Library Services and Content Management (LSCM) will put him in good stead to do the ongoing technological work needed at the GPO.
Dawn Presents Wikileaks' Pakistan Papers, Dawn, Karachi, Pakistan.
The Dawn Media Group and Julian Assange, Chief Executive of Sunshine Press Productions, the publishing arm of WikiLeaks, have signed a Memorandum of Understanding for the exclusive first use in Pakistan of all the secret US diplomatic cables related to political and other developments in the country.
...On May 20, 2011, Dawn carried the first set from a huge cache of cables, and will continue to publish them in the following days. Dawn’s extensive coverage will include publication of the actual cables as well as specially commissioned stories around them. All cables referenced are available for viewing in their original form on Dawn.com.
Newspaper in Pakistan Publishes WikiLeaks Cables, By JANE PERLEZ, New York Times, (May 20, 2011).
The leading English-language newspaper, Dawn, and its Web site on Friday began publishing a selection of more than 4,000 American diplomatic cables obtained from WikiLeaks that are devoted to Pakistan, opening a window onto the American-Pakistani relationship and domestic politics never before seen here.
On a tangent: Google is cancelling its newspaper digitization project:
- Google abandons master-plan to archive the world's newspapers, by Carly Carioli, The Boston Phoenix (May 19 2011).
Google told partners in its News Archive project that it would cease accepting, scanning, and indexing microfilm and other archival material from newspapers, and was instead focusing its energies on "newer projects that help the industry, such as Google One Pass, a platform that enables publishers to sell content and subscriptions directly from their own sites."
...In an email, Google said it would continue to support the existing archives it has scanned and indexed. It added, "We do not, however, plan to introduce any further features or functionality to the digitized news product."
The papers that were scanned and indexed (not all scanned papers were indexed) are, apparently, still available through http://news.google.com/archivesearch (See the about page for more information.) The Boston Phoenix article speculates on this:
It remains to be seen whether Google will complete the process of indexing the newspapers it has scanned. We'd guess not.
But wait, there's more:
...The deal Google struck with partner newspapers stipulated that, somewhere down the line, a paper could purchase Google's digital scans of its content for a fee. That fee is now being waived, and Google is not only giving publishers free access to the scanned files, but also the rights to publish them with other partners. In essence, Google just scanned a huge chunk of the newspaper industry's valuable long-tail content, and then handed it to the publishers.... Are any of us in a position to exploit those resources without Google's help?
I wonder if any library or library consortium is willing to strike a deal with the publishers and put those scans online?
BTW, this is evidently completely separate from the Google purchase of PaperOfRecord, which, after going offline after the Google purchase, is now, apparently, back online:
Zombie Apocalypse Claims its First Victim: a CDC Blog, By Joseph Marks, NextGov (05/19/11).
The Centers for Disease Control and Prevention want us to be prepared for the zombie apocalypse, according to a recent post on CDC's public health blog.
But an agency tweet touting that post Wednesday night sent so many people rushing to the blog, presumably wide-eyed and arms outstretched, that it crashed the site for several hours, a CDC communications official told Nextgov.
Social Media: Preparedness 101: Zombie Apocalypse, Centers for Disease Control and Prevention.
The Public Library Manifesto, by David Morris, Yes Magazine, (May 06, 2011).
We need a grassroots effort to defend our public libraries, an effort that can and should be part of a growing nationwide and international effort to defend the public sphere itself.
Could Pass-The-Hat Kill Open Government?, By Joseph Marks, NextGov (05/18/11).
With the E-Gov fund, which paid for open government websites such as data.gov and usaspending.gov, severely trimmed and unlikely to be restored, the open government group OMB Watch speculated in an article Wednesday that those sites may have to turn to the so-called pass-the-hat funding model to stay in business.
The Trouble with the "Pass-the-Hat" Funding Model for Government Technology Projects, OMB Watch (May 17, 2011).
Rather than using funding expressly designated by Congress, a project under a pass-the-hat model is funded by contributions from other purpose accounts of multiple agencies, frequently prorated based on agency size or use. This model is different from "fee for service," which also involves agency payments but is usually in return for specific services, such as payroll.
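The prorating that OMB Watch describes amounts to a simple proportional split. A hypothetical sketch, with invented agency names and size figures (the real model might prorate on budget, headcount, or usage):

```python
# Hypothetical "pass-the-hat" funding sketch: each agency contributes to a
# shared project in proportion to its size. All figures are invented.
def pass_the_hat(project_cost, agency_sizes):
    """Prorate project_cost across agencies by their share of total size."""
    total = sum(agency_sizes.values())
    return {name: project_cost * size / total
            for name, size in agency_sizes.items()}

shares = pass_the_hat(1_000_000, {"Agency A": 50, "Agency B": 30, "Agency C": 20})
# Agency A pays 500000.0, Agency B pays 300000.0, Agency C pays 200000.0
```

The contrast with "fee for service" is visible in the signature: nothing here ties an agency's payment to any specific service it receives.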
Earlier this week I reported on a significant report that may be hard to find and preserve (Senate Anatomy Of Financial Collapse). Here is another example of a prominent government report that may be hard to identify and preserve.
- Wegman, E.J., Scott, D.W., Said, Y.H., 2006. Ad-hoc Committee Report on the ‘Hockey Stick’ Global Climate Reconstruction, "A Report to Chairman Barton, House Committee on Energy and Commerce and to Chairman Whitfield, House Subcommittee on Oversight and Investigations: Paleoclimate Reconstruction." (PDF, 1.5 MB), 91pp (missing page 1) [citation based examination of item and on a footnote in the Computational Statistics & Data Analysis article listed below.]
This report is in the news this week because a scholarly journal has withdrawn a paper based on the report.
- Climate study gets pulled after charges of plagiarism, By Dan Vergano, USA Today (May 15, 2011).
Based on information in another USA Today story, (Retracted climate critics' study panned by expert, By Dan Vergano, USA Today, May 19, 2011), the retracted paper is, apparently, this one (still available as of this morning from ScienceDirect):
Yasmin H. Said, Edward J. Wegman, Walid K. Sharabati, John T. Rigsby, Social networks of author-coauthor relationships, Computational Statistics & Data Analysis, Volume 52, Issue 4, 10 January 2008, Pages 2177-2184, ISSN 0167-9473, DOI: 10.1016/j.csda.2007.07.021.
A hearing before the same committee, from 2006, with testimony by Wegman is available from FDsys:
- House Hearing, 109th Congress - Questions Surrounding The 'Hockey Stick' Temperature Studies: Implications For Climate Change Assessments, U.S. House of Representatives. Committee on Energy and Commerce, July 19, 2006, July 27, 2006, Serial No. 109-128, Y 4.C 73/8
Because climate change is a contentious political issue, it is easy to find on the web lots about the original "Wegman report" and the retraction of the journal article. It is not easy to find citations to the article that has been retracted (one citation in USA Today was apparently built from a Google Scholar search and breaks). Copies of the report with the missing first page are available at a number of web sites, but I could not find it in FDsys. The only "official" copy I found (linked above) was buried on the Committee web site.
One wonders if this report will be easy to find and attribute and authenticate in a year or ten years or fifty years.
As one of the new members of the LostDocs team I want to provide a little more information about my background. My professional interest in government information goes back to an undergraduate internship with my congressman in D.C. For the past thirteen years I have been working with government documents/information through the field of archives and also through work at an academic selective depository while completing my MLIS. My MA thesis focused on the local administration of a federal program—Urban Renewal. As an archivist I helped establish an archive focused on preserving materials related to USDA Child Nutrition Programs. So, government information—Federal, state, and local—has always been central to my work and research endeavors. My interest in the LostDocs project revolves around a concern for the permanent preservation of all formats of government information, but of particular concern and urgency is the need to ensure permanent access to electronic/born-digital publications.
Very happy to be a part of the team.
It is not always clear to me if reports like this are centrally listed or cataloged anywhere and when or if or where they will end up in FDsys. This report is massive and important and seems worth listing here with links that may or may not last.
U.S. Senate. Committee on Homeland Security and Governmental Affairs. Permanent Subcommittee on Investigations.
- PSI Financial Crisis Report, "Senate Investigations Subcommittee Releases Levin-Coburn Report On the Financial Crisis" Press Release, (April 13, 2011) [PSIfinancialreportrelease041311.pdf] (PDF 86.2 KBs). (Another copy in HTML available at Committee Chairman Carl Levin's website.)
Using their own words in documents subpoenaed by the Subcommittee, the report discloses how financial firms deliberately took advantage of their clients and investors, how credit rating agencies assigned AAA ratings to high risk securities, and how regulators sat on their hands instead of reining in the unsafe and unsound practices all around them. Rampant conflicts of interest are the threads that run through every chapter of this sordid story.
- Wall Street And The Financial Crisis: Anatomy Of A Financial Collapse, Majority And Minority Staff Report (April 13, 2011) (PDF 5.7 MBs, 646pp). Another copy at Committee Chairman Carl Levin's website.
- Footnote Exhibit Locator (by FN and Bates)
- FN 107 - 1342 (pgs 1-1037)
- FN 1343 - 1459 (pgs 1038-2164)
- FN 1462 - 1576 (pgs 2165-3003)
- FN 1584 - 1622 (pgs 3004-3448)
- FN 1623 - 2406 (pgs 3449-4484)
- FN 2409 - 2706 (pgs 4485-5459)
- FN 2724 - 2831 (pgs 5460-5901)
The People vs. Goldman Sachs, By Matt Taibbi, Rolling Stone, (May 26, 2011 issue)