Tag Archives: scrubbing

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

DOD withdraws embarrassing report

The Defense Department has withdrawn from its web site a report that had exonerated it from using retired generals for propaganda.

In a highly unusual reversal, the Defense Department’s inspector general’s office has withdrawn a report it issued in January exonerating a Pentagon public relations program that made extensive use of retired officers who worked as military analysts for television and radio networks.

…In addition to repudiating its own report, the inspector general’s office took the additional step of removing the report from its Web site.

The DOD memo withdrawing the report:

The web page (dodig.mil/inspections/IE/Reports.htm) where the report was originally listed now links only to the withdrawal memo, and DOD blocks that page from being archived by the Internet Archive and others. Cryptome, however, grabbed a copy, and that copy is still available: ie-2009-004.zip (“Examination of Allegations Involving DoD Public Affairs Outreach Program, Department of Defense, Office of Inspector General, January 14, 2009, Report Number IE-2009-004”.)

(Libraries interested in preserving fugitive documents might well consider purchasing Cryptome DVDs for $25. Two DVDs of the Cryptome collection contain 47,000 files from June 1996 to January 2009, ~6.9 GB).

Amy Goodman interviewed David Barstow recently:

For earlier coverage of this issue on FGI, see: Military analysts.

Tracking the mutable government web

ProPublica’s ChangeTracker service has the potential for being very useful indeed. (See the FGI post about ChangeTracker by Rebecca Blakeley for more information about the service and ChangeTracker itself at ProPublica.)

The service tracks changes at the government web sites whitehouse.gov, recovery.gov and financialstability.gov. It is similar to other change-tracking services that have been around for years (e.g., ChangeDetection), which automatically monitor a web page and notify you whenever it changes. In fact, it is built using Versionista (see Steal Our Code: How to Build Your Own Change-Tracking Feeds, by Brian Boyer, ProPublica, February 19, 2009). See also Steven Bell’s A Librarian’s Resource Center for Keeping Up for similar services. ProPublica goes one step further by setting up all the monitoring for you and doing side-by-side comparisons that highlight the changes.
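The basic idea behind change-tracking services of this kind can be sketched in a few lines of Python. This is only an illustration, not ProPublica's or Versionista's actual code: it assumes the page text has already been downloaded, and it simply compares a fingerprint of the current text with one stored at the previous check. The function names `fingerprint` and `has_changed` and the sample snapshots are hypothetical.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text with whitespace normalized."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, current_text: str) -> bool:
    """Report whether the current text differs from the stored fingerprint."""
    return fingerprint(current_text) != previous_fingerprint


# Hypothetical snapshots of the same page taken on two different days.
monday = "Agenda: Additional Issues. Item one. Item two."
friday = "Agenda: Additional Issues. Item one."

stored = fingerprint(monday)
print(has_changed(stored, monday))  # False
print(has_changed(stored, friday))  # True
```

A fingerprint only says that something changed, not what changed; a side-by-side comparison still requires keeping the full text of each snapshot.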

Most changes are additions — new blog postings, new announcements, and so forth. This is nothing that a good RSS feed from the monitored site couldn’t provide just as well.

The potential for ChangeTracker, though, is its ability to automatically discover policy changes. For example, it discovered changes in wording about Hurricane Katrina in the “Additional Issues” portion of the White House web site’s “Agenda” section. On this page, text that described the Bush Administration’s response to Katrina as “unconscionable ineptitude” was deleted. See:

This particular example may not be earth-shaking, but it highlights the mutable nature of government information on the web. Posting information on a web page is not the same as publishing (instantiating) information and depositing it in FDLP libraries. The current situation endangers the accuracy and completeness of the historical record.

White House documents found to be altered

Researchers at the University of Illinois say they have found evidence on the White House Web site that suggests “a pattern of revision and removal from the public record that spans several years, from 2003 through at least 2005. Instead of issuing a series of revised lists with new dates, or maintaining an updated master list while preserving copies of the old ones, the White House removed original documents, altered them, and replaced them with backdated modifications that only appear to be originals.”

Once again, our reliance on government websites for current information fails to preserve the historical record and yields an incomplete, unverifiable, and even altered record.

We need government to instantiate information and actively deposit those instantiations outside the dot-gov realm (e.g., with FDLP libraries) to help guarantee a complete and accurate record.

NYT: Federal Files Blip Into Oblivion

This is a pretty good popular-press overview of the problems of digital preservation of government information and some of the steps being taken to address the problems.

Sample of the problems:

The Achilles’ heel of record-keeping is people.

In an effort to save money, federal agencies are publishing fewer reports on paper and posting more on the Web.

The Web site of the Environmental Protection Agency lists more than 50 “broken links” that once connected readers to documents on depletion of the ozone layer of the atmosphere.

At least 20 documents have been removed from the Web site of the United States Commission on Civil Rights. They include a draft report highly critical of the civil rights policies of the Bush administration.

93 percent of [top officials surveyed at NASA] were violating federal requirements for preserving e-mail correspondence.

“Most Web records do not warrant permanent retention,” because they do not have “long-term historical value,” the [National] Archives said.

Alarmed at the possible loss of White House e-mail messages, the House passed a bill in July that would require agencies to preserve more electronic records. … Republican opponents said the requirements would be onerous and costly. Mr. Bush has threatened to veto the bill, saying it could “interfere with a president’s ability to carry out his or her constitutional and statutory responsibilities.”

See also: Citizens in the Dark? Government Information in the Digital Age.

Can we identify or verify or prevent government website scrubbing?

One issue we at FGI are concerned about is that, when government information is not officially distributed to depository libraries and when official digital government information is available only from government-controlled web servers, then that information can (intentionally or unintentionally) be deleted or altered leaving historians, journalists, economists and other citizens with no clear, complete record of government activities.

From time to time there are stories about government websites being “scrubbed,” i.e., of information being removed from them, but it is often difficult to determine if these stories are accurate. Since stories like this are often (perhaps, usually) published to make a political point, discussion of them often revolves around the political issue rather than the issue of the integrity and permanence of government information in the larger sense.

One such story this week gives us an opportunity to at least quickly and superficially examine the existence of a problem, if not its extent:

Perr says that a flash animation and a paragraph on tax cuts, which were on the White House Jobs and Economic Growth web page (also referred to as the “Economy & Budget Policies in Focus” web page) on March 16, 2008, were removed and no longer available on March 20. The animation said:

  • “18,000 jobs created in December 2007,”
  • “Over 8.3 million new jobs created since August 2003”
  • “Unemployment rate remains low at 5%.”
  • “President Bush’s actions are moving our economy forward”

And the deleted paragraph read:

President Bush Continues To Call On Congress To Further Reduce Economic Uncertainty By Making His Tax Relief Permanent.

President Bush believes the most important action to ensure the long-term health of our economy is to make sure the tax relief that is now in place is made permanent. The 2001 and 2003 tax cuts are set to expire in less than three years. If Congress allows that to happen, 116 million taxpayers will see their taxes go up by $1,800 on average, and we will see an end to many of the measures that have helped our economy grow – including the 10 percent individual income tax bracket, reductions in the marriage penalty, the expansion of the child tax credit, and reduced rates on regular income, capital gains, and dividends.

Perr discovered that MSN has a cached copy of that page (dated 3/8/2008) that includes the animation and text. This morning, I used WebCite to make a copy of the MSN copy. (The WebCite copy does not do a good job of retaining the layout of the original, but the Flash animation is there and viewable, as is the text paragraph, and both should remain there even after MSN removes its cached copy.)

I checked the Internet Archive, but the most recent snapshot of www.whitehouse.gov/infocus/economy/ as of this morning is June 7, 2007. I did some Google searching and was not able to locate the Flash animation, but I was able to locate a series of nearly identical ones:

If Google is an accurate way to judge the content of whitehouse.gov, it would appear that the White House has maintained earlier versions of this animation but has not preserved this more recent one. But we do not know how accurate, comprehensive, or current Google is.

I also browsed the White House News releases for March 2008 page, because it appeared that similar information had migrated to various “Fact Sheets.” Indeed, the text paragraph is in the March 7, 2008 Fact Sheet: Taking Responsible Action to Keep Our Economy Growing. I was not able to find a link to the animations, however.

This brings me to the question: “Can we identify or verify or prevent government website scrubbing?” My own tentative conclusions are:

  • We cannot prevent the government from changing its own websites, so we cannot prevent “scrubbing.”
  • We can verify that a site has changed, but currently our tools are limited to a) commercial web crawlers (like Google, MSN and Internet Archive), b) individuals who regularly monitor websites, and c) web crawlers created by libraries using their own tools or those provided by others (such as Archive-it).
  • While tools exist to monitor changes in a web site (e.g., Change Detection), I don’t believe that we can use these to look for significant (e.g., loss-of-information) alterations.
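To make the last point concrete, here is a rough sketch of how a tool might at least separate deletions from additions when comparing two saved snapshots of a page. It uses Python's standard `difflib` module; the function names, the sentence-splitting heuristic, and the sample text are all hypothetical. It can list sentences that were removed or altered, but deciding whether a removal is significant remains a human judgment.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences after ., ! or ? (a rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def removed_sentences(old_text: str, new_text: str) -> list[str]:
    """Return sentences from the old snapshot that were removed or altered."""
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" = dropped outright; "replace" = old wording was changed.
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
    return removed


# Hypothetical snapshots; not actual text from any government site.
old_page = "Jobs were created. Tax relief should be permanent. The economy is growing."
new_page = "Jobs were created. The economy is growing. A new fact sheet is available."

for sentence in removed_sentences(old_page, new_page):
    print("REMOVED:", sentence)
```

A page that only gains new sentences produces an empty list, so routine additions such as new announcements would not trigger a review.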

What conclusions can we draw from all this? Since we do not know how commercial indexers such as Google and MSN work and what their criteria are and since they do not have preservation as a mission, we can hardly rely on them. While this particular example may be trivial in itself, it demonstrates that government information in the digital age, the “e-government” age, is volatile and fragile and that we do not have a system in place that is as reliable for digital content as the FDLP libraries were for non-digital content. While it is hard to imagine a system that would be robust enough to catch every single digital bit of government information from every agency for all time, it is possible to imagine a system that would capture much more than we do now.

That leads me to a conclusion that we at FGI have long advocated: Libraries should be building collections of digital government information and GPO should facilitate this by depositing government information in FDLP libraries. If libraries created collections that could be text-mined by scholars and researchers, it would be possible to better audit, analyze, and preserve government information and make it more difficult for information to be scrubbed without being discovered and exposed. Indeed, it would remove, to some extent, the motivation to “scrub” if it were well known that the information was preserved and easily discoverable.

The question we should be asking ourselves is: How much are we losing every day? The task is too big for any one library or any one government agency (i.e., GPO). And it is not a task that commercial entities like Google and MSN are likely to take on.