NARA will NOT harvest at end of current administration

According to a post on .govwatch (The National Archives Is Quietly Destroying Millions of Documents April 08, 2008 by Coby Logen), a recent memo at the National Archives and Records Administration says:

After considering our other records management program priorities for FY 2008, availability of harvested web content at other "archiving" sites (e.g., www.archive.org), and the resources required for conducting and preserving a government-wide web snapshot, NARA has determined that we will not conduct a web harvest or snapshot at the end of the current Administration.

Logen says that "Not capturing federal web sites now may mean losing millions of web pages authored under the Bush administration when leadership changes in January 2009."

John Wonderlich at the Sunlight Foundation comments that "The fact that digital preservation is done by others outside NARA isn't an excuse for NARA to abdicate their responsibility, but an argument that they should be capable of fulfilling it." (Digital Preservation Under Threat? by John Wonderlich on April 9, 2008)

This seems yet another example of the government saying it cannot and therefore it won't. (The NARA/TGN contract as a bad precedent). Call it the Katrina of digital preservation?

The New York Times summed up the underlying issue nicely yesterday: "In Storing 1's and 0's, the Question Is $" (By John Schwartz, New York Times, April 9, 2008). It is not a technological issue; it is an issue of funding and policy and control. (See: The Technical is Political.)

Web Harvest

Take a look at:

http://www.archives.gov/press/press-releases/2006/nr06-99.html

Unfortunately, it was not noted in the NARA release.

The two Executive Office web harvests or snapshots were very, very limited. At best, they went down 2-3 levels, and did not cover all web information found on Federal websites. My understanding is that these web harvests cost several hundred thousand dollars. Any further harvesting beyond the few levels covered would, in fact, have cost several hundred thousand dollars more.

NARA does, in fact, want agencies to schedule websites as records, and provide appropriate disposition instructions. Those that lack permanent value will be deleted and disposed of as appropriate; those that are of permanent value will be scheduled as such. In fact, if the NARA website was reviewed, I think that you might easily find web transfer instructions.

All of this is based on the quaint but efficient and economical concept of scheduling records, and transferring permanent records to NARA. I could go on about the arbitrary capriciousness of the web crawler initiative, but I won't. Leave it to say it was not well thought out, is not efficient, is not economical, and really, poorly reflects the Federal web presence.

The 2006 Congressional webcrawl was a funded Congressional mandate. An "earmark." It, too, cost a few hundred thousand dollars. For the National Archives, excluding the Electronic Records Archives system, a few hundred thousand dollars is real money. Just ask the NARA reference unit trying to provide reference services, the custodial units trying to process a few hundred thousand feet of records, the records management unit trying to schedule records, and many other units trying to fulfill a mission with a staff that has decreased considerably, perhaps by as much as 40% over the past 8 years. I hear no one screaming from the rooftops about this.

Poorly written NARA Records Management Releases are, how do we say, "low hanging fruit." Reach not very high, and "voila!!", there is a rotten one. No muss, no fuss. Don't have to actually review the NARA website for information on what we might be doing about websites, do we? God forbid we actually research the Federal Budget and such.

I think I could prove that 75% of all NARA critics lacked any real basis for their criticism.

Bottom line- Yup, a few hundred thousand dollars is real money to us. And yes, a "web crawling" mission is not a priority. Sorry. North Texas has the bucks, has done a good job, and is now affiliated with GPO and NARA. Check it out.

Pretty darn good stuff for government-academic cooperation.

Citing Your Sources Would Help

Dear Anon,

I appreciate that you seem to feel slighted by what you consider to be superficial coverage of the web harvesting issue. But you don't help your case when you say stuff like:

"I think I could prove that 75% of all NARA critics lacked any real basis for their criticism."

And then follow up with unsupported statements like:

  • My understanding is that these web harvests cost several hundred thousand dollars.
  • In fact, if the NARA website was reviewed, I think that you might easily find web transfer instructions.

These statements appear to come from superior knowledge. If they do, well and good, but then show us where to go. If the above information is so easy to find, why not stick the URLs in our faces? And if it isn't so easy that you could post citations supporting your assertions, maybe you should be easier on those of us who expect NARA and other agencies to be informative in their press releases.

I also understand and share your frustration at NARA budget cuts. We at FGI have written about those too:

As you become aware of new budget cuts at NARA, we invite you to forward documentation of them to admin AT freegovinfo DOT info. We'll protect your anonymity if that's what you want.

------------------------------------
"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.

Web Guidance

Here is the NARA Web guidance-

http://www.archives.gov/records-mgmt/policy/managing-web-records-index.h...

I'll try to come up with more exact figures for the cost of the web harvest. If there's a cite online (and I suspect there is), then I will post that, too.

Yes, the statement, "I think I could prove that 75% of all NARA critics lacked any real basis for their criticism," is a hypothesis, and should be proved and/or modified. It would take a bit of work. However, I will note that the statement comes at the end of my comment, not before the two "unsupported" assertions (at least one of which is forthwith, "in your face.")

Such misrepresentation of what was said, especially when the entire paragraph is there for the world to see, suggests that the 75% figure might be valid.

NARA's responsibility

This post and subsequent comments are quite interesting and bring up lots of issues for discussion; but I hope we won't sink into invective. The facts of this case are: 1) NARA has a responsibility to preserve the historical materials of the US Government (44 U.S.C. Chapter 21); 2) Executive agency Web sites contain much of historical value; 3) NARA has made an administrative decision that in the eyes of many is short-sighted and sets a dangerous historic precedent (witness the articles cited above); 4) Web harvesting is expensive and not altogether perfect. Can we all agree on those items?

Given that, I don't understand why this criticism of NARA's decision is said to have no basis. To me the criticism is justified. Anonymous commenter, assuming #1, #2 and #4, can you give some alternatives for how this situation could be rectified? To me the issue of Web harvesting is a straw man argument; the issue that has so many upset is about archiving of historically important administrative information. I'm all for govt-library cooperation, and maybe that's the area that NARA could explore to distribute the budgetary impact of preserving agency Web sites.

NARA's responsibility.

Before I attempt to answer the questions posed (later on today), let me go ahead and say this morning, "yes, we are all for a great and good NARA." I apologize for the role I played in diluting the focus of what should be an honest, civil discussion.

Discussion of web harvest issues

Hi,

I've posted a longish discussion about this issue over on ArchivesNext:

http://www.archivesnext.com/?p=137

I'm supporting NARA's initial decision, so I'm braced for lots of constructive dialog from other points of view!

Kate

Clarification from NARA

We read with interest your postings on this topic.

The National Archives and Records Administration (NARA) has posted background information regarding our web harvest decision. This background document includes links to our guidance products related to web records and the decision-making process we went through to arrive at our decision.

Paul M. Wester, Jr.
Director, Modern Records Programs
National Archives and Records Administration
