archives
C-SPAN archives online
Submitted by jrjacobs on Tue, 2010-03-16 08:15. C-SPAN has posted their archives online. That's 23 years' worth, 160,000 hours, online (almost all of their content). This is extremely cool. Get ready to waste a chunk of time today going through their archive. Note that while all their programming is available, popular programs like Book TV are not embeddable (although you CAN send the link to Facebook, Twitter, etc.). Go ahead and browse the committee list for a little vicarious legislating :-)
The C-SPAN Archives records, indexes, and archives all C-SPAN programming for historical, educational, research, and archival uses. Every C-SPAN program aired since 1987, now totaling over 157,000 hours, is contained in the C-SPAN Archives and immediately accessible through the database and electronic archival systems developed and maintained by the C-SPAN Archives.
[HT to Paul Blumenthal (@PaulBlu) at Sunlight Foundation!]
Lost Conversations, Lost Decisions, Lost History
Submitted by jajacobs on Sat, 2009-11-14 08:56. Oliver Bell summarizes the issue of new forms of government communication and the need for new ways of preserving them:
- Lost Conversations, Lost Decisions, Lost History..., by Oliver Bell, TalkStandards (November 13, 2009)
Work needs to begin on archiving standards that will retain the information that is driving decisions today and as technology plays an increasingly larger role in the business of government archiving standards needs to be a core part of systems design, not a problem that we try and solve after the fact.
One important element of this issue is Title 44 of the US Code, which defines what GPO and the FDLP can handle. Its definitions limit what we can archive within the boundaries of the FDLP. But... if we had digital deposit, GPO could deposit official Title-44-approved content in FDLP digital libraries, and those libraries could combine that content with non-Title-44 (Gov 2.0) content. GPO can't do this because of the limits of Title 44, but individual FDLP libraries have the flexibility to build their own collections combining Title 44 content with other content. We can do this today without changing the law. But we need digital deposit to make those collections rich and useful.
North Carolina State Archives and State Library of North Carolina’s Web
Submitted by archive on Wed, 2009-07-01 18:35. The North Carolina State Archives and the State Library of North Carolina teamed up in 2005 to create Archive-It collections that collect, preserve, and utilize the state's historic and evidential resources so that present and future residents may better understand their history. This contributes to their overall goal of safeguarding the documentary and material evidence of past generations for the education of all citizens and the protection of their democratic rights. You can find the North Carolina State Archives’ portal to their Archive-It collections here.
During the 2005 Archive-It pilot period, the North Carolina State Archives and State Library specifically used Archive-It to capture former Governor James Hunt’s website, which they had been unable to obtain from other sources; the site came down from the web shortly after they captured it. The Archives reports that it has received many requests for information from Governor Hunt’s website, and being able to point folks to the archived website collection has elicited very positive feedback.
They also captured then-Governor Mike Easley’s August 2003 video message to President Bush regarding the closing of textile mills in North Carolina (these mills were very important to NC’s economy). The video is no longer available online due to a change in administration, so having it archived will ensure continued access.
The Archives and State Library are now working to capture their current governor’s Facebook and Twitter accounts as a way to document elected officials' use of technology to reach large communities with their message.
-Lori
Black History Month Resources
Submitted by laster on Mon, 2009-02-02 07:05. As a librarian working in reference services, I am always looking for resources that can capture the interest of everyone who uses my library and its website. After all, what better way to build grassroots support for the availability and preservation of government information?
The Library of Congress is exploring The Quest for Black Citizenship in the Americas as its theme this year in its galleries and presentations. The website includes webcasts, photographs, and learning tools on African American history and the Civil Rights movement. One featured item is the National Park Service's Tuskegee Airmen exhibit, which may be of particular interest to those who watched the inauguration of Barack Obama.
Another resource to highlight is the Black History Month section of America.gov. This website includes articles and photo galleries on contemporary topics and defining moments in American history. There's an RSS feed for articles so you can stay updated throughout the month.
Through Federal Resources for Educational Excellence (FREE), a great discovery tool for digital collections, I was reminded of the Frederick Douglass Papers at the Library of Congress. This enormous collection, part of the American Memory project, includes a diary Douglass kept on a tour of Europe and Africa, as well as correspondence with prominent abolitionists and political figures.
One other fascinating resource for Black History Month is, unsurprisingly, the Federal Bureau of Investigation FOIA Reading Room. While only about one percent of the entire FBI file for Martin Luther King, Jr. is available for viewing here, the file includes some information on surveillance practices and informants. Other files available in the reading room are on Paul Robeson and his wife Eslanda, and Jackie Robinson.
I'll be back throughout the month with more on topics and tools to build interest in government information resources.
Lunchtime listen: Archives of dissent, food for docs thoughts!
Submitted by jrjacobs on Tue, 2008-11-18 12:10. In September, I had the good fortune to attend a most interesting panel discussion called Archives of Dissent, held at UC Berkeley's Free Speech Movement Cafe (which just so happens to be in UCB's Moffitt Library!). The panel was part of a week-long series of Bay Area events called The Great Rehearsal, commemorating the 40th anniversary of the uprisings and worldwide upheavals of 1968 and their impacts and legacies. Archives of Dissent brought together librarians, curators, oral historians, conservators, publishers, academics, and others working to prevent the loss and erasure of radical voices, events, and movements of both the past and the present.
The panel included:
- Lincoln Cushing (19:35), independent librarian and Docs Populi archivist. The first 10 minutes of the presentation are images from Lincoln's collection of radical posters.
- Julie Herrada (28:20), Labadie Collection Librarian, University of Michigan, curator of a “1968” special exhibit, and good radical reference buddy. The Labadie Collection is an internationally renowned archive of social protest materials.
- Kalim Smith (41:25), UC Berkeley doctoral student in anthropology and folklore, researching the preservation of Native American languages threatened with extinction.
- Megan Shaw Prelinger & Rick Prelinger (50:08), Co-founders of the appropriation-friendly Prelinger Library in San Francisco
What does this have to do with government information, you say? In many respects, govt documents collections fall within the context of cultural archives. Govt documents librarians by and large have the same radical political passion about govt information as professional and lay archivists. And the myriad issues and opportunities of digitization and the transformation of physical collections, discussed in terms of archives, parallel (and in many respects are predated by) the same opportunities and issues facing govt information collections.
What were the main themes of the panel? (I'm in full Rumsfeld mode :-) ). All of the speakers had great things to say about needing willpower to build collections -- especially those of social movements that aren't necessarily well-funded -- building archives that are situated within and expound on cultural contexts, the importance of preservation, the politicization of access, DIY archivism, information ecologies, archives as battlegrounds, etc.
The most challenging talk for me (and therefore the most interesting) was Kalim Smith's. Kalim is an anthropology PhD student at UCB. He talked passionately about the extinction, loss, and erasure of native languages. He surmised that efforts to revitalize and preserve native languages might have the effect of re-colonizing them: writing down, or archiving, those languages takes them out of the very context in which they grew and thrived. In terms of archives and libraries, the very act of preserving materials outside the context in which they were created is potentially damaging. That's certainly a thought bomb that has reverberated in my mind.
Please take some time to watch this panel of most engaging folks. You'll be glad you did!
2008 Society of American Archivists Convention: "Citizens in the Dark? Government Information in the Digital Age"
Submitted by jajacobs on Fri, 2008-08-29 08:16. These are my speaker notes for the presentation, "Citizens in the dark?
Government Information In the Digital Age," which I gave on Friday
Aug 29, 2008, at the meeting of the Acquisitions and Appraisal Section
of the Society of American Archivists Convention in San Francisco.
The theme of the convention was "Archival R/Evolution & Identities."
This is not a transcript of what I actually said, but an outline from which
I spoke. There are sentence fragments and inconsistent capitalization and
other less-than-final-draft editing. I hope that this is useful to you in
spite of these distractions.
I do include the points I tried to make and a bit of the verbiage and all
of the links I have.
- Jim Jacobs.
---
The title of this presentation is:
Citizens in the dark? Government Information In the Digital Age
We are seeing a fundamental change in the way governments communicate
with citizens. These changes are NOT caused by technology, although
they are enabled by technologies. They are driven and determined by
economic, political, and social issues.
The solutions, therefore, are not technological either, although they
will be enabled by technology. The solutions are economic, social,
and political.
Abby Smith, Director of Programs at the Council on Library and Information
Resources, in a CLIR report on "authenticity" in a digital age, summed
this up quite nicely:
Interestingly, the scholar-participants suggested that technological
solutions to the problem [of establishing the authenticity of a digital
object] will probably emerge that would obviate the need for trusted
third parties. Such solutions may include, for example, embedding
texts, documents, images, and the like with various warrants (e.g.,
time stamps, encryption, digital signatures, and watermarks). The
technologists replied with skepticism, saying that there is no
technological solution that does not itself involve the transfer of
trust to a third party. Encryption -- for example, public key
infrastructure (PKI) -- and digital signatures are simply means of
transferring risk to a trusted third party. Those technological
solutions are as weak or as strong as the trusted third party. To
devise technical solutions to what is, in their view, essentially a
social challenge is to engender an "arms race" among hackers and their
police.[45]
Abby Smith, "Digital Authenticity in Perspective." in "Authenticity in a
Digital Environment," Council on Library and Information Resources,
Publication 92. (May 2000).
http://www.clir.org/pubs/reports/pub92/smith.html
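The technologists' point can be made concrete with a simple fixity check. The sketch below is in Python (an assumption; the talk itself contains no code), and the document text is invented for illustration. A SHA-256 digest reveals any change to a file, but only relative to a reference digest, and whoever records and protects that reference is the trusted third party.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, reference_digest: str) -> bool:
    """True if data matches the reference digest.

    The check is only as trustworthy as the source of reference_digest:
    whoever supplies that value is the 'trusted third party'."""
    return sha256_hex(data) == reference_digest


# Hypothetical document and digest, invented for illustration.
original = b"Report of the Committee, as published."
published_digest = sha256_hex(original)  # recorded at time of deposit

tampered = b"Report of the Committee, as revised."
print(verify(original, published_digest))  # True: unchanged
print(verify(tampered, published_digest))  # False: change detected

# A forger who can also replace the reference digest defeats the check.
forged_digest = sha256_hex(tampered)
print(verify(tampered, forged_digest))     # True: trust was misplaced
```

The last line is the "arms race" in miniature: the mathematics is sound, but the result is only as trustworthy as whoever holds and delivers the reference digest.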
"Trust" is a social issue, not a technological one.
-----------------------------------------------------------------------
As we look at the technological changes, the way governments are using
and not using, adopting and avoiding, and in general coping with these
technological changes, I think we all see trends.
The agenda of this conference reflects these trends and changes,
with its theme of Revolution and Evolution and with sessions on
everything from
- digital repositories
- born-digital materials
- digitization
- digital manuscripts
- e-mail
- e-records
- e-discovery
- the "e-tiger"
And of course, representatives from NARA, LC, and GPO are here to discuss
their projects.
I assume that all of us are familiar, at least in a general way, with
many of the difficulties of digital archiving, things like:
- format obsolescence
- media deterioration
- content that is tied to a particular operating system or application
- the need for new kinds of metadata
and
- emulation and migration strategies.
So, I will not cover those today.
What I do want to do is to give you a (perhaps) slightly different
perspective and some (possibly) different ideas and approaches to these
challenges, and to bring up some issues that I believe have not yet
received enough attention.
THE PAST
-----------------------------------------------------------------------
1. In the past, government information archiving was straightforward:
a) We knew and could fairly easily define and identify records
b) We could (again, in a fairly straightforward way) identify
responsibility for record creation, scheduling, retention, deposit,
preservation, access, etc.
c) We could establish procedures to get things done: predictable,
definable, etc.
So... in the past, we had a pretty clear path of preservation:
- of what we wanted to preserve and
- of how to preserve it and
- of who was responsible at each stage from record creation through
retention and disposition and preservation.
We could define and identify what we wanted to preserve and seek and
possibly fund the preservation.
We may not have always been 100% effective, there may have been failures,
gaps, short-funding, recalcitrant agencies, mistakes, etc. but we at least
knew what we were doing and where the gaps were...
THE PRESENT
-----------------------------------------------------------------------
A lot has changed, perhaps everything. Here are four areas
of fundamental change that affect our ability to archive the
complete historical record of governments:
1) WHAT. While to some extent we can still define and identify records,
the job of doing so is much less clear. There may be some things that
we cannot get hold of to define as records. There may be things that
are part of the record which the govt does not even possess, or for
which it lacks licensing or copyright permission to possess or copy.
2) WHO. Even to the extent that we can identify (broadly) what we want
to preserve, it may be hard to identify who is responsible and
difficult to create adequate, implementable, schedules for
preservation.
3) HOW. Even if we can do all that, digital preservation itself is
difficult and it is very hard to move from a quick-moving,
service-oriented, bureaucratic, day-to-day, digital environment, to an
environment of digital preservation.
4) ACCESS. While preservation without access is not preservation at all,
"access" is a very different process than preservation.
It seems to me that the very processes that make it *easier* for a
current end-user to find and use digital information make it *harder*
for the archivist to preserve that same information and ensure its
usability in the future.
EXAMPLES
-----------------------------------------------------------------------
Let's look at some examples
EMAIL (1)
------------------------------------------------------------------------
E-mail certainly provides good examples of the "recalcitrant agency" problem.
But I want to emphasize some other issues that will plague us even if we
solve that one.
An article in Technology Review gave several good examples.
One related a story that Allen Weinstein tells about how he discovered
in his FBI files a newspaper clipping with a note hand-written on it by
J. Edgar Hoover.
If that same communication happened today, it would most likely happen
in an email with, perhaps, an attachment of the article, or worse, a
link to the article.
Even if we had in place all the new laws and regulations that are being proposed
to ensure that we can actually save email, would we have a complete record? Or
would we have a partial record with a key part missing? And would we be able
to find or identify that part? Would we be permitted to archive it?
Talbot, David. "The Fading Memory of the State." Technology Review, July
2005.
http://www.technologyreview.com/printer_friendly_article.aspx?id=14583&c....
EMAIL (2)
------------------------------------------------------------------------
Another problem with email is the difficulty in knowing what to preserve.
The simplest algorithm for preservation of email is to preserve everything, but
that means preserving so many trivial, unimportant messages that would not
normally be scheduled for retention in any rational universe.
RECORD OF INFORMATION USED IN DECISION MAKING
------------------------------------------------------------------------
Another example from that same Technology Review article:
The mistaken bombing in 1999 of the Chinese embassy in Belgrade.
U.S. officials blamed the error on outdated maps used in targeting.
Today's planners would use GIS software to zoom and pan, and run
calculations about the topography to make a targeting decision.
Would the software preserve the decision making process?
There are layers of challenges here:
- the data used (spatial data, databases of locations, topography, etc.)
- the software used to analyze and use the spatial data
- the code behind that software that has its own algorithms for implementing
particular user-analyses
- the actual use by the end-users, the trail of how they used the software
to analyze the data
These are difficult things to archive!
When decisions are based on computer models working on dynamic databases,
will we be able to preserve for future historians the state of the database
and the algorithms built into the models?
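The four layers listed above suggest what a preservable decision record would need to capture. Here is a minimal Python sketch, assuming the dataset is JSON-serializable; the function names, field names, and sample values are hypothetical, not any agency's actual practice. Note that it stores only a fingerprint of the data, so the dataset itself would still have to be archived.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj) -> str:
    """SHA-256 of a canonical JSON serialization (key order does not matter)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decision_record(dataset, model_version, parameters, user_actions, result):
    """Bundle data, software, algorithm settings, and user trail into one record.

    Only a fingerprint of the dataset is kept; the dataset itself must
    also be preserved for the record to be useful later."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "dataset_sha256": fingerprint(dataset),   # the data used
        "model_version": model_version,           # the software and its code
        "parameters": parameters,                 # the analysis settings
        "user_actions": user_actions,             # the trail of use
        "result": result,                         # the decision reached
    }


# Hypothetical inputs, invented for illustration.
locations = [{"name": "Site A", "lat": 10.0, "lon": 20.0}]
record = decision_record(
    dataset=locations,
    model_version="planning-model 2.3",  # hypothetical version label
    parameters={"buffer_m": 150},
    user_actions=["zoom 14", "pan north", "select Site A"],
    result="Site A",
)
print(json.dumps(record, indent=2))
```

Even this small record shows the archival burden: each field points to something (a database state, a specific software release) that must itself be preserved.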
PUBLIC DOCUMENTS
------------------------------------------------------------------------
When we think about the preservation of the historical record, we have to
include public documents as well as private communications and decision-making
records.
In the past, "public documents" meant "publications" that were widely distributed
to the public and depository libraries.
Today, it means web sites.
As you probably know, the Library of Congress recently announced a big
project with several partners to crawl the .gov domain at the end of the
current presidential administration. This will harvest a lot of digital
content that might otherwise disappear with the change of administrations.
http://www.loc.gov/today/pr/2008/08-139.html
An article about this project discusses the software being used and some
of the issues.
Quint, Barbara. "Consortium--Minus NARA--Archiving Bush Administration
Websites." Information Today NewsBreaks, August 28, 2008.
http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=50486.
See also a discussion about NARA's role at the ArchivesNext blog:
http://www.archivesnext.com/?p=137
And some more on NARA here:
http://freegovinfo.info/taxonomy/term/189
Web harvesting is not a definitive solution, though.
When we compare web harvesting with active deposit by the government of
documents in depository libraries we can get a glimpse at the scope of
the preservation problem we face.
Web harvesting puts the onus on the harvesters.
It releases the government from the obligation of actively depositing information.
It is a step back in time. It means that archivists and librarians have
less control over their own selection and acquisition and that agencies have
less responsibility.
WHERE IS GOVINFO? (ON THE WEB...?)
------------------------------------------------------------------------
A study by the Center for Democracy and Technology late last year
examined
"Why Important Government Information Cannot Be Found Through
Commercial Search Engines"
http://www.cdt.org/righttoknow/search/
The reasons for the failure of search engines to adequately index
government web sites are the same reasons that make it difficult for
web harvesting to be successful. If we can't find the information,
we cannot harvest it.
We cannot preserve what we cannot save.
BEYOND .GOV AND .MIL
------------------------------------------------------------------------
One problem (and a rapidly growing one) is that not all government information is
on the .GOV and .MIL domains.
Here are some examples:
- TWITTER.
twitter.com is the very popular "microblogging" site, where people post
very short entries about what they are doing, where they are having lunch,
and so forth. Did you know that many government agencies twitter?
Among them:
- the White House Communications Office
- the Department of Health & Human Services: Office on Women's Health
- and more than 60 others.
- 20 or 30 members of Congress
http://twitter.pbwiki.com/USGovernment
http://www.sourcewatch.org/index.php?title=Members_of_Congress_who_Twitt...
- YOUTUBE
The military is actively using YouTube to post videos
http://articles.latimes.com/2007/may/01/world/fg-cyberwar1
U.S. military offers up its side of the Iraq war on YouTube
Los Angeles Times, by Alexandra Zavis, May 1, 2007 (print edition, p. A-4)
One military YouTube channel says that:
"Video clips document action as it appeared to personnel on the ground and
in the air as it was shot."
http://www.youtube.com/profile?user=MNFIRAQ
- NASA posts videos on YouTube and iTunes
- NOT JUST FEDERAL...
I'm mostly giving examples of the federal government today, but the trends
stretch across all levels of government.
For example:
the PRINCE WILLIAM COUNTY SERVICE AUTHORITY IN VIRGINIA is posting
videos on YouTube.
http://www.fcw.com/print/22_25/technology/153418-1.html?type=pf
- the STATE OF CALIFORNIA has a YouTube channel
http://www.youtube.com/californiagovernment
and GOVERNOR SCHWARZENEGGER posts to twitter.
http://twitter.com/schwarzenegger
HOUSE MEMBERS
- According to GovTech magazine, more than 100 House members have
multimedia pages and YouTube links on their Web sites
http://www.govtech.com/gt/241670
FLICKR / LC
- FLICKR. I'm sure you read about the success the Library of Congress
has had by posting photos on flickr.com
http://www.flickr.com/photos/library_of_congress/
QIK.com
qik.com allows users to stream live video from their cell phones.
Congressman John Culberson of Texas is a big fan and has his own qik
channel where he streams and posts interviews, meetings and more.
http://qik.com/johnculberson
Is this official government information? Or political? Or both?
While these examples may strike you as "not official" or "non-governmental"
the point here is that the environment for distribution of information is
changing rapidly, and we must keep up with the changes. If Culberson's qik
site is not "official," for example, we need a way of appraising it as such
and a way of differentiating it from the next channel that appears and
*is* official.
HYBRID SITES WITH MIXED MESSAGES
------------------------------------------------------------------------
The military is providing us with lots of examples of archiving problems:
issues of provenance, use rights, copyright, and just plain finding
and getting information.
This is an extension of what I call the "copyright poison-pill" in which
copyrighted material appears in an otherwise non-copyrighted government
publication and creates confusion over the rights of libraries and archives
to save, reproduce, and display any or all of such materials. We see this
today in the way the Google book project has blocked access to most
government publications because they "might" be covered by copyright.
DEPARTMENT OF DEFENSE
------------------------------------------------------------------------
The DOD "official website of Multi-National Force - Iraq" is a .com site,
not a .mil.
There we find
- links to other commercial sites with streaming video without download
links
- web links designed to be clever (with javascript and hidden urls)
but which add an additional level of difficulty in identifying and
bookmarking links and downloading pages.
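Why do script-driven links frustrate harvesting? A simple harvester can only queue URLs it can read directly from a page. The sketch below uses Python's standard html.parser; the sample page, the example.com base URL, and the script functions (openVideo, loadPage) are hypothetical, not taken from the MNF-Iraq site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Sort a page's links into those a harvester can follow and those it can't."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.followable = []  # plain URLs a crawler can queue or a user can bookmark
        self.opaque = []      # targets computed by script at click time

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        onclick = attributes.get("onclick")
        if href.startswith("javascript:"):
            self.opaque.append(href)
        elif onclick and href in ("", "#"):
            self.opaque.append(onclick)
        elif href and not href.startswith("#"):
            self.followable.append(urljoin(self.base_url, href))


# Hypothetical page, invented for illustration.
page = """
<a href="/news/report.html">Report</a>
<a href="javascript:openVideo('clip42')">Watch the clip</a>
<a href="#" onclick="loadPage(7); return false;">Next page</a>
"""
collector = LinkCollector("http://www.example.com/")
collector.feed(page)
collector.close()
print(collector.followable)  # ['http://www.example.com/news/report.html']
print(collector.opaque)
```

Only the first link yields an address that can be crawled, cited, or bookmarked; the other two lead to content that a harvester of this kind never sees.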
Another DOD site that provides video clips is a .mil site, but a lot of the content
is actually hosted by .coms:
DODvClips.mil
- While this is a .mil domain, it is actually operated by the Intel
Corporation and is hosted and maintained by a commercial organization
known as The FeedRoom or Globix Corporation
- While you can download video, you are bound by an END-USER LICENSE
AGREEMENT, in which Intel claims all proprietary rights to the content
and videos on the site.
- Those who try to harvest the content from this site will find
that the site instructs robots that they must not save copies of
videos or even web pages.
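A well-behaved harvester consults a site's robots.txt before fetching anything, so instructions like these simply remove content from the archive. A sketch using Python's standard urllib.robotparser; the robots.txt rules, the example.com URLs, and the "ArchiveHarvester" user-agent name are hypothetical, not the actual file served by DODvClips.mil.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the actual file from any DOD site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /video/
Disallow: /pages/
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

candidates = [
    "http://www.example.com/video/clip42.flv",
    "http://www.example.com/pages/briefing.html",
    "http://www.example.com/about.html",
]
for url in candidates:
    allowed = parser.can_fetch("ArchiveHarvester", url)
    # A compliant harvester skips anything the rules disallow.
    print("fetch" if allowed else "skip ", url)
```

robots.txt governs what a crawler fetches; instructions about keeping copies can also appear in page-level robots meta tags (such as noarchive), which this sketch does not check.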
HORMUZ
------------------------------------------------------------------------
Here is another example of DOD video problems.
In January of 2008, the Pentagon broadcast a video of the "Strait of
Hormuz incident," in which an unidentified voice says, apparently to a US
warship, "You will ... explode."
It was much in the news. (I found more than 1000 items on LexisNexis over
about a 4 week period.)
In January, two defense department web sites linked to a video of the
incident and one labeled it as "From Defense Department Video." One
of those pages still exists.
http://www.defenselink.mil/transcripts/transcript.aspx?transcriptid=4116
But by June 2008, that URL linked to a chef doing a promo for his show
called "Grill Sergeant," and searches for "hormuz" turned up zero hits.
Last week, when I checked again, the link I got was to a 15 second ad for
"the pentagon channel" that said:
"Embrace accountability for all that you do -- for everything in your
area of responsibility."
A shorter version of the Hormuz video, complete with "you will explode" quote
is available here:
http://www.defenselink.mil/dodcmsshare/briefingslide%5C320%5C080107-D-65...
Background information including why it is hard to know what goes missing here:
Documenting the Government -- Strait of Hormuz edition
http://freegovinfo.info/node/1567
But, this is more than the tale of a broken link.
- The link was not to a .mil site, but to a commercial site (FEEDROOM again).
- The video was provided only as a streaming video and no download was
available.
So, here we have a critical piece of the historical record, with no
indication of who filmed it or edited it or posted it or took it down.
And we have no easy way to preserve this video and no guarantee that
anyone will or can take responsibility for doing so.
WHAT MAKES A WEBSITE OFFICIAL?
------------------------------------------------------------------------
How do we determine what makes a website official?
One document I found is explicit, but vague. It says that
an "official website" includes any website hosted on the .mil domain,
but also "any website PUBLISHED or SPONSORED by a military command but
hosted on a commercial server."
http://www.mnf-iraq.com/images/stories/For_The_Troops/bloggers_policy.pd...
Unfortunately, this creates a cascade of problems.
- Will archivists overlook these sites because they are not .mil or
.gov sites?
- Upon finding them, can we identify who is actually responsible for
the content? (were they Published or Sponsored by the government?)
- If we find the site and identify it as something that is
government-generated, are we allowed to archive it?
STANDARDS BY ANY OTHER NAME
------------------------------------------------------------------------
One of the biggest problems digital archivists face is that of file
formats. When formats are tied to particular software or operating
systems or operating environments, it creates barriers to preservation.
"Standards" that work well for the end-user (and the service provider)
one year may be exactly the wrong standard for the archivist.
We can see an example of a user-friendly but archive-unfriendly standard at the EPA.
EPA
------------------------------------------------------------------------
The EPA has a nice site that has videos, audios, podcasts, and more.
But they have chosen the "Flash based" video format as a "standard."
This is indeed a common format for streaming video, but it adds additional
layers of difficulty for anyone wanting to preserve the videos by
downloading them.
http://www.epa.gov/multimedia/
Wade-Hahn Chan, "Feds set sights on small screen," FCW, August 11, 2008.
http://www.fcw.com/print/22_25/technology/153418-1.html?type=pf
DISAPPEARING PUBLICATIONS
------------------------------------------------------------------------
There have not been any substantial, comprehensive studies of what gets
withdrawn from the web by government agencies.
(For an example of one attempt, see "Chronology of Disappearing Government
Information," data collected through May 8, 2002, compiled by Barbara Miller
for the ALA/GODORT Education Committee with special assistance from
Karrie Peterson.
http://www.library.okstate.edu/Govdocs/chronchart.doc )
We are left with anecdotes about things disappearing or being
withdrawn and random discoveries of something here today and gone
tomorrow.
Anyone who works with government agencies for very long will encounter,
as I have over the years, as many "policies" as there are individuals
who administer those policies.
So we sometimes see agencies that are very careful about keeping older
documents online and others that express the opinion that "No one wants
last year's (or last month's) report."
Here is a recent example:
------------------------------------------------------------------------
AT: http://www.mnf-iraq.com/
We find links to issue 6 of a "newspaper" but no links to earlier issues
or any indication that they are available.
http://www.mnf-iraq.com/images/Unit_Newsletters/080826_aam_al-binaa_engl...
E-GOV
------------------------------------------------------------------------
I have left for last the concept of "E-government" -- not because it is
less important, but because it is emerging and something to watch.
E-government is intended to transform the way government communicates with
citizens and business and itself.
To the extent that it creates communications that are faster, more
accurate, and more convenient, it is a Good Thing.
But for us, it again fundamentally transforms the role of government.
In the past, the role of government in information dissemination ended at
the point of dissemination. Governments would collect and create and
assemble and edit and publish information products and distribute them to
the public and to libraries.
But today, the government is taking on a new, continuing role.
With e-government, governments are saying we must go to them to get our
information today, and tomorrow, and forever.
As governments move to e-government, we are going to increasingly see
government information provided as "transactions" as opposed to
"instantiations."
Here is a simple example:
I can call 411 and get a phone number: that's a transaction and is a
big improvement over having to locate and use a bulky telephone book
which may not even be current.
Lots of kinds of government information lend themselves to this kind of
transaction delivery and make for better, more accurate, more timely
service.
But, if I am a journalist and I want to look at a directory of all
employees in a department, or if I'm an historian and want to see who
was in a particular office last year (or 10 or 50 years ago), or if I'm
a demographer and I want to do a surname or given-name analysis of an
agency's employees, then a current, up-to-date
one-transaction-at-a-time system won't help me at all. I need an
instantiation of the information from one or more time periods.
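To make the transaction/instantiation distinction concrete, here is a toy sketch in Python. The directory data, the years, and the function names are all hypothetical, invented only for illustration. A current-only lookup (a transaction, like calling 411) can answer "where does Mary Smith work now?", but only saved copies of the whole directory (instantiations) can answer who held an office last year or support a surname analysis.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    given_name: str
    surname: str
    office: str


# Hypothetical saved snapshots ("instantiations") of one agency's staff directory.
SNAPSHOTS: dict[int, list[Employee]] = {
    2007: [Employee("Ana", "Lopez", "Budget"), Employee("John", "Smith", "Records")],
    2008: [
        Employee("Ana", "Lopez", "Budget"),
        Employee("Mary", "Smith", "Records"),
        Employee("Wei", "Chen", "IT"),
    ],
}
CURRENT_YEAR = 2008


def lookup_current(given_name: str, surname: str) -> Optional[str]:
    """Transaction: one question about the present, one answer."""
    for e in SNAPSHOTS[CURRENT_YEAR]:
        if e.given_name == given_name and e.surname == surname:
            return e.office
    return None  # Anyone who has left is simply invisible to the transaction.


def who_held_office(office: str, year: int) -> list[str]:
    """The historian's question: answerable only if that year's directory was kept."""
    return [f"{e.given_name} {e.surname}"
            for e in SNAPSHOTS.get(year, []) if e.office == office]


def surname_counts(year: int) -> Counter:
    """The demographer's question: needs the whole directory, not one record at a time."""
    return Counter(e.surname for e in SNAPSHOTS.get(year, []))
```

For example, `lookup_current("John", "Smith")` returns `None` because John no longer works there, while `who_held_office("Records", 2007)` still finds him in the saved 2007 snapshot.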
THE CENSUS
------------------------------------------------------------------------
Let me give you a concrete example: The Census
Every 10 years the federal government takes a population and housing census.
Through the government's American FactFinder web site, the Census Bureau
delivers a transaction-based service where you can find census facts and
tables.
http://factfinder.census.gov/
But, in addition, the Bureau makes the raw, anonymized census data
available for downloading and has deposited the data in the largest social
science data archive in the U.S. (ICPSR at the University of Michigan).
http://www.icpsr.umich.edu/cocoon/ICPSR/SERIES/00166.xml
What this means for us is that we can preserve the census. There is an
instantiation of the census in a format that we can preserve over time.
That instantiation is the same data that lies behind American FactFinder,
but in a form that can be preserved.
This means that we can preserve the data
- without crawling a web site
- even if the Census Bureau budget is cut and it takes data offline
It also means that the raw data are available for uses and re-uses
beyond the transactions that the Bureau makes available.
This is a model for making government information available and preservable
and usable and re-usable for the long-term.
It is important to note that this model benefits users today, not just in
the future. Transaction-interfaces offer a limited number of possible uses
of the underlying information. When the raw data are available, users
can analyze, use, and re-use the data in many ways not provided by the
transaction-interface.
Clifford Lynch has written eloquently about the need that scholars
have to get access to the raw information in the realm of scholarly
literature (Clifford A. Lynch, "Open Computation: Beyond
Human-Reader-Centric Views of Scholarly Literatures," Open Access: Key
Strategic, Technical and Economic Aspects, Neil Jacobs Ed., Oxford: Chandos
Publishing, 2006, pp. 185-193.).
http://www.cni.org/staff/cliffpubs/OpenComputation.htm
Governments may not like the idea of doing this, though. They may want to
keep control and may want to do so under the guise of "accuracy." (E.g.,
"Last year's phone book isn't accurate anymore. We don't want copies of it
out in the world confusing people.") Indeed, we hear that very argument
from some who still contend that the people should not have free open access
to Congressional Research Service reports. Local governments in particular
may also see information as an "asset" and wish to charge for access or
use of it.
And the private sector may not like the idea of raw information being
freely distributed because they want to control access so they can charge
for it. (Indeed we see something like that with CRS reports!)
It may be a challenge to get governments to understand this concept and,
once they do, to embrace distribution.
WHAT CAN WE DO?
------------------------------------------------------------------------
There is no single solution. And we should not expect any single entity or
agency or archive to "solve" the problems.
We need a multifaceted approach to preserving the historical record.
Here are some general approaches that I hope will guide you in your
local environments.
1) Do you have influence over the creation of information? Then make
sure that the information is created with preservation in mind. Talk
to creators about providing an instantiation of information in addition
to transaction-based access. Advocate free and open access. Insist on
open formats (e.g., ODF http://opendocument.xml.org/) rather than
proprietary formats.
2) Identify your partners in your organization.
- IT depts. They may have tools that will help you do your job. They may
be able to do things differently to enable preservation, but may not have
thought of doing so.
- Managers who want information access in the near term. Managers may not
think of long-term access and usability, but they usually do understand the
benefits of having their own information usable in the near-term (1 to 5 years).
If you can *guarantee* something will be usable in 5 years, you can probably
guarantee that you are going to be able to preserve it for longer periods.
3) Identify other partners
- The Internet Archive is doing a lot right now to preserve information
on the web and you can work with them to have them do preservation for you.
http://www.archive.org/index.php
http://www.archive.org/create/
http://www.archive-it.org/
- Look for others locally and regionally with whom you can collaborate.
Universities may want to collaborate with governments and vice-versa, for
example.
4) Are you a partner?
Even if you are in an archive that clearly has no responsibility for
preserving (say) the records of an agency, you may still have the
opportunity (and obligation) to collect information that is relevant to,
and even part of, the complete historical record. That may be because of
your own archival mandates (you hold personal records of a government
official, soldier, or elected official) or because of your constituency
(users at a university who need the complete record for historical
analysis).
The library model of having many copies dispersed over many institutions
has worked well for preserving and authenticating published materials, and
it may work in the archival environment as well when we are no longer tied
to a single copy of record. Software already exists to help with this:
LOCKSS (Lots of Copies Keep Stuff Safe):
http://www.lockss.org/lockss/Home
http://www.clockss.org/clockss/Home
http://lockss-docs.stanford.edu/
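The idea behind many dispersed copies can be illustrated with a small Python sketch. This is a simplified majority-vote audit, assuming each library can compute a SHA-256 fingerprint of its own copy; it is not the actual LOCKSS polling protocol, which is considerably more elaborate and runs between peers over a network. The site names and the `audit_and_repair` function are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so copies can be compared without exchanging the bytes."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Treat the majority version as authentic and repair copies that disagree.

    Returns the repaired copies and the names of the sites that were repaired.
    """
    if not copies:
        raise ValueError("no copies to audit")
    digests = {site: digest(content) for site, content in copies.items()}
    winner, count = Counter(digests.values()).most_common(1)[0]
    # Require a strict majority; with a tie we cannot tell which copy is authentic.
    if count * 2 <= len(copies):
        raise RuntimeError("no majority agreement among copies")
    good = next(c for site, c in copies.items() if digests[site] == winner)
    damaged = [site for site in copies if digests[site] != winner]
    repaired = {site: (c if digests[site] == winner else good)
                for site, c in copies.items()}
    return repaired, damaged
```

With three libraries holding the same report and one copy damaged, the sketch flags that site and restores it from the majority version. With only two disagreeing copies there is no majority at all, which illustrates why the approach depends on *lots* of copies rather than a single copy of record.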
I want to close with a quote from that same Technology Review article
that I quoted earlier.
In it, computer scientist Robert F. Sproull of Sun Microsystems
Laboratories, who chaired a National Academy of Sciences panel that
advised NARA, said:
"If you become obsessed with getting the technical solution, you will
never build an archive."
The challenges we face are as much political, sociological, and
economic, as technological.
A discussion by archivists of long-term preservation of digital government information
Submitted by jajacobs on Sun, 2008-04-20 12:35.
There is an excellent post relevant to government information over at ArchivesNext. I recommend it highly.
- NARA and the web harvest: a discussion of the issues, ArchivesNext, April 14, 2008.
Kate does an excellent job, in my opinion, of analyzing NARA's decision not to conduct another web harvest of agency web sites at the end of the current administration. For example, she says, "For archivists, these web harvests should be troubling because they dispense with the process of appraisal. In effect, anything on the top four levels of an agency’s web site was determined to be of permanent value." Kate also includes links to articles about the issue and the NARA response.
It has excellent and informative comments that include, but go beyond, the specific issue of NARA and web harvesting. I found these comments particularly useful because they are mostly from the perspective of archivists and give insight into long-term preservation issues. Some of those making comments are well known in archival circles and speak from experience and with authority. Christine says that "it is very difficult to do item-level appraisal of web files, because the pages are usually so interconnected."
Regarding the original blog post at .govwatch that started the controversy, with its claim that NARA is "Quietly Destroying Millions of Documents," Thomas E. Brown says "Nothing could be farther from the truth" and backs up what he says with facts.
Maarja discusses information gaps created when dynamic records are overwritten and not preserved. I found this comment by Maarja particularly interesting:
Depending on the agency, decisions on how best to share information might have been driven initially by technological factors more so than long term capture of knowledge. From reading records managers’ forums, I gather that in some agencies IT more so than RM may have driven adoption of solutions for dealing with electronic records.
No two organizations are going to have exactly the same culture and organizational climate. So it’s hard to predict how preservation of electronic information is going to play out throughout the government.
- jajacobs's blog
5.2 Million 19th Century Passenger Arrival Records Now Online at NARA
Submitted by blakeley on Wed, 2008-03-12 19:55.
The National Archives and Records Administration (NARA) announced the online availability of over 5.2 million records of passengers who arrived at the ports of Baltimore, Boston, New Orleans, New York, and Philadelphia in the 19th century. These records were transcribed from original ship manifests into databases by Temple University's Center for Immigration Research and donated to NARA.
Intrigued, I went to NARA's Access to Archival Databases (AAD) and searched "Records for Passengers Who Arrived at the Port of New York During the Irish Famine," 1846-1851 (over 607,800 records!), and found several of my Troy clan ancestors who arrived in 1851. I'll have to compare the names with the extensive family tree that my grandfather made. If he were alive today, he'd be searching this database for hours!
Other record sets include: Data Files Relating to the Immigration of Germans to the United States, 1850-1897; Data Files Relating to the Immigration of Italians to the United States, 1855-1900; and Data Files Relating to the Immigration of Russians to the United States, 1834-1897.
- blakeley's blog
New book on web archiving
Submitted by jajacobs on Sat, 2006-12-02 11:14.
Web Archiving, Masanès, Julien (Ed.), 2006, VII, 234 p., 28 illus., hardcover.
Julien Masanès, Director of the European Archive, has assembled contributions from computer scientists and librarians that altogether encompass the complete range of tools, tasks and processes needed to successfully preserve the cultural heritage of the Web. This book serves as a standard introduction for everyone involved in keeping alive the immense amount of online information, and it covers issues related to building, using and preserving Web archives both from the computer scientist and librarian viewpoints.
Practitioners will find in this book a state-of-the-art overview of methods, tools and standards they need for their activities. Researchers as well as advanced students in computer science will use it as an introduction to this new field with a hopefully stimulating review of open issues where future work is needed.
Some of the chapters:
- Selection for Web Archives
- Copying Websites
- Archiving the Hidden Web
- Access and Finding Aids
- Mining Web Collections
- The Long-Term Preservation of Web Content
- Small Scale Academic Web Archiving
- jajacobs's blog
Please donate to the Pacifica Radio Archives
Submitted by jrjacobs on Wed, 2006-11-29 10:46.
I was listening to KPFA, my local Pacifica station, this morning, and they were having a fund drive for the Pacifica Radio Archives. The archives contain the voices of 20th-century history: Studs Terkel, Mahalia Jackson, James Baldwin, Gore Vidal, Alice Walker, Martin Luther King, Jr., Rosa Parks, bell hooks, and *many* more voices of peace and justice! The fund drive supports their preservation project (among other projects), which aims to digitize many of the old tapes in the archive to make them more accessible and longer-lived. Or consider a donation in someone else's name for the holidays -- really, how many ties does your dad need?!
Please, please, PLEASE donate to the Pacifica Radio Archives if you can!
[Please note: I am not on the Pacifica board and FGI has no affiliation whatsoever with Pacifica. I just think that librarians and those interested in preserving history should support valuable preservation projects like this one.]
- jrjacobs's blog

