NARA looks to privatizing 1940 Census

The National Archives is apparently preparing to reverse a long-standing policy of providing free public access to Census Schedules when it releases the 1940 Census next year. (See “Update” below for additional information.)

Early next year the National Archives and Records Administration (NARA) will be able to make the 1940 Census Schedules available to the public for the first time. (See “Background” below.) NARA has digitized these files and created metadata for them in preparation for making this valuable trove of information accessible on the web. It now only remains for NARA to decide who should provide access and at what cost to users. Should NARA provide free access? Or should it contract this service out to a private company that will impose fees on users and make a profit by providing access to this public information?

The answer should be obvious. For decades NARA has provided free access to Census Schedules at regional archive facilities and has sold microfilm to libraries that provide free access to their users. Now that online digital access is possible, NARA can provide better access online without having to maintain physical access at its regional facilities. It can distribute digital files to libraries at little or no cost so that libraries can further increase access and functionality for all of the information or subsets of it.

It seems, however, that NARA is choosing privatization instead of free public access. In an eight-page RFI (Services Request for Information (RFI) NAMA-11-RFI-0004, 1940 Census [Microsoft Word .docx]; see also the PDF version), NARA is seeking “industry input” for a “no-cost contract” to provide managed hosting and online access to the 1940 Census. The RFI is intended to explore options and “may or may not lead to a solicitation” for an actual contract. This means that NARA could, apparently, decide to do this work itself, but it is exploring the privatization route first. (Presumably, libraries could respond to the no-cost RFI as well. Responses were due on June 22.)

According to NextGov, the “no-cost contract” means that “the vendor would do the work for free and then charge the public a fee to access the records.” (Archives Wants to Put 1940 Census Online, by Joseph Marks, NextGov TechInsider, July 15, 2011.)

The RFI does not explain why NARA is pursuing this path or what advantages it sees to privatization. I would guess the most likely reason is that NARA does not anticipate that it can get adequate funding to host the data online itself. But has it asked for funding? Has it made the case for continuing its historic provision of free public access to Census Schedules? Has it justified privatizing public information?

We have seen NARA follow this path before. NARA partnered with footnote.com to digitize selected holdings. This resulted in access restrictions on these digitized public documents, including membership fees, per-page charges for downloading, and age restrictions. NARA also partnered with Ancestry.com to make public records available for a fee. At the time, a spokesperson said that budget constraints and other priorities kept the Archives from making this information available itself.

“In a perfect world, we would do all this ourselves and it would be up there for free,” she said. “While we continue to work to make our materials accessible as widely as possible, we can’t do everything.” — Ancestry.com unveiled more than 90 million U.S. war records, New York Times (May 24, 2007).

In 2008, NARA contracted with TGN to digitize and provide access to some of NARA’s holdings. The contract restricted free public access for five years. We’ve written about this here at FGI before (The NARA/TGN contract as a bad precedent) and believe that deals like this remain bad for NARA and bad for the country. These kinds of deals set a bad precedent, one that is now being followed unnecessarily with the 1940 Census.

Those past deals differed from this one in one key way, however: they involved digitization of the materials by the private contractors. The argument we heard in the past was that digital access was so much better that it was worth privatizing access in order to get the digitization done. Without privatization, it was argued (even by some librarians and archivists), the materials could not be digitized and we’d be stuck with analog access. That argument does not apply to the 1940 Census because, according to the RFI, the materials are already digitized. The decision for online digital access has been made. The only question now is whether to make the existing digital files freely available or to provide access through privatization.

Of course, the cost of any project providing access to all the 1940 Census Schedules and maps will not be insignificant. According to the RFI, NARA has created 3.8 million JPEG images, comprising 20 terabytes of data.
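To put those two RFI figures in perspective, the short Python sketch below works out the average size of a single image. It assumes decimal units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes), since the unit convention is not stated here; the function name is purely illustrative.

```python
# Back-of-the-envelope check of the RFI figures: 20 terabytes, 3.8 million JPEGs.
# Assumption: decimal units. If binary tebibytes (2**40 bytes) were meant,
# the average would come out roughly 10% larger.

TERABYTE = 10**12  # bytes in a decimal terabyte (assumed unit)
MEGABYTE = 10**6   # bytes in a decimal megabyte


def average_image_mb(total_terabytes: float, image_count: int) -> float:
    """Return the mean size of one image, in decimal megabytes."""
    return total_terabytes * TERABYTE / image_count / MEGABYTE


avg = average_image_mb(20, 3_800_000)
print(f"About {avg:.1f} MB per JPEG image")  # prints "About 5.3 MB per JPEG image"
```

At roughly 5 MB per page image, the collection is large but unremarkable by current digital-library standards.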

Twenty terabytes is a lot of data, but it is fast becoming an almost modest size for a digital library. For comparison, the HathiTrust has over 3 billion pages and over 400 terabytes of data, OCLC has a capacity of over 600 terabytes, the Wayback Machine contains 100 terabytes, the Library of Congress web archive holds 235 terabytes, the University of California Curation Center has 70 TB in its Merritt digital preservation repository, and NARA’s own Electronic Records Archives (ERA) has more than 90 terabytes. Terabyte-scale digital libraries are virtually the new norm, and petabyte-scale digital libraries are already being built. Examples include the Shoah Foundation Institute’s digital library (8 petabytes), the Stanford Digital Repository, which anticipates a capacity measured in petabytes, and the Digital Hammurabi Project, which is building a petabyte-scale digital library and museum of virtual 3D cuneiform tablets.

But providing access to microfilm at 13 regional offices was not cheap either. To me, the question is whether the government is willing to continue its historic mission of providing free access or is ready to abandon that mission to the private sector.

There have always been those who argue that fee-based access from the private sector should take precedence over free access from the public sector. But privatizing access to Census schedules would reverse long-standing policy. What was once an unquestioned government function is now, apparently, being treated as a commercial function. Where, in the past, the government provided free public access to census schedules, now, when access can be improved, the government is abdicating its role and turning access over to private companies that will provide the information for a fee. The issues are not new. The precedents for providing free public access exist and have a long and respectable history. The only thing new is that NARA seems to have accepted privatization as inevitable.

Will there be funding for NARA to provide access to 1940 Census Schedules? There may not be. We have argued here at FGI for years that relying solely on Congressional funding for permanent, free public access to government information is risky because there is always the chance that Congress will not fund it. In these highly politicized, economically troubled times, it is easier to imagine a lack of any funding than to imagine adequate funding for the long term.

But this does not mean that privatization is the only option. There are precedents for government projects that are supported by donations and public-private partnerships. The American Memory Project is one notable example. And individual libraries or groups of libraries could step in and offer to provide free public access.

Now is the time for NARA, supported by researchers, libraries, and archivists, to actively promote and pursue free public access solutions. There is no reason to accept privatization as inevitable.

Update: Note that NARA’s 1940 Census web page says that “The digital images will be accessible at NARA facilities nationwide through our public access computers as well as on personal computers via the internet.” Additionally, a comment on the Ancestry World website said: “NARA will make the digitized copies of the 1940 Census population schedules available to the public, free of charge, on April 2, 2012 through our new Online Public Access search (http://www.archives.gov/research/search/).”

It is not clear from the above whether the policy on free public access has changed with the issuance of the RFI.

Background:
The Census Bureau conducts the Decennial Census every ten years. The Bureau summarizes its findings in reports that contain no information on individuals. The raw information collected, including names and addresses of those surveyed (sometimes called the “manuscript census” or the “census schedules”), is protected by law and kept confidential for 72 years. After 72 years, the National Archives and Records Administration releases that raw information. This information is invaluable to genealogists and other researchers. Typically, it has been made available on microfilm at regional National Archives offices (Availability of Census Records About Individuals). The 72-year period for the 1940 Census expires on April 2, 2012.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.


6 Comments

  1. jrjacobs says:

    Thanks to Amy West for pointing out on Twitter the following comment, posted in January 2011 by Rebecca Warlow of NARA’s Digital Strategies and Services Staff, a member of NARA’s 1940 Census Working Group:

    I hope I can clarify NARA’s plans for the 1940 Census release for you.

    The sentence from the Federal News Radio story, referred to in the blog post above, was missing a few key words. NARA will make the digitized copies of the 1940 Census population schedules available to the public, free of charge, on April 2, 2012 through our new Online Public Access search (http://www.archives.gov/research/search/).

  2. jajacobs says:

    yes. i found that too and added it as an update earlier today. the statement is from january and i didn’t find anything else that said that on the nara site. i’d love to hear that i’ve missed something, but meantime i’m still wondering if the policy has changed since january…

    jim

  3. Anonymous says:

    Just a note, but the Wayback Machine is much larger than stated here. More in the neighborhood of 2+ petabytes than 100 terabytes. And of course that doesn’t count the other multiple petabytes of texts, video, and audio at archive.org (including census records here: http://www.archive.org/details/us_census).

  4. jajacobs says:

    thanks for this. i, too, saw different counts and wasn’t sure which was which and which was accurate and which things were being counted (“wayback” vs. “internet archive” vs. archive-it vs. everything?). clearly there is a lot there!

  5. Anonymous says:

    Why is archive.org being allowed to privatize government records? Are the Census records they sell access to online the same individual records from the Census that are available at the NARA centers?

  6. jrjacobs says:

    FYI, archive.org is the URL for the Internet Archive. While the Internet Archive *does* host government documents, including census schedules prior to 1940, it is the National Archives (NARA, archives.gov) that is looking to private organizations to help it handle the overwhelming interest expected when the 1940 census schedules are released. Ancestry.com announced this week that it is working with NARA to provide access, free only until 2013. I’m hopeful that other non-profit organizations will also step in to provide permanent free access and that NARA will eventually be able to host those data as well.

    And yes, the census schedules are the same ones available at NARA regional centers and at some libraries. The schedules are the individual respondents’ answers to the questions asked. Under U.S. privacy laws, seventy-two years must elapse before census data on individual persons can be released, and thirty years before census data on individual businesses can be released. The most recent individual-level schedule data available for the U.S. are the responses to the 1930 Decennial Population Census, released in 2002.
