The Canadian government's Library and Archives Canada (LAC) announced more details of its digitization project. Under a "digitization partnership" with Canadiana.org, a not-for-profit charitable organization, a large-scale digitization project will cover about 60 million images from numerous collections. The work includes the indexing and description of millions of personal, administrative and government documents, as well as land grants, war diaries and photographs, and the transcription of millions of handwritten pages. This is a "10-year agreement."
- Library and Archives Canada and Canadiana.org partner on digitization, online publication of millions of images from archival microfilm collection. Library and Archives Canada (2013-08-29).
The announcement says that Canadians will have "access" regardless of where they live, at no charge.
We have seen this happen before in the U.S. (See, for example: The NARA/TGN contract as a bad precedent and GAO *did* sell exclusive access to legislative history to Thomson West) and Canada (Help save the Library & Archives Canada), but this seems like a particularly bad, unjustifiable example of privatization of public information.
- Library and Archives Canada private deal would take millions of documents out of public domain, By Chris Cobb, OTTAWA CITIZEN (June 12, 2013).
Library and Archives Canada has entered a hush-hush deal with a private high-tech consortium that would hand over exclusive rights to publicly owned books and artifacts for 10 years.
...LAC is partnering with Canadiana.org in what is being billed as The Heritage Project -- digitizing 40 million images from more than 800 collections of publicly-held LAC material, much bought by Library and Archives over the years with taxpayers' money.
...Under the agreement, digital images will begin rolling back into the free public domain -- known as "open access" -- as the 10-year exclusive rights expire.
Hat tip to InfoDocket!
We've been tracking this story since this spring when the Depository Services Program of Canada (DSP) announced that, by 2014, it would “no longer be producing, printing, or warehousing hard copies of publications.” Well, it's about much more than no longer printing govt publications. As BoingBoing notes:
Canada's national archives are in trouble: they've undergone a $9.6M cut, with more to come. The collections are being sold off to private collectors, many outside of the country. Now the Documentary Organization of Canada has weighed in: "Lisa Fitzgibbons, Executive Director of the Documentary Organization of Canada (DOC), succinctly states a case for continuance of sustainable funding of Library and Archives Canada."
Please go to Save Library & Archives Canada (hosted by the Canadian Association of University Teachers!) to learn more about the issues and take action to save the Library & Archives.
What happens when federal agencies rely upon standards developed by standard-setting bodies and communities of practice and incorporate those standards into federal rules? In many cases agencies refer to the standards but do not include the full text of the standards in the Federal Register or the Code of Federal Regulations. As a result, those interested in commenting on a particular regulation may not have access to the relevant standard, particularly if it is copyrighted or only accessible for a fee.
The Electronic Frontier Foundation (EFF), the Association of Research Libraries, and OpenTheGovernment.org have sent comments to the Administrative Conference of the US recommending that "all material incorporated by reference -- regardless of the stage in the regulatory process, the subject matter of the regulation, or the identity of the regulated entity -- should be made freely available, with no purported copyright restrictions and downloadable on a government agency's website."
Public.Resource.Org submitted comments to the Office of Management and Budget on making standards that are incorporated by reference into federal regulations widely available to the public without charge. Public.Resource.Org also said that such standards should "be deemed in the public domain rather than subject to copyright restrictions."
- OpenTheGov and ARL Join EFF in Urging Government to Make all Parts of the Law Easily Available to Everyone (10/24/2011).
"copyrighted materials, once incorporated into law, should be available for free." The principles of transparency and accessibility to the law should animate agency decisions in this arena and materials incorporated by reference should be made freely available, online and off, at all times...
- Revised Draft Recommendations of the Administrative Conference of the US on "Incorporation by Reference in Federal Regulations" ACUS.gov (October 2011)
- Comments on "Incorporation by Reference in Federal Regulations" (October 21, 2011). To the Committee on Administration and Management, Administrative Conference of the United States, from Corynne McSherry & Mark Rumold, Electronic Frontier Foundation; Prue Adler, Association of Research Libraries; and Patrice McDermott, OpenTheGovernment.org.
We urge ACUS to reject any suggestion that access to the law may be limited where the regulation in question happens to incorporate copyrighted materials. All material incorporated by reference - regardless of the stage in the regulatory process, the subject matter of the regulation, or the identity of the regulated entity - should be made freely available and downloadable on a government agency's website.
- Incorporation by Reference, A Proposed Rule by the Federal Register Office on 02/27/2012
On February 13, 2012, the Office of the Federal Register (OFR or we) received a petition to amend our regulations governing the approval of agency requests to incorporate material by reference into the Code of Federal Regulations. We've set out the petition in this document. We would like comments on the broad issues raised by this petition.
- Re: Request for Information 2012–7602, 77 FR 19357 submitted by Public.Resource.Org to the Office of Information and Regulatory Affairs of the Office of Management and Budget Washington (April 11, 2012).
See also: Liberating America's secret, for-pay laws.
Cory Doctorow says: "This morning, I found an enormous, 30Lb box waiting for me at my post-office box. Affixed to it was a sticker warning me that by accepting this box into my possession, I was making myself liable for nearly $11 million in damages. The box was full of paper, and printed on the paper were US laws -- laws that no one is allowed to publish or distribute without permission. Carl Malamud, Boing Boing's favorite rogue archivist, is the guy who sent me this glorious box of weird. I was expecting it, because he asked me in advance if I minded being one of the 25 entities who'd receive this law-bomb on deposit. I was only too glad to accept -- on the condition that Carl write us a guest editorial explaining what this was all about. He was true to his word."
Liberating America's secret, for-pay laws, By Carl Malamud, boingboing (Mar 19, 2012).
Boing Boing Official Guest Memorandum of Law To: The Standards People Cc: The Rest of Us People From: Carl Malamud, Public.Resource.Org In Re: Our Right to Replicate the Law Without a License
Last month the House released its legislative appropriations ("Privatization of GPO, Defunding of FDsys, and the Future of the FDLP"). As we noted then, "the future of long-term preservation of and free access to government information is in the hands of Congress today." And it doesn't look any better today with the release of the Senate's legislative appropriations markup, S.Rept. 112-080 (see p.42 - 44).
The Senate added $500,000 to the House's revolving fund appropriations of ZERO (GPO had requested $6 million!) -- but looks to be thinking that FDsys funding will come from there (GPO requested $6 million for FDsys!) -- added around $6.8 million to congressional binding and printing, and kept $35 million for salary and expenses for the Superintendent of Documents -- the same amount as the House. The Senate markup has none of the restructuring language found in the House appropriations (requesting GAO to research the efficacy of GPO privatization and splitting functions between LC and GSA). The Senate is recommending a 12.4 percent reduction in overall funding for the GPO from the fiscal year 2011 enacted level, which is only slightly less ugly than the 20 percent reduction recommended by the House.
Please call, write and email your Senators TODAY and express your support for FULL GPO funding, *especially* if you happen to live in a state whose Senator sits on the Appropriations Committee.
On July 22, the House passed a bill that would remove funding for FDsys, reduce funding for GPO by 20%, and reduce funding for the Superintendent of Documents by 16% (Kelley). The House Report on the bill also directs the Government Accountability Office to conduct a study on "the privatization of the GPO" and the transfer of the Superintendent of Documents and the FDLP to the Library of Congress (page 25).
The bill includes many other changes that are relevant to the dissemination of government information (see House Bill Questions Future of GPO and the comments to that post, and the stories in Library Journal and OMB Watch), but the ones related to FDsys and the privatization of the GPO are the ones which, if ultimately approved, would have the greatest negative impact on long-term free public access to government information.
Passage of only some of these bad ideas would almost certainly result in a catastrophic loss of long-term access to and preservation of government information. These bad ideas are, however, only symptoms of a still bigger problem. There is, luckily, an obvious, logical path around all these threats.
Proposals not new
While these proposed cuts and changes are drastic, they are not new. Similar proposals were considered in 1982 and 2001 by NCLIS, in 1988 by the Office of Technology Assessment, and in 1993 and 1994 by bills in the House of Representatives.
In addition to these official recommendations, the information industry has long argued that the private sector, not government agencies, should disseminate government information. It has characterized almost any government information activity as unfair competition with the private sector. Industry-commissioned reports and official statements in 2000 and 2004 (Wasch) suggested that governments should only distribute raw data and should refrain from making data easier to use if there is even a potential commercial market for such information.
These private sector ideas have re-emerged in the last few years as governments have made raw data more easily accessible and technological mashups of government data have become almost commonplace. Calls for government to limit its role to the delivery of raw data and for a reliance on the private sector to make the data useful have become popular. (Robinson)
Whether such proposals suggest turning over government information dissemination to the private sector or commercializing the distribution of information by agencies themselves, when such proposals have been examined from the perspective of the user and from the perspective of information access in a democracy, they have been found to be severely wanting.
An examination of the literature reveals three reasons that proposals to privatize and commercialize government information make bad policy. First, by commoditizing public information, they conflict with the needs of citizens in a democracy for free access to accurate information about the activities of government. Second, they ignore that producing and disseminating public information is an essential role of government, not something that can be left to the whims of the market. Third, it has never been demonstrated that the privatization of major federal information dissemination activities is cost-effective or beneficial for important governmental functions.
It is also worth remembering that GPO was originally created because relying on private printing did not work well. Private printers often delivered jobs late and the printers themselves found that they lost money on public printing contracts (MacGilvray). Today, GPO contracts with many private publishers while maintaining overall control of the entire throughput.
Even information-industry-friendly reports on similar proposals have recognized an important, even essential, role for the government in government information dissemination. The 1982 NCLIS report, while strongly promoting a major role for the private-sector, nevertheless said that government information should be openly available without any constraints on subsequent use. It also advocated depositing documents into FDLP libraries for free accessibility. The 2001 NCLIS report similarly supported private-sector involvement but also concluded that, "...the federal government must continue to have primary responsibility for the entire life cycle of government information, including the dissemination and permanent public availability to public information resources to the American public without restrictions on its use or reuse." (emphasis added)
Nevertheless, there are still those who promote policies that would rely on market forces to determine what public information would be available to the public and at what cost.
Strong opposition to that approach comes from non-profits, libraries, citizen advocacy groups, scientists, journalists, historians, and government agencies. These groups understand that "there is [a] need to ensure equitable, open access by the public in general to information which has been generated, collected, processed, and/or distributed with taxpayer funds." (NCLIS p. ix). And in 1988, OTA said that some government information dissemination activities are "inherently governmental" because they "facilitate an informed citizenry [and] assist the mission agencies in carrying out their statutory responsibilities." (p. 301)
Some attempts by the government to commercialize or commoditize its information have failed either economically or functionally. For example, although STAT-USA existed on a "revolving fund" without Congressional funding for many years, charging a cost recovery fee for access to economic and trade information from federal agencies, in the long run it found that its fee-for-service business model was no longer viable and it shut down its operations. (Krasowska)
If you have been around government information issues for less than fifteen years, you may not be aware that the first incarnation of GPO Access attempted a cost recovery model similar to NTIS or STAT-USA. It charged annual subscription fees ranging from hundreds to thousands of dollars for access to the Congressional Record and Federal Register (GPO Access Status Report). This model failed and was abandoned after less than two years (Relyea), partly as a result of libraries creating gateways that made this same information available without charge.
Even when fee-based services, such as PACER (Public Access to Court Electronic Records), seem to survive financially, their functional failure to provide free information is obvious and it is arguably true that they fail to adequately meet the needs of all.
There are Catch-22s implicit in proposals to privatize or commercialize government information. First, the information-industry suggests that it should have exclusive rights to information products that are profitable and leave to governments those that are not profitable. This sets up governments for failure when they try to support their activities with income from demonstrably non-profitable information products.
Second, governments set themselves up for failure when they charge for access to public information. If they simultaneously attempt to honor their role of providing information to the public without charge while charging for that same information (as in the early years of GPO Access), they compete against themselves. But if they attempt to protect their ability to charge for their information "products," they find themselves in the awkward position of attempting to control and restrict access to public information that is in the public domain. (Gellman)
Ultimately, attempts to commercialize government information therefore conflict fundamentally with the essential and inherent duty of government to make public information freely available and usable.
The current threat
Unfortunately, just because a proposal is bad policy, self-contradictory, or doomed to failure does not keep it from being implemented. The current budgetary and political situations in Washington DC create an environment where one or more of these bad ideas is more likely to pass than ever before.
Regardless of how important this issue is, it is unlikely that it will get much media attention. It is also not at all clear that proposals to keep government information free and well-preserved will garner much support politically. When essential government services such as food safety, police, defense, nutritional programs for low-income women and children, nurses, clean drinking water, and much more all face drastic cuts, will there even be room in the budget debates to consider government information? When Congress can seriously consider proposals to cut spending on programs that affect the health and safety of the country, we can hardly assume that it will necessarily provide adequate funding for information access. President Obama's own government transparency programs have been drastically cut and Obama himself is on record as thinking the printing of the Federal Register is wasteful.
The Big Problems
As serious as the current situation is, it can at least help us see the bigger issues that surround long-term preservation and access of government information and suggest solutions. The big problem is that we lack an adequate preservation and access "ecosystem" for government information. This puts all government information at the mercy of relatively small changes in government budgets. It is somewhat ironic that, if we had addressed the underlying issues earlier and had a rich ecosystem, we would be less vulnerable to the drastic proposals on the table today.
There are lots of issues and challenges that face those who wish to preserve long-term free access to government information. We can boil down a lot of those issues to two big ones:
1. Quantity. Just the quantity of information being produced digitally provides one huge challenge. Any attempt to preserve so much information must be able to scale to sizes that, until recently, were almost unimaginable. As Nicholas Taylor at the Library of Congress wrote recently, the amount of "data stored by the Library of Congress" has become a popular, if unusual, unit of measurement for capacity of storage, network traffic speed, size of digital collections, and so forth. The "End of Term" crawl of the web pages of the George W. Bush presidential administration by the Library of Congress, the California Digital Library, the University of North Texas, the Internet Archive, and the Government Printing Office produced almost 16 terabytes of data. (See more size comparisons here.) And digital preservation and access requires duplication and replication and backups that multiply the scale of projects quickly. LOCKSS-USDOCS, for example, says that the approximately 1 terabyte of data it is currently preserving is only a fraction of the 18 terabytes of content in FDsys when all the workflow iterations, copies, and backups are taken into account.
What this means is that providing preservation and access to all government information is a very large, non-trivial task. It is not clear that any one institution or organization will ever have the capacity or resources to do everything on its own.
2. Selection. But, you may well ask, how much of all digital government information is worth preserving? It is almost certainly true that much of the born digital content being produced by the government is of only transient interest or value. It is without question true that the rules have changed in ways that make it more difficult to know what is worth saving. In the past, we knew and could fairly easily define and identify "government publications" and could identify who created and published them. "Publications" were, for the most part, packaged as "books" and "journals" and "pamphlets" and so forth. These qualities made it relatively easy to know what we wanted to preserve and how to preserve it.
But in the digital environment, we find ourselves facing a whole new set of circumstances. It is not always clear who has created digital information, whether or not it is "government" information, or whether or not we have sufficient rights to collect or preserve or provide access to any given piece of information (Peterson). A single web page may display information from many different sources. A dot-gov web page may contain information from a commercial source and government agencies may post original content on dot-com web sites. The very processes that put a wealth of government information a click or two away also make it harder for us to preserve that same information and ensure its usability far into the future.
Beyond some obvious, preeminent series (e.g., Federal Register, Congressional Record, Hearings, Reports, the censuses, and other Essential Titles) lies everything else. Who will decide what of that "everything else" is worth saving? Who will decide if we save the digital equivalent of looseleaf binders, pamphlets, posters, one-off maps, slip laws, drafts, versions, editions, memos, press releases, and so forth? We must consider multi-media formats. We have to decide whether or not the "look and feel" of website presentations of information is important to preserve and, when there are several different presentations of the same information, which we should preserve.
Selection in the world of bad budgets ultimately means de-selection and weeding. Digital objects don't get preserved by accident. They require constant attention and preservation work. When a repository says, "We can no longer afford to preserve this and this and this," it is often relegating those things to oblivion.
Despite these big differences between the digital world and the analog world, the big, foundational issues we face are not that different. Specifically, there are two foundational issues: First, different people have different needs. What is important to you may not be important to me and vice versa. Second, the question of selection of what to preserve is a question of who will have the decision-making power and who will have the control over their own decisions (Jacobs).
Some conclusions. There are some inescapable conclusions we can draw from the combination of the issue of quantity and need for selection. First, there is a need for more than one organization to be responsible for preservation and long-term access just to deal with quantity and scale. We cannot rely on any single institution or organization to preserve everything that is of value to everyone; that is just too big a job. The Library of Congress has come to the same conclusion, which is why it has created the National Digital Stewardship Alliance (NDSA).
Second, there is a need for different organizations to be involved in preservation in order to adequately reflect the information needs of different user communities. No single institution that intends to serve "everyone" can afford (literally afford, in terms of money and other resources) to pay sufficient attention to the needs of every small, specialized constituency. Without such attention, information will fall through the cracks and be lost.
Third, each such institution must have the ability to select information for preservation and obtain sufficient control over that information such that it can perform the needed digital preservation activities that will ensure long-term preservation of and access to that information.
Digital Preservation Road Maps
Luckily, we have road maps for digital preservation that help us address the issues and challenges outlined above. The road maps are the Reference Model for an Open Archival Information System (OAIS) combined with the checklist for certifying digital repositories, the Audit And Certification Of Trustworthy Digital Repositories (TDR).
TDR is built upon OAIS. Together they provide the context for long-term preservation and access to any digital collection. They do not describe how to build a repository nor do they define technologies that must be used. Rather, OAIS describes the required functions of a digital repository and TDR provides a checklist of "metrics" for evaluating if a given repository is meeting its own goals and objectives for achieving those functions. OAIS and TDR are just as applicable to small repositories and institutions as large ones.
TDR recognizes that preservation is not just about technology. It is also about continuity over time of the archive itself. TDR describes two essential requirements in this area that are particularly relevant here: the need for long-term financial sustainability and the need for succession planning.
1. Sustainability. TDR says that, to ensure viability, a repository must have business planning processes that ensure its financial sustainability over time.
Viewing sustainability for government agencies is tricky. On the one hand, an agency can claim that it has the full faith and credit of the government, legal mandates, and (in some cases) the historical precedent of its long-term mission. On the other hand, agencies come and go, budgets are cut and reallocated, and missions change.
In fact, as noted above, the current proposals are only the most recent examples of these very issues facing GPO. GPO has always had high hopes and made big promises, but its hopes and promises are limited by what Congress sets as its mission from year to year and how Congress funds it -- or denies it funding. In a single budget cycle, "permanent preservation" can change to "temporary storage" and "free" can change to "fee-based." A single bad-budget year can force GPO to make selection decisions that result in weeding of information that some communities will still consider vital.
While a lot of what affects sustainability is outside the control of the repository, there are many things that each repository can control and many actions it can take to control those. It can also take actions that will provide the best possible context for dealing with events outside its control. TDR enumerates these. But TDR says that a repository must also prepare for the possibility that unforeseen or unavoidable events might make sustainability impossible. For such occasions, a repository needs a succession plan.
2. Succession Planning. Any organization faces the possibility of funding cuts and shortages and unforeseen problems that can result in anything from scaling back to going out of existence entirely. IBM recently made this point clear about private sector companies when it said, "Nearly all the companies our grandparents admired have disappeared. Of the top 25 industrial corporations in the United States in 1900, only two remained on that list at the start of the 1960s. And of the top 25 companies on the Fortune 500 in 1961, only six remain there today." TDR recognizes this and says that any trusted repository must have a formal succession plan, contingency plans, or escrow arrangements in case the repository ceases to operate or the governing or funding institution substantially changes its scope. These are exactly the threats GPO faces today. I cannot think of a better demonstration of the need for succession planning.
But what does it mean to have a succession plan? It means having a plan that will ensure the long-term preservation of the content for which a repository is responsible even if the repository ceases to exist. In general terms, it means that an organization has a plan for what specific actions it will take if it learns it has to change missions or if it will cease to exist. In extreme circumstances, it means that it has a plan in place to hand over its content to one or more trusted repositories.
We already have some existing projects for government information that may serve as models for viable, long-term, collaborative solutions to succession. These include the Department of State Foreign Affairs Network (DOSFAN) partnership between the U.S. Department of State, the University of Illinois at Chicago, and the Government Printing Office; the LOCKSS-USDOCS partnership between Stanford, Carl Malamud's public.resource.org, GPO's FDsys, and more than three dozen libraries; and the CyberCemetery partnership between The University of North Texas Libraries and GPO.
In order for a repository to say that its content will survive the downsizing or elimination of the repository, it needs to be able to show that its content already is in another repository or that it could hand over its content to another repository. For a hand-over to take place, there would have to be another repository technically capable of accepting such a hand-over.
This, along with the above conclusions based on quantity and selection, leads us to some solutions both for our current situation as well as for the underlying issues surrounding long-term preservation and access.
There is a common theme to the conclusions above. First, to ensure preservation of all that needs to be preserved, we need multiple repositories serving the needs of multiple communities of interest. OAIS calls these "Designated Communities" and both OAIS and TDR make them an essential (non-optional) element of trusted repositories.
Second, in order for repositories to have realistic succession plans, we need an information preservation ecosystem consisting of many repositories capable of cooperating with each other's succession planning.
In a nutshell: The more repositories we have, the more secure all repositories will be, collectively. The more repositories we have, the better we can ensure that content relevant to all user communities will be selected and preserved by at least one of those repositories.
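The reasoning above can be made concrete with a small sketch. It assumes a purely hypothetical map of which repository holds which collection (the repository and collection names below are invented for illustration and do not describe any real holdings). It then asks two questions that a succession plan has to answer: which collections exist in fewer than two repositories, and what would be lost outright if a given repository shut down.

```python
from collections import defaultdict


def holders_by_collection(holdings):
    """Invert {repository: collections} into {collection: set of repositories}."""
    holders = defaultdict(set)
    for repository, collections in holdings.items():
        for collection in collections:
            holders[collection].add(repository)
    return dict(holders)


def at_risk(holdings, min_copies=2):
    """Collections held by fewer than min_copies repositories, sorted by name."""
    holders = holders_by_collection(holdings)
    return sorted(c for c, repos in holders.items() if len(repos) < min_copies)


def lost_if_closed(holdings, closed):
    """Collections that no remaining repository holds if `closed` shuts down."""
    remaining = set()
    for repository, collections in holdings.items():
        if repository != closed:
            remaining |= set(collections)
    return sorted(set(holdings.get(closed, ())) - remaining)


# Hypothetical holdings, invented for this example only.
holdings = {
    "national_repo": {"federal_register", "congressional_record", "agency_reports"},
    "law_consortium": {"federal_register", "congressional_record"},
    "regional_library": {"water_rights_maps"},
}

print(at_risk(holdings))                          # collections with a single holder
print(lost_if_closed(holdings, "national_repo"))  # content with no successor
```

In this toy ecosystem, anything held by only one repository has no place to go if that repository closes. Adding even one more repository that selects the same material for its own community removes that collection from both lists, which is the sense in which more repositories make all of them more secure.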
Visions of the future
What might this look like in practice? There is no single prescription for success, but we can imagine effective, practical, successful scenarios. Success would be achieved if we had a mix of a variety of different kinds of libraries and archives and repositories, each working for the best interests of its own designated user community, but, collectively, providing a national, loosely-coupled "system" of preservation and access. (Does this sound like the traditional FDLP? Yes! The FDLP provides us with a working model of experience in just such a system.)
In such a system, individual libraries (small and large) and consortia of libraries (small and large) could contribute to the long-term free public access to government information -- simply by meeting the needs of their own user communities.
The Digital Public Library of America might provide a technical and organizational framework within which many libraries might act and contribute.
I can imagine lots of examples of how individual libraries or groups of libraries might take actions that would benefit their own user communities as well as the information ecosystem as a whole. I am sure that you can add to this list from your own experiences with your own user communities.
- A few big repositories like HathiTrust, the Internet Archive, and LOCKSS-USDOCS containing large volumes of easily identifiable and obtainable major series of government digital information.
- Consortia of law libraries (like the Chesapeake Project Legal Information Archive) combining forces to preserve the basic, essential, official legal record of the nation (from all 3 branches).
- Regional, state, and local law libraries preserving local jurisdictional legal information and linking their collections through rich metadata and APIs to each other and national collections.
- Libraries with a regional focus collecting information relevant to the region from multiple agencies and jurisdictions. (e.g. water rights, immigration, trade, agriculture)
- Libraries that focus on specific kinds of users with common kinds of information needs (e.g., undergraduates, K-12, practicing physicians, farmers) collecting government information from many sources to build strong, dynamic working collections.
- Libraries that want to emphasize a particular kind of information or research (e.g., spatial/GIS data, astronomical data, statistical and raw numeric data from censuses, weather data, textual corpora), combining government information with information from other sources and with computational tools to provide rich research environments for researchers.
- Research libraries with institutional repositories of their research output combining government information to supplement, document, and enhance those collections.
The above are just examples, not prescriptions or predictions. The concept I want to illustrate above is that, when lots of libraries and archives and repositories select and acquire digital government information and create rich digital collections for their own communities, the result will be, collectively, better preservation and more focused access than any single institution could create on its own. This rich environment would be much more secure than our current environment in which each library hopes that some other library or government agency will take care of preservation and access to materials that are essential to its own user community.
So what do we do next?
- We need to oppose recommendations before Congress that would gut GPO or force GPO to weed or disable FDsys or commercialize government information. The current bill will not be the last; we need to be able to make a convincing, persuasive case that government has an essential, inherent role in the life cycle of government information.
- We need to work with existing large digital repositories (e.g. HathiTrust, Internet Archive, LOCKSS-USDOCS, etc.) to see if they can host government information and make it freely accessible now -- particularly in the event of a scaled back or discontinued FDsys.
- We need to work with Depository Library Council, GODORT, ALA, GPO, our own local FDLP libraries and Regional Depositories to plan for an FDLP of the future that includes life-cycle management of digital government information. This will inevitably include, but not be limited to, digital deposit of Title 44 materials into FDLP libraries.
- We need to instruct ourselves in the requirements of Trusted Digital Repositories by learning about OAIS and TDR. Where there are learning opportunities we need to take them and where there need to be new opportunities we need to make them. Those of us with influence on the curricula of library schools need to make this a requirement.
- Building on our own individual knowledge we can then work at a local level within our own libraries and library consortia and library organizations to build our own digital infrastructures and digital collections that meet the requirements of OAIS and TDR. We need to make sure that the planning process is not overwhelmed by technical considerations to the exclusion of long-term sustainability and succession planning. Sustainability and succession planning need to be integrated into the planning process from the beginning, not addressed later as an afterthought. This will help us have better conversations with our colleagues and will lead to more cooperative projects and better cooperative planning.
- We need to work with national and regional organizations and professional associations to plan for a future information preservation ecosystem and infrastructure. Librarians need to work with different kinds of libraries; librarians and archivists and technologists need to work together. The ecosystem doesn't have to be a huge bureaucratic institution -- indeed, it probably should not be -- but it will benefit from collaborations and planning that stretch across traditional boundaries.
Ask or Act?
The future of long-term preservation of and free access to government information is in the hands of Congress today. That leaves us with the feeling that all we can do is ask Congress to do the right thing. But we can do more than ask; we can act. Indeed, we must act. We have the power to take that control out of the hands of Congress and put it into our own hands by building our own digital collections. For many libraries, that will mean a change in strategy: instead of relying on someone else to ensure long-term access to the information your Designated Community requires, you will rely on your own actions. This comes with costs, of course, but it also has big benefits. You will be providing the essential services that your community needs. And that means that you will have a built-in, inherent role that no one else has, which will make your library more sustainable for the long run.
- Ambacher, Bruce I., Government Archives and the Digital Repository Audit Checklist, Journal of Digital Information, 8 (2007).
- Consultative Committee for Space Data Systems, Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book (CCSDS Secretariat, January 2002).
- Consultative Committee for Space Data Systems, Audit And Certification Of Trustworthy Digital Repositories, "Red Book," Issue 1 (Washington D.C.: Council of the Consultative Committee for Space Data Systems, October 2009).
- Federal Executive Agencies Terminated, Transferred, or Changed in Name Subsequent to March 4, 1933, United States Government Manual 2009-2010 (Appendix B).
- Gellman, Robert M. Twin Evils: Government Copyright And Copyright-Like Controls Over Government Information, Syracuse Law Review 45:999 (1995).
- Government Printing Office Electronic Information Access Enhancement Act of 1993 [Public Law 103-40].
- House Report 112-148, to accompany H.R. 2551, Legislative Branch Appropriations Bill, 2012, Committee on Appropriations (July 15, 2011)
- H.R.2551, Legislative Branch Appropriations Act, 2012, Referred to Senate committee 7/22/2011.
- Jacobs, James A. FDLP: Services and Collections [preprint] by Against the Grain, 21(2) April/May 2009.
- Kelley, Michael. Bill Passed by House Would Provide No Money for GPO's Federal Digital System, Sharply Cuts Other Information Resources, Library Journal (Jul 27, 2011).
- Krasowska, Francine. A Message from STAT-USA’s Director (August 2010)
- MacGilvray, Daniel R. A Short History of GPO, Administrative Notes (1986).
- McGilvray, Jessica. ALA opposes cuts to Government Printing Office in Legislative Branch Appropriations Act, by District Dispatch, American Library Association, Washington Office (July 21, 2011).
- OMB Watch. House Questions Future of Government Printing Office, (July 27, 2011)
- Peterson, Karrie and Jacobs, James A. Government Information in the Digital Era: Free Culture or Controlled Substance?, paper presented at the symposium, "Free Culture and the Digital Library" at Emory University in Atlanta Georgia, October 2005.
- Relyea, Harold C. Public Printing Reform: Issues and Actions, Congressional Research Service report 98-687 (June 17, 2003).
- Robinson, David G., Yu, Harlan, Zeller, William P. and Felten, Edward W., Government Data and the Invisible Hand (2009). Yale Journal of Law & Technology, Vol. 11, p. 160, 2009.
- Sheketoff, Emily. Letter [MS Word document; available as a PDF document here] to Harold Rogers and Norman D. Dicks, Committee on Appropriations U.S. House of Representatives, from Emily Sheketoff, Executive Director ALA Washington Office (July 21, 2011). [includes attached "Resolution On Government Printing Office Fy 2012 Appropriations" Adopted by the Council of the American Library Association, June 28, 2011.]
- STAT-USA. STAT-USA Office to Cease Operations September 30, 2010
- Stiglitz, Joseph E., Orszag, Peter R., and Orszag, Jonathan M. The Role of Government in a Digital Age, Commissioned by the Computer & Communications Industry Association. October 2000.
- Terry, Jenni. House passes Legislative Branch Appropriations Act with 20 percent cut to Government Printing Office, District Dispatch, American Library Association, Washington Office (July 25, 2011).
- U.S. Congress, Office of Technology Assessment, Informing the Nation: Federal Information Dissemination in an Electronic Age, OTA-C IT-396 (Washington, DC: U.S. Government Printing Office, October 1988). [Y 3.T 22/2:In 3/9:]
- U.S. General Accounting Office. Information Management: Electronic Dissemination of Government Publications, GAO Report 01-428 (March 30, 2001)
- U.S. Government Printing Office. Essential Titles for Public Use in Paper or Other Tangible Format. (Written on Monday, 24 November 2008 Last Updated on Monday, 20 June 2011)
- U.S. Government Printing Office. GPO Access: Status Report. (June 30, 1994).
- U.S. Government Printing Office. Printing Procurement Regulations (revised 2/11)
- U.S. National Commission on Libraries and Information Science. A comprehensive assessment of public information dissemination: final report, United States. National Commission on Libraries and Information Science, Washington, DC : The Commission, (2001) [Y 3.L 61:2 D 63].
- U.S. National Commission on Libraries and Information Science. Public Sector/Private Sector Interaction in Providing Information Services. Report to the NCLIS from the Public Sector/Private Sector Task Force. U.S. Government Printing Office, Washington, DC (1982). [Y 3.L 61:2 P 96/2].
- U.S. Office of Management and Budget. OMB Circular A-130, Transmittal Memorandum #4, Management of Federal Information Resources (11/28/2000)
- Wasch, Ken. Letter (May 13, 2004) "SIIA Comments Regarding New Economic Model for The GPO Sales Program," Letter to Bruce James, Public Printer, U.S. Government Printing Office from Ken Wasch, President, Software & Information Industry Association.
The National Archives is apparently preparing to reverse a long standing policy of providing free public access to Census Schedules when it releases the 1940 Census next year. (See "Update" below for additional information.)
Early next year the National Archives and Records Administration (NARA) can make the 1940 Census Schedules available to the public for the first time. (See "Background" below.) NARA has digitized these files and created metadata for them in preparation for making this valuable trove of information accessible on the web. It now only remains for NARA to decide who should provide access at what cost to users. Should NARA provide free access? Or should it contract this service out to a private company that will impose fees on users and make a profit by providing access to this public information?
The answer should be obvious. For decades NARA has provided free access to Census Schedules at regional archive facilities and has sold microfilm to libraries that provide free access to their users. Now that online digital access is possible, NARA can provide better access online without having to maintain physical access at its regional facilities. It can distribute digital files to libraries for little or no cost so that libraries could further increase access and functionality for all of the information or subsets of it.
It seems, however, that NARA is choosing privatization instead of free public access. In an eight-page RFI (Services Request for Information (RFI) NAMA-11-RFI-0004, 1940 Census [Microsoft Word .docx] or see the PDF version), NARA is seeking "industry input" for a "no-cost contract" to provide managed hosting and online access to the 1940 Census. The RFI is intended to explore options and "may or may not lead to a solicitation" for an actual contract. This means that NARA could, apparently, make a decision to do this work itself, but it is exploring the privatization route first. (Presumably, libraries could respond to the no-cost RFI as well. Responses were due on June 22.)
According to NextGov, the "no-cost contract" means that "the vendor would do the work for free and then charge the public a fee to access the records." (Archives Wants to Put 1940 Census Online, by Joseph Marks, NextGov TechInsider, July 15, 2011)
The RFI does not explain why NARA is pursuing this path or what advantages it sees to privatization. I would guess the most likely reason is that NARA does not anticipate that it can get adequate funding to host the data online itself. But has it asked for funding? Has it made the case for continuing its historic provision of free public access to Census Schedules? Has it justified privatizing public information?
We have seen NARA follow this path before. NARA partnered with footnote.com to digitize selected holdings. This resulted in access restrictions including membership fees, per-page charges for downloading, and age restrictions to these digitized public documents. NARA partnered with Ancestry.com to make public records available for a fee. At the time, a spokesperson said that budget constraints and other priorities kept the Archive from making this information available itself.
"In a perfect world, we would do all this ourselves and it would be up there for free," she said. "While we continue to work to make our materials accessible as widely as possible, we can't do everything." -- Ancestry.com unveiled more than 90 million U.S. war records, New York Times (May 24, 2007).
In 2008, NARA contracted with TGN to digitize and provide access to some of NARA's holdings. The contract restricted free public access for five years. We've written about this here at FGI before (The NARA/TGN contract as a bad precedent) and believe that deals like this remain bad for NARA and bad for the country. We believe these kinds of deals set a bad precedent -- a precedent that is now being unnecessarily followed with the 1940 Census.
Those past deals were different from this one in one key way, however. They involved digitization of materials by the private contractors. In the case of the 1940 Census, the materials are already digitized, according to the RFI. The arguments we heard in the past were that digital access was so much better that it was worth privatizing access in order to get the digitization done. Without privatization, it was argued (even by some librarians and archivists), the materials could not be digitized and we'd be stuck with analog access. This is not the case for the 1940 Census since the materials are already digitized. The decision for online digital access has been made. The only question now is whether to make the existing digital files freely available or available through privatization.
Of course, the cost of any project providing access to all the 1940 Census Schedules and maps will not be insignificant. According to the RFI, NARA has created 3.8 million JPEG images, comprising 20 terabytes of data.
Twenty terabytes is a lot of data, but it is fast becoming an almost modest size for a digital library. For comparison, the HathiTrust has over 3 billion pages and over 400 terabytes of data, OCLC has a capacity of over 600 terabytes, the Wayback Machine contains 100 terabytes, the Library of Congress web archive is 235 terabytes, the University of California Curation Center has 70 terabytes in its Merritt digital preservation repository, and NARA's own Electronic Records Archives (ERA) has more than 90 terabytes. These terabyte-scale digital libraries are virtually the new norm and petabyte-scale digital libraries are already being built. Examples include the Shoah Foundation Institute's digital library (8 petabytes), the Stanford Digital Repository, which anticipates a capacity of petabytes, and the Digital Hammurabi Project, which is building a petabyte-scale digital library and museum of virtual 3D cuneiform tablets.
But providing access to microfilm at 13 regional offices was not cheap either. To me the question is whether or not the government is willing to continue its historic mission of providing free access or if it is ready to abandon that mission to the private sector.
There have always been those who argue that the fee-based private sector should take precedence over free access through the public sector. But privatization of access to Census schedules would represent a reversal of long-standing policy. What was once an unquestioned government function is now, apparently, being considered a commercial function. Where, in the past, it was the government that provided free, public access to census schedules, now, when access can be improved, the government is abdicating its role and turning access over to private companies that will provide the information for a fee. The issues are not new. The precedents for providing free public access exist and have a long and respectable history. The only thing new is that NARA seems to have accepted privatization as inevitable.
Will there be funding for NARA to provide access to 1940 Census Schedules? There may not be. We have argued here at FGI for years that relying solely on Congressional funding for permanent, free public access to government information is risky because there is always the chance that Congress will not fund it. In these highly-politicized, economically troubled times it is easier to imagine a lack of any funding than to imagine adequate funding for the long term.
But this does not mean that privatization is the only option. There are precedents for government projects that are supported by donations and public-private partnerships. The American Memory Project is one notable example. And individual libraries or groups of libraries could step in and offer to provide free public access.
Now is the time for NARA, supported by researchers, libraries, and archivists, to actively promote and pursue free public access solutions. There is no reason to accept privatization as inevitable.
Update: Note that the NARA website 1940 Census page says that "The digital images will be accessible at NARA facilities nationwide through our public access computers as well as on personal computers via the internet." Additionally, a comment on the Ancestry World web site said: "NARA will make the digitized copies of the 1940 Census population schedules available to the public, free of charge, on April 2, 2012 through our new Online Public Access search (http://www.archives.gov/research/search/)."
It is not clear from the above if the policy on free public access has changed with issuance of the RFI or not.
Background
The Census Bureau conducts the Decennial Census every ten years. The Bureau summarizes its findings in reports that contain no information on individuals. The raw information collected, including names and addresses of those surveyed (sometimes called the "manuscript census" or the "census schedules"), is protected by law and is kept confidential for 72 years. After 72 years, that raw information is released by the National Archives and Records Administration. This information is invaluable to genealogists and other researchers. Typically, this raw information has been made available on microfilm at regional National Archives offices (Availability of Census Records About Individuals). This 72 year period for the 1940 Census expires on April 2, 2012.
- Measuring America: The Decennial Censuses From 1790 to 2000
- Census Of Population And Housing: Reports, 1790-2010
- All of the information that the U.S. Census Bureau collects under 13 USC 9 is confidential.
- The "72-Year Rule"
- Census information and records can be invaluable tools in genealogical research
The Sunlight Foundation announced today a new bill introduced by Congressman Steve Israel (NY-2) called the Public Online Information Act (POIA) (read the bill (PDF)). POIA will require that all "public" executive branch documents be permanently available on the Internet at no cost. POIA also creates a:
"special federal advisory committee to coordinate the development of Internet disclosure policies. These policies promote information best practices, including data interoperability standards, and will keep the government up-to-date with new technology. The advisory committee’s 19 members – six appointed by each branch of government, plus one by GSA – are drawn from the public and private sectors and serve as watchdogs, synthesizing the needs of agencies and the public and making recommendations on updating federal law."
While I wholeheartedly support the spirit of POIA -- free permanent internet access to executive branch documents! -- and will definitely be contacting my representative to support its passage, I have 2 concerns that I hope will be discussed by the Sunlight community, the soon-to-be federal advisory committee, libraries and the public:
1) preservation: There was an article in today's NY Times -- "Fending Off Digital Decay, Bit by Bit" -- that highlights the many issues surrounding digital preservation. Just putting something on the Web does not mean that it will be preserved. The GPO has been working on its Federal Digital System (FDsys) since 2004 (and really since 1994, when it started GPO Access) to deal with the inherent digital issues. Many researchers, librarians, academics, computer programmers, etc. have been working on these issues pretty much since the 1960s. And the issues are still here today.
So I'd like to see as part of this bill an acknowledgement that online information is expensive to preserve AND that there will be continued funding for research and sustainability of digital archives through the National Digital Information Infrastructure & Preservation Program (NDIIPP). Readers are encouraged to explore the issues here and here.
2) privatization of govt information: The following from the Sunlight announcement caught my eye and concerned me:
Freeing government information from its paper silos provides the private sector with raw material to develop new products and services and gives the public what they need to participate in government as active and informed citizens.
Federal government information is in the public domain. That's a good thing. However, there's a fundamental issue at stake here. One can't have "permanent free public access" to government information where the private sector is involved. The private sector has been involved in giving access to government information for a long time (see LexisNexis, Thomson West, Readex etc). They do it well but they certainly don't do it for free. Libraries and other organizations have paid many millions of dollars to license access to govt information for the communities they serve. Here's more background and context on privatization. For all intents and purposes, these private sector companies take public domain information and privatize it. Any digital govt information accessible on the internet should already be findable, usable and accessible in bulk at minimum.
But there needs to be more. What I'd like to see in this bill and in the discussion after it passes (devil's in the details right?!) is not only a requirement that all govt information is online permanently and for free, but that there be the inclusion of a viral GNU General Public License-like piece of the public domain whereby anything IN the public domain (i.e., govt information) has to STAY IN the public domain. There are plenty of folks (I'm looking at you Sunlight, Govtrack.us, OpenCongress, OpenCRS etc) excited about making govt information more available, more usable and more shareable and this would support their public service.
Open source intelligence -- not to be confused with Open-source software -- is "a form of intelligence collection management that involves finding, selecting, and acquiring information from publicly available sources (my emphasis) and analyzing it to produce actionable intelligence." Libraries in the Federal Depository Library Program have since the early 1940s received output from this process in the form of Foreign Broadcast Information Service (FBIS) materials *for free*. FBIS materials offered translation of foreign news sources, and via the Joint Publications Research Service (JPRS) foreign language books, newspapers, journals, unclassified foreign documents and research reports. FBIS became the World News Connection in 1996, but it is a severely limited version (about half) of what's available for internal government use.
All that background as context to a very troublesome turn of events as described by a recent post on the govdoc-l list (see the email below stripped of personal information). This important piece of the govt information universe is now only available via a very expensive commercial database (World News Connection), depriving the academic and larger research communities of full access to all that is done by FBIS at taxpayer expense. Please help us by contacting the Open Source Center (OSCinfo@rccb.osis.gov 202-338-6735, or 1-800-205-8615) and Robert Tapella (PublicPrinter@gpo.gov) at the Government Printing Office and request that the Open Source Center offer free access of opensource.gov to depository libraries. Thanks!
Date: Wed, 24 Feb 2010 10:25:58 -0600
Subject: OpenSource.gov access
Has any library successfully gained access to OpenSource.gov?
For those who are unfamiliar with this resource, here is what their web page says about them:
"OpenSource.gov provides timely and tailored translations, reporting and analysis on foreign policy and national security issues from the OpenSourceCenter and its partners. Featured are reports and translations from thousands of publications, television and radio stations, and Internet sources around the world. Also among the site's holdings are a foreign video archive and fee-based commercial databases for which OSC has negotiated licenses. OSC's reach extends from hard-to-find local publications and video to some of the most renowned thinkers on national security issues inside and outside the US Government. Accounts are available to US Government employees and contractors. Register today to see what OpenSource.gov has to offer."
When we tried to register, they informed us that we would have to justify why we needed access to the information and that we could get the information through World News Connection (via Dialog) OR, and I quote:
"In addition to the World News Connection, individuals may be able to access OSC products through university libraries, or the Federal Depository Library Program. Many Depository Libraries received CDs from the US Government Printing Office that contain select Open Source Center products." [The CDs that they are referring to are the FBIS materials (PREX 7.10/3:)]
In our response, we informed them that WNC was an expensive database that we could not afford and that their information regarding OSC being distributed through the FDLP was sorely out of date since the CDs have NOT been distributed for over 5 years.
In their response, they say they are considering adding additional agencies such as the Federal Depository Library (FDL) as part of the approved list of agencies in OpenSource.gov, but such a review would take a considerable amount of time to do. (I took this to mean, when 'ell freezes over.) Now here is the strange part--they think the FDLP is under the Dept of Interior and we could sign up that way--but our email address would need to have .gov or .mil in it. I am not sure, but I think they are actually referring to the Natural Resource Library in the U.S. Dept of Interior, which is a federal depository library, with which we are not associated, so this is NOT an option.
At this point I am stymied as to how we can have access to information that was formerly available FOR FREE through depository but is now only available through commercial ($$$) means. I know that GPO is aware that the CDs are no longer being distributed because of the creation of the OpenSource database. The only message I could find about this situation via the GOVDOC-L archives was from 2007 when they said "FDLP is still working with the agency OSC to get an agreement with how we are going to access their database." It is now 3 years later and we still do not have access to this information.
In the meantime, we have a professor on campus doing research in Middle East affairs and would like to have access to more recent information than what we have in our library via microfiche and CDs. We can not afford WNC, so I don't know what else we can do--except get access to OpenSource.gov. If anyone has been successful, I would be happy to hear how you did it.