Category Archives: Library
There is a lot of activity going on to ensure that government information on government servers does not get altered, deleted, or lost during the transition between administrations. As we have pointed out before, this is not a new issue even if the immediacy of the problem is more apparent than ever before.
Much of the effort going into these activities has to deal with the inherent problems of how federal government agencies create and disseminate information. There is, for example, no comprehensive inventory or national bibliography of government information. Agencies do not even provide inventories of their own information. This makes it hard to identify and select information for preservation. Also, some information is in databases or linked to Web applications that are not directly acquirable by the public. Finally, the "digital objects" that we can identify and acquire are often not easily preservable.
The inherent problem is that agencies are not addressing digital preservation up front. Librarians and Web archivists are left trying to solve the digital preservation problem too late in the life-cycle of information. We are trying to preserve information long after its creation and "distribution" — in the absence of early preservation planning by the agencies that created the information. This is understandable under current government information policies because most government agencies do not have a mission that includes either the long-term preservation of their information or free public access to it. The Federal Records Act [Public Law 81-754, 64 Stat. 578, TITLE V-Federal Records (64 Stat. 583)] and related laws and regulations only cover a portion of the huge amount of information gathered and created by the government. In addition, the preservation plans that do exist are subject to interpretation by political appointees who may not always have preservation as their highest priority.
What we need is a better approach to government information management that includes preservation planning at the beginning of the information life-cycle and that guarantees its long-term preservation and free public access to it even if the agency has no more need for it, or if Congress has no more funding for it, or if politicians no longer want it.
How can that be done?
At FGI, we believe that a long-term solution will require a change of government policy. That is why we have proposed a modification to OMB Circular A-130: Managing Information as a Strategic Resource that would require every government agency to have an Information Management Plan.
This seems to us to be a reasonable suggestion with a good precedent. The government agencies that provide research grants already require researchers to have a Data Management Plan for the long-term preservation of data collected with government research grant funding. A modification of A-130 would simply apply to information produced at government expense by government agencies the same requirement that the National Science Foundation (NSF) and other government funding agencies already apply to the data produced by researchers with government funding.
Here is a draft of such a requirement:
Every government agency must have an “Information Management Plan” for the information it creates, collects, processes, or disseminates. The Information Management Plan must specify how the agency’s public information will be preserved for the long-term including its final deposit in a reputable, trusted, government (e.g., NARA, GPO, etc.) and/or non-government digital repository to guarantee free public access to it.
We believe that such a requirement would provide many benefits for agencies, libraries, archives, and the general public. It would make it possible to preserve information continuously without the need for hasty last-minute rescue efforts. It would make it easier to identify and select information and preserve it outside of government control. It would result in digital objects that are easier to preserve accurately and securely. It would accomplish many of these goals through the practical response of vendors that provide software to government agencies. Those vendors would have an enormous market for flexible software for creating digital government information: software that fits the different needs of different agencies for database management, document creation, content management systems, email, and so forth, while also making it easy for agencies to output preservable digital objects, along with an accurate inventory of them, ready for deposit in Trusted Digital Repositories (Audit and Certification of Trustworthy Digital Repositories [ISO Standard 16363]) for long-term preservation and access.
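As a rough illustration of what an "accurate inventory" of digital objects could look like in practice, here is a minimal Python sketch. It walks a directory of files, records each file's relative path, size, and SHA-256 checksum, and can later re-check the files against that record. The function names (`build_inventory`, `verify_inventory`) and the record layout are our own assumptions for illustration; they are not taken from A-130, ISO 16363, or any agency or vendor system.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """List every file under root with its relative path, size, and checksum."""
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        records.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return records


def verify_inventory(root, records):
    """Return the paths that are missing or whose checksum no longer matches."""
    root = Path(root)
    problems = []
    for record in records:
        target = root / record["path"]
        if not target.is_file() or sha256_of(target) != record["sha256"]:
            problems.append(record["path"])
    return problems
```

A library that receives a deposit could run `verify_inventory` against the inventory supplied with it and would get back an empty list only if every listed file is present and unchanged. A real repository would record far more (formats, provenance, rights), so this is only the smallest useful core.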
Perhaps most important for FDLP Libraries, we believe that this OMB requirement would provide a clear and practical opportunity for libraries to guarantee long-term free access to curated collections of government information to their Designated Communities. And this, we believe, will drive new funding and staffing to libraries and digital repositories.
James A. Jacobs and James R. Jacobs.
During this past week, there were many reports about the Trump administration’s actions that appear to be either removing information, or blocking information, or filtering scientific information through a political screen before allowing that information to be released.
How concerned should government information specialists be about these developments? Very.
What can we do? First, let’s be cautious but vigilant. As librarians, we are well aware that today’s information environment bombards us with fragments of news and demonstrably false news and speculation and premature interpretation of fragmentary speculation of unverified news. We should neither panic nor dismiss all this as noise. There is so much happening in so many areas of public policy right now that no one can keep up with everything; one thing that government information specialists can do is keep up with developments about access to government information so we can keep our colleagues and communities informed with accurate information.
We also need to evaluate what is happening critically. The Trump administration has attempted to normalize last week’s actions, saying, essentially, that removal of information and control of information is a normal part of a transition. On Tuesday of last week, for example, White House press secretary Sean Spicer addressed concerns about reports of censorship at the Environmental Protection Agency (EPA) by saying “I don’t think there’s any surprise that when there’s an administration turnover, we’re going to review the policies.” And on Wednesday, Doug Ericksen, the communications director for the President’s transition team at the EPA, said “Obviously with a new administration coming in, the transition time, we’ll be taking a look at the web pages and the Facebook pages and everything else involved here at EPA.” In short, this explanation is that the new administration is just updating and transitioning and making sure that information from agencies conforms to its new policies. This is “business as usual.” Nothing to see here; relax; move on. Even some govinfo librarians minimize the significance of what is going on.
This sounds reasonable on the surface. Indeed, since even the entire Executive Office of the President, which includes the Council of Economic Advisers and the National Security Council and the Office of Management and Budget, has been offline since Inauguration day and a temporary page asks us to “Stay tuned as we continue to update whitehouse.gov,” perhaps we should just be patient? Surely those will be back, right?
I think we need to realize that this actually is pretty odd behavior. And we need to help our communities who still need access to important policies (like OMB Circular A-130 [IA copy]) that are gone but, presumably, still in effect.
We need to be aware that this administration presents difficulties for the public in just figuring out what it is actually doing. It appears that the administration has reversed or modified some of its initial information policies or that they were incorrectly reported, and that these reversals — if they are reversals and if they are permanent — seem to have come about because of public outcry.
I think we need to take the administration’s actions seriously and let them know when they are doing something unacceptable or uninformed. We need to stand up for public access and transparency.
I suggest that it is our professional duty to address these issues. I suggest that the communities that our libraries serve expect and need us to do this. This administration is doing many troubling and controversial things, and not everyone can fight every battle. Ensuring long-term free access to government information should be a job responsibility for every government information librarian.
What can we do? What should we do? How can we best allocate our resources?
- We need to keep our library administrators informed. We can do that by putting Government Information on committee agendas and preparing accurate and well-informed briefings that address how political changes will affect the library’s ability to provide content and service and how they will affect library users’ ability to find and get and use government information.
- We need to talk to our user-communities. We need to provide them with accurate and well-informed information about how political changes are already affecting their ability to find and get and use government information. We need to provide alternate sources where necessary and update library guides and catalogs. We need to learn from them when they identify issues and problems and solutions.
- We need to keep our professional colleagues informed through local library meetings, informal communication, and professional activities.
- We can still contribute to the End of Term (EOT) crawl. There are lots of things you can do.
- We can make the case for digital collections.
- We need to remind our administrators that when we depend on pointing instead of collecting we lose information.
- We need to remind them that even though preservation sites like obamawhitehouse.archives.gov and the 2016 EOT crawl are worthwhile and valuable, they still create the problem of link rot. We need to remind library administrators that pointing to remote collections that move is not a cheap way to provide good service. It is a time-consuming, never-ending task that is neither easier than nor as reliable as building local digital collections.
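To give a sense of the recurring work that pointing involves, here is a minimal Python sketch of a link check that a library might have to run, again and again, over the URLs in its guides and catalog. The function names and the idea of passing in a replaceable `fetch` function are our own assumptions for illustration; only the standard-library `urllib` calls are real. Note that `urlopen` follows redirects, so a link that has moved but still redirects to a working page counts as alive here, and that some servers refuse `HEAD` requests, so a production checker would need a fallback.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error code such as 404 or 500.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # No usable answer: DNS failure, timeout, refused connection, bad URL.
        return None


def find_broken_links(urls, fetch=fetch_status):
    """Return (url, status) pairs for links that did not answer with a 2xx code."""
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or not 200 <= status < 300:
            broken.append((url, status))
    return broken
```

Every link the check reports still needs a person to find where the content went, if it went anywhere, and to update the guide or catalog record. A local digital collection avoids that chore because the library holds the files itself.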
Sample of News Stories
By Dino Grandoni (Jan. 26, 2017)
By Michael Biesecker and Seth Borenstein (Jan. 26, 2017)
I was honored last week to be part of a panel hosted by OpenTheGovernment and the Bauman Foundation to talk about the End of Term project. Other presenters included Jess Kutch at Coworker.org and Micah Altman, Director of Research at MIT Libraries. I talked about what EOT is doing, as well as some of the other great projects, including Climate Mirror, Data Refuge and the Azimuth backup project, working in concert/parallel to preserve federal climate and environmental data.
I thought the Q&A segment was especially interesting because it raised and answered some of the common questions and concerns that EOT receives on a regular basis. I also learned about a cool project called Violation Tracker, a search engine on corporate misconduct. And I was able to talk a bit about what the needs are going forward, including the idea of “Information Management Plans” for agencies similar to the idea of “Data Management Plans” for all federally funded research. I was heartened to know that there is interest in that as a wider policy advocacy effort!
The full recorded meeting can be viewed here from Bauman’s Adobe Connect account.
Here’s more information on the EOT crawl and how you can help.
Coalitions of government, university, and public interest organizations have been working to ensure as much information as possible is preserved and accessible, amid growing concern that important and sensitive government data on climate, labor, and other issues may disappear from the web once the Trump Administration takes office.
Last Thursday, OTG and the Bauman Foundation hosted a meeting of advocates interested in preserving access to government data, and individuals involved in web harvesting efforts. James Jacobs, a government information librarian at Stanford University Library who is working on the End of Term (EOT) web harvest – a joint project between the Internet Archive, the Library of Congress, the Government Publishing Office, and several universities – spoke about the EOT crawl, and explained the various targets of the harvest, including all .gov and .mil web sites, government social media accounts, and more.
Jess Kutch discussed efforts by Coworker.org with Cornell University to preserve information related to workers’ rights and labor protections, and other meeting attendees presented some of their own projects as well. Philip Mattera explained how Good Jobs First is using its Violation Tracker database to scrape and preserve government source material related to corporate misconduct.
Micah Altman, Director of Research at MIT Libraries, presented on the need for libraries and archives to build better infrastructure for the EOT harvest and other projects – including data portals, cloud infrastructure, and technologies that enhance discoverability – so that data and other government information can be made more easily accessible to the public.
It is January and time once again to review what last year brought to libraries and the FDLP and where we should put our energies in the coming year.
In 2016 GPO issued a series of policies that express its intentions to enhance both access to and preservation of government information. While we applaud GPO’s intentions, we are dismayed because the policies are fatally flawed and will endanger preservation and access rather than protect and sustain them.
The biggest threat to long-term free public access to government information is government control of that information. Regardless of the good intentions of the current GPO administration, and regardless of the hopes of government information librarians, GPO cannot guarantee long-term free public access to government information on its own.
There are many reasons for this, but they all boil down to the simple fact that, when digital government information is controlled solely by the government that created it, it is only as secure as the next budget, the next change in policy, and the next change in administration. We have written about this repeatedly here at FGI and elsewhere for sixteen years, so we will not repeat all of those arguments (philosophical, technical, legal, economic, and professional) here today. (For those who wish to catch up, please see the FGI Library or the selected links below.)
GPO has come a long way since its first early attempts to deal with the shift from paper-and-ink publications to born-digital information. To its credit, GPO today emphasizes in its policies (including the new ones) its intent to preserve as much digital government information as it can through its own actions as government publisher, through harvesting agency content, and through partnerships with others. GPO has also wisely reversed an earlier policy and is now partnering with LOCKSS to create copies of FDsys in thirty-seven Federal depository libraries. Indeed, supporting the LOCKSS partnership, which puts copies of FDsys/govinfo.gov in the hands of FDLP libraries, is the most positive step GPO has taken. The LOCKSS archives are not, however, publicly available, so this is only a first step.
These are good intentions and positive steps. But we must ask: Are these steps sufficient? We must ask not only how good they will be if they succeed but how bad can the damage be if they fail? Can GPO really guarantee long-term free public access to government information?
The simple answer to these questions is: No, GPO cannot guarantee long-term free public access to digital government information. Why? First, regardless of its current intentions, GPO does not have a legislative mandate for long-term preservation. The wording of the law (44 USC 4101) does not mention long-term preservation or specify any limitations on what can be excluded or discarded or taken offline. It is limited to providing “online access to” and “an electronic storage facility for” two titles (the Congressional Record and the Federal Register). Everything else is at the discretion of the Superintendent of Documents. Previous SoDs have had completely different priorities and those bad policies could easily return. Federal agencies may request that GPO include agency information, but GPO is only obliged to do so “to the extent practicable.” This means that GPO’s commitment to long-term preservation is subject to changes in GPO administrations. Further, regardless of the intentions of even the most preservation-minded GPO administration, it can only do what Congress funds it to do and there are ongoing and repeated efforts to reduce GPO funding and privatize it.
Second, GPO does not have a legislative mandate to provide free public access. In fact, the law (44 USC 4102) explicitly authorizes GPO to charge reasonable fees for access. GPO’s current intentions are noble, but, alas, they lack the legislative and regulatory foundation necessary to provide guarantees.
So, even if GPO policies are successful in the short-term, the policies make the preservation and long-term free access ecosystem vulnerable to budget shortfalls and political influence because they are designed to consolidate GPO’s control of that information.
The shortcomings of such an approach have become more apparent to more people after the recent presidential election. Scientists, scholars, historians, news organizations, politicians, and even some government information librarians have announced their fears that government information is at risk of being altered, lost, or intentionally deleted because of drastic policy changes and leadership of the incoming presidential administration. (See a list of articles about this issue.)
To be clear, no one has suggested (yet) that information will be deleted from FDsys/govinfo.gov. And we are not predicting that the new President and his executive branch agencies will erase any valuable government information. We are simply saying that they have the authority to do so and, if we keep all our eggs in one GPO/government-agencies basket, they have the technical ability to do so. This is not a new problem. Agencies and politicians have a long history of attempting to privatize, withdraw, censor, and alter government information. From 1981 until 1998, Anne Heanue and the fine folks at the Washington Office of the American Library Association (ALA) published an amazing series called Less Access to Less Information by and about the U.S. Government that chronicled such efforts to restrict access to government information.
What is new to this problem is the ability of a government that controls access to that information to remove access with the flick of a switch. Here at FGI we have written about this specific problem again and again.
To make matters worse, by explicit intent and inevitable effect, the new GPO policies will further consolidate GPO power and control and further weaken every individual FDLP library and the FDLP system as a whole.
What can government information librarians do in 2017? How should we focus our resources and actions?
- Monitor implementation of the Discard Policy. If successful, the new Regional Discard Policy (along with GPO’s National Plan and other policies) will, by design, further shift both access and preservation away from the FDLP into GPO. In addition, although the Policy claims that the digital surrogates it will rely on will be “complete and unaltered,” it lacks procedures to ensure this. At this point, the best we can do is hold GPO (and the Regionals that will be discarding documents) to their claims and not let the policy do even more harm than it is designed to do.
- Participate in PEGI. A loose group of individuals and organizations met last Spring and Fall to organize an effort called Preservation of Electronic Government Information (PEGI). Watch for developments and opportunities to participate in actions developed by this group.
- Support the EOT Crawl. The 2016 crawl is the third national End of Term Crawl. The goal of the EOTs is to document the change in federal administrations by harvesting as much government information as possible from .gov, .mil, and other domains before and after the inauguration. Follow their activities, contribute “seeds” and databases that need to be harvested, and promote their activities and visibility within your own communities.
- Support changes to OMB A-130. The Office of Management and Budget’s Circular A-130 lays out regulations for “Managing Information as a Strategic Resource.” The government policies that have done the most to affect preservation of information collected with government funding have been those that required “Data Management Plans” of those who get government research grants. These policies have prompted the creation of many new positions and programs to support data preservation in libraries. Oddly, there is no parallel regulation that requires government agencies to guarantee the preservation of the information they create. FGI recommended amending A-130 to require every government agency to have an “Information Management Plan” for the public information it acquires, assembles, creates, and disseminates. We will continue to push for this change. Watch for opportunities to support it.
- Support changing SOD 301. GPO’s Dissemination/Distribution Policy for the Federal Depository Library Program (“SOD 301”) is the policy that allows GPO to deposit digital government information with FDLP libraries, but limits the deposit to only those “products” that are the least preservable and most difficult to access. This policy actively impedes preservation and access and is the policy that GPO uses to enforce its consolidation of power and control over digital government information within the scope of the FDLP. Demand that GPO change this policy to allow FDLP libraries to select and acquire all digital government information.
- Support a truly digital FDLP. GPO’s policies since the mid-1990s have systematically minimized the participation of FDLP libraries in both preservation and access. At a time in which it is more obvious than ever that GPO needs legislatively mandated partners to guarantee long-term free public access to government information, support a truly digital, collaborative FDLP that uses new methods to support the traditional values of the FDLP.
James A. Jacobs, UCSD
James R. Jacobs, Stanford
In a recent thread on the Govdoc-l mailing list about a Congressional Publications Hub (or “pub hub”; more of the thread here), one commenter said that the American Memory Project’s digital surrogates of the pre-Congressional Record publications "probably aren’t salvageable" because the TIFFs were captured at 300 ppi resolution and then converted to 2-bit bitonal black and white, and that most of the text is too faded or pixelated to be accurately interpreted by optical character recognition (OCR) software. He concluded that this was "Kind of a shame."
It is indeed a "shame" that many of the American Memory Project’s "digital surrogates" probably are not salvageable. But the real shame is that we keep making the same mistakes with the same bad assumptions today that we did 10-15 years ago in regard to digitization projects.
The mistake we keep making is thinking that we’ve learned our lesson and are doing things correctly today, that our next digitizations will serve future users better than our last digitizations serve current users. We are making a series of bad assumptions.
- We assume, because today’s digitization technologies are so much better than yesterday’s technologies, that today’s digitizations will not become tomorrow’s obsolete, unsalvageable rejects.
- We assume, because we have good guidelines (like Federal Agencies Digital Guidelines Initiative (FADGI)) for digitization, that the digitizations we make today will be the "best" by conforming to the guidelines.
- We assume, because we have experience of making "bad" digitizations, that we will not make those mistakes any more and will only make "good" digitizations.
Why are these assumptions wrong?
- Yes, digitization technologies have improved a lot, but that does not mean that they will stop improving. We will, inevitably, have new digitization techniques tomorrow that we do not have today. That means that, in the future, when we look back at the digitizations we are doing today, we will once again marvel at the primitive technologies and wish we had better digitizations.
- Yes, we have good guidelines for digitization, but we overlook the fact that they are just guidelines, not guarantees of perfection or even of future usability. Those guidelines offer a range of options for different starting points (e.g., different kinds of originals: color vs. B&W, images vs. text, old paper vs. new paper, etc.), for different end-purposes (e.g., page-images and OCR require different specs), and for different users and uses (e.g., searching vs. reading, reading vs. computational analysis). There is no "best" digitization format. There is only a guideline for matching a given corpus with a given purpose and, particularly in mass-digitization projects, the given corpus is not uniform and the end-point purpose is either unspecified or vague. And, too often, mass-digitization projects are compelled to choose a less-than-ideal, one-size-does-not-fit-all, compromise standard in order to meet the demands of budget constraints rather than the ideals of the "best" digitization.
- Yes, we have experiences of past "bad" digitizations so that we could, theoretically, avoid making the same mistakes, but we overlook the fact that use-cases change over time, users become more sophisticated, user-technologies advance and improve. We try to avoid making past mistakes, but, in doing so, we make new mistakes. Mass digitization projects seldom "look forward" to future uses. They too often "look backward" to old models of use — to page-images and flawed OCR — because those are improvements over the past, not advances for the future. But those decisions are only "improvements" when we compare them to print — or more accurately, comparing physical access to and distribution of print vs digital access to and distribution over the Internet. When we compare those choices to future needs, they look like bad choices: page-images that are useless on higher-definition displays or smaller, hand-held devices; OCR that is inaccurate and misleading; digital text that loses the meaning imparted by layout and structure of the original presentation; digital text that lacks markup for repurposing; and digital objects that lack fine-grained markup and metadata that are necessary for accurate and precise search results finer than volume or page level. (There are good examples of digitization projects that make the right decisions, but these are mostly small, specialized projects; mass digitization projects rarely if ever make the right decisions.) Worse, we compound previous mistakes when we digitize microfilm copies of paper originals thus carrying over limitations from the last-generation technology.
So, yes, it is a shame that we have bad digitizations now. But not just in the sense of regrettable or unfortunate. More in the sense of humiliating and shameful. The real "shame" is that FDLP libraries are accepting the GPO Regional Discard policy that will result in fewer paper copies. That means fewer copies to consult when bad digitizations are inadequate, incomplete, or unusable as "surrogates"; and fewer copies to use for re-digitization when the bad digitizations fail to meet evolving requirements of users.
We could, of course, rely on the private sector (which understands the value of acquiring and building digital collections) for future access. We do this to save the expense of digitizing well and acquiring and building our own public domain digital collections. But by doing so, we do not save money in the long-run; we merely lock our libraries into the perpetual tradeoff of paying every year for subscription access or losing access.