Tag Archives: Data loss
The government information crisis is bigger than you think it is
[This post is adapted from our forthcoming book, Preserving Government Information: Past, Present, and Future.]
Today we want to clarify something important about preserving government information. There is a difference between the government changing a policy and the government erasing information, but the line between those two has blurred in the digital age.
When a new president is inaugurated, one expects new policies. The number of changes and the speed of change may vary from one administration to the next, but we expect every administration to differ in some ways from its predecessor. After all, that is part of the reason we have elections. Also, the information that the government publishes is updated all the time, not just when administrations change. Laws and regulations are added, amended, and rescinded; new economic, environmental, and census data are collected and published; and government recommendations to the public (like the Department of Agriculture’s “food pyramid” guidance) are revised.
Changes in government information are normal in a democracy.
Because change is normal, it is essential to preserve government information – even “non-current” and “out of date” information – in order to document those changes. This is not a new idea, but a long-accepted principle of democracy. Citizens need a record of what a government’s stated values were and when they changed, what actions it took and when it took them, what data it collected and generated at specific points in time, and so forth. It is important to preserve even information that later proves to be inaccurate in order to document what the government knew and when it knew it.
Because published government information is the evidence for a democracy, its preservation is essential.
In the era in which government information was published in paper formats, preservation of that information relied on libraries. The information was distributed to libraries in the Federal Depository Library Program (FDLP) based on the needs of the communities those libraries served. Beginning in 1962, Regional Federal Depository Libraries (FDLs) received and retained all the paper publications in the FDLP system. When new information superseded or replaced old information, the old information was not erased or discarded; it was preserved in Regional FDLs and in every FDL whose community valued that older information. In the print era, it was taken for granted that, once government information was released to the public, it would not be withdrawn or altered or lost.[1]
In the digital age, government publishing has shifted from the distribution of unalterable printed books to digital posts on government websites. Such digital publications can be moved, altered, and withdrawn at the flick of a switch. Publishing agencies are not required to preserve their own information, nor to provide free access to it.
Some digital government information is actively preserved by GPO, NARA, and the Library of Congress. Some government-collected data are preserved by law or by tradition. But the laws that allow this are weak and government preservation of government information suffers from large gaps. Non-government projects (notably the Internet Archive and the End-of-Term Archive) use web harvesting to attempt to acquire and store government information, but these projects are, by their nature, incomplete and their long-term guarantees of access are fragile. As a result of all this, the public can no longer assume that any given piece of government information will not be withdrawn or altered or lost.
The early actions of the incoming Trump administration (as well as the actions of the first Trump administration) have brought the vulnerability of digital information to the public’s attention (see our previous post “Federal information scrubbing has begun”) and the public is rightfully worried. That vulnerability is, however, not limited to this administration. Digital government information was being lost before President Trump.
The current crisis of imminent loss of information exists not only because government information is being changed, but because it is being erased. The erasure is possible because of the gaps in the current preservation infrastructure.
The scale of loss and alteration of information under Trump may prove to be unprecedented and certainly requires immediate short-term action. But librarians and archivists and citizens should use this current crisis to demand more than short-term solutions. A new distributed digital preservation infrastructure is needed for digital government information.
James A. Jacobs
James R. Jacobs
[1] Even when information was withdrawn for some reason, there was a record of the withdrawals. (See this spreadsheet listing withdrawn documents 1981 – 2018, collated from GPO’s no-longer published “Administrative Notes” newsletter.)
Terabytes of Enron data have quietly gone missing from the Department of Energy
This is yet another disturbing example of data loss documented by our friends at MuckRock. Evidently, a large amount of data from the Federal Energy Regulatory Commission (FERC) having to do with the infamous Enron Corporation has gone missing and even FERC staff do not know where it went.
How many examples will we need to post before libraries and archives get with the program and try to figure out ways to collect, archive, preserve, and give access to born-digital information posted on .gov sites? And how do we make sure this kind of data loss does NOT happen again in the brave new world of open government data (aka H.R. 4174, the Foundations for Evidence-Based Policymaking Act of 2018, which was recently signed into law and described by Alex Howard)? If you’re as concerned as I am, you’ll contact FERC and request a copy of their data for your library.
Government investigations into California’s electricity shortage, ultimately determined to be caused by intentional market manipulations and capped retail electricity prices by the now infamous Enron Corporation, resulted in terabytes of information being collected by the Federal Energy Regulatory Commission. This included several extremely large databases, some of which had nearly 200 million rows of data, including Enron’s bidding and price processes, their trading and risk management systems, emails, audio recordings, and nearly 100,000 additional documents. That information has quietly disappeared, and not even its custodians seem to know why…
…While terabytes of information has disappeared, up to 4,516 documents remain available through a pair of predefined searches of FERC’s eLibrary. While FERC claims that they, not Lockheed Martin or CACI, do offer a trio of Enron datasets on CD, FERC has not responded to repeated requests for these datasets sent over the past two months.
via Terabytes of Enron data have quietly gone missing from the Department of Energy • MuckRock.
The EPA’s Website after a year of climate change censorship
Here’s a good article from Time Magazine — “Here’s What the EPA’s Website Looks Like After a Year of Climate Change Censorship” — which accurately reports how the Trump Administration and EPA Administrator Scott Pruitt have changed, skewed or deleted government information from the EPA Website for crass political purposes. For more in-depth analysis of the issue of information scrubbing from federal websites, one should look to the work of the Environmental Data and Governance Initiative (EDGI) and especially their reports: “Changing the Digital Climate” and “The EPA Under Siege”.
According to former government officials and EPA staffers, the level of scrutiny is without precedent. In the hands of an administration that has eschewed facts for their alternative cousins, the agency’s site is increasingly unmoored from its scientific core.
“In my experience, new administrations might come in and change the appearance of an agency website or the way they present information, but this is an unprecedented attempt to delete or bury credible scientific information they find politically inconvenient,” Heather Zichal, a senior fellow at the Atlantic Council’s Global Energy Center, and previously President Barack Obama’s top White House adviser on energy and climate change, tells TIME.
The EPA’s site is now riddled with missing links, redirecting pages and buried information. Over the past year, terms like “fossil fuels”, “greenhouse gases” and “global warming” have been excised. Even the term “science” is no longer safe.
Christine Todd Whitman, the EPA Administrator under George W. Bush, says the overhaul is “to such an extreme degree that [it] undermines the credibility of the site”…
Of the more than 25,000 web pages tracked by the Environmental Data and Governance Initiative (EDGI) since Trump’s election, they say the EPA’s have been hit hardest. One section, which provided local communities with resources for combating climate change, disappeared for months only to resurface heavily redacted, including just 175 of its 380 pages.
via The EPA’s Website After a Year of Climate Change Censorship | Time.
Rev up your FOIA engines! Trump Admin’s first FBI Crime Report Missing 70% of data
This is definitely bad. Government data collection has always been political and driven by legislative requirements. The FBI has published uniform crime reports since 1930, but FiveThirtyEight’s report about missing data in the 2016 FBI Crime Report describes a new and troubling turn of events. The Trump administration is simply ignoring long-standing data collection and publication practices for blatantly political reasons. According to the report, approximately 70% of the tables from the FBI’s most important crime report have been taken offline. For example, there were 51 tables of arrest data in the 2015 report, and there are only seven in the 2016 report.
The Inter-university Consortium for Political and Social Research (ICPSR) curates and archives this data for the Bureau of Justice Statistics (BJS). In fact, I’m told it’s usually the most downloaded data on their site. But they can’t collect and archive what’s not there. Hopefully someone will FOIA the FBI for the missing data, but get ready to explain to our users about data gaps across the US government from 2016 to 2020. 😐
These removals mean that there is less data available concerning a perennial focus of Trump and his attorney general, Jeff Sessions: violent crime. Trump and Sessions have frequently talked about MS-13, a gang with Salvadoran roots, as a looming problem in the country. MS-13 has been cited in 37 Department of Justice press releases and speeches in 2017, compared to only nine mentions in 2016 and five in 2015. Sessions gave a speech on the organization last month, while Trump gave a speech on Long Island in July, saying the gang had “transformed peaceful parks and beautiful quiet neighborhoods into bloodstained killing fields. They’re animals.” Trump also frequently refers to gun violence in Chicago, and at the beginning of his presidency, he established a Victims of Immigration Crime Engagement Office, which aims to study and promote awareness of crimes committed by immigrants who entered the country illegally.
Although the removal of the tables makes it more difficult to get information on one of the White House’s most prominent causes, it also seems like part of a trend in the Trump administration: the suppression of government data and an unwillingness to share information with the press and public. About two weeks after Hurricane Maria devastated Puerto Rico, the FEMA website stopped displaying key metrics relating to island residents’ access to drinkable water and electricity. The data was later restored. The early days of the Trump administration were marked by reports that federal agency employees had been instructed not to talk to the press and to restrict social media postings.
Since Trump took office, government watchdog groups have been concerned about access to government data and maintaining the integrity of that data.
via The First FBI Crime Report Issued Under Trump Is Missing A Ton Of Info | FiveThirtyEight.