Tag Archives: open formats

Our mission

Free Government Information (FGI) is a place for initiating dialogue and building consensus among the various players (libraries, government agencies, non-profit organizations, researchers, journalists, etc.) who have a stake in the preservation of and perpetual free access to government information. FGI promotes free government information through collaboration, education, advocacy and research.

Another digital preservation problem: Microsoft lacks specifications for its own old formats

Chris Rusbridge, retired Director of the UK Digital Curation Centre (DCC), sent an open letter to Tony Hey of Microsoft asking that the company publish the specifications for its older file formats. He has received a reply:

But the good news is that Microsoft is willing to work on the problem! The response from Microsoft continues:

  • We can look into creating new licensing options including virtual machine images of older operating systems and old Office software images licensed for the sole purpose of rendering and/or converting legacy files.
  • One approach we could consider is for Microsoft to participate in a “crowd source” project working with archivists to create a public spec of these old file formats.

Of course, this is a closing-the-barn-door-after-the-horse-is-gone solution, but such kludgy solutions are necessary when born-digital information is produced in proprietary formats rather than open formats, and when libraries accept these formats rather than insist on preservable digital objects.

Crazy born digital content

One of the best things that could happen for digital preservation is for producers of digital content to understand that they need to produce preservable content. If producers and publishers created digital content in neutral, preservable formats, we would not have to spin our wheels with the Sisyphean task of constantly trying to fix un-preservable content with techniques such as emulation and format migration.

For publishers (and of course this includes government agencies) to create preservable content, they would have to understand that their essential role in the information lifecycle includes creating information that is preservable, not just instantaneously deliverable in the short term. Although there are efforts to develop this understanding and to create appropriate formats and processes, most born-digital information seems to be generated with only the most short-term utility in mind. A lot of born-digital content is badly formed and uses non-standard, proprietary, and closed formats.

All that brings me to this example of what looks to be shoddy content creation.

What is interesting about this publication? It has no periods at the ends of sentences. Really! No punctuation at all at the end of sentences, in fact. I searched for periods and found only a few, all of them in bullet points on page 51.

The publication does use periods in “U.S.” and in URLs and in things like “version 3.0” but never at the end of sentences. (The appendices have periods, but, presumably, they were prepared separately and just pasted in.)
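A check like the one described above can be roughed out in a few lines of Python. This is only a minimal sketch, not the method actually used for this post: the function name `count_periods` and the regular expression are illustrative assumptions, and the heuristic (a period followed by whitespace and a capital letter, or by the end of the text, and not preceded by a capital letter) will miss a sentence that ends in an initialism such as “U.S.”

```python
import re

# Assumed heuristic: a period ends a sentence when it is followed by
# whitespace and an uppercase letter, or by the end of the text, and is
# not preceded by an uppercase letter (as in the initialism "U.S.").
SENTENCE_END = re.compile(r"(?<![A-Z])\.(?=\s+[A-Z]|\s*$)")


def count_periods(text: str) -> dict:
    """Split the periods in `text` into sentence-final and all others."""
    total = text.count(".")
    sentence_final = len(SENTENCE_END.findall(text))
    return {"sentence_final": sentence_final, "other": total - sentence_final}


if __name__ == "__main__":
    # Hypothetical snippet written in the style of the report.
    snippet = "The U.S. report uses version 3.0 See http://example.gov for more"
    print(count_periods(snippet))
```

Run against text extracted from a PDF, a very low `sentence_final` count next to a high `other` count would point to the same pattern: periods survive inside abbreviations, version numbers, and URLs but not at the ends of sentences.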

Here is a sample from the Introduction (in this example, even “U.S.” becomes “U S”):

In June 2010, the Administration, through the IPEC‘s Joint Strategic Plan, emphasized the protection and enforcement of U S intellectual property rights These rights drive the economy, create jobs for American workers, promote innovation, and secure America’s position as the world’s leader for creativity and ingenuity The 2011 Annual Report provides an illustration of the coordinated efforts that the U S Government is undertaking to address the challenges of enforcing intellectual property of U S rightholders abroad, securing supply chains, pursuing sources of counterfeit and pirated goods, and meeting the challenges posed by emerging criminal trends such as the online sales of counterfeit pharmaceuticals, economic espionage, and targeted theft of trade secrets

This is probably just some odd, unique production mistake. But that is the point: When an agency cannot even take the time to make sure its born-digital text has periods at the end of sentences, how can we expect agencies to produce digital content that conforms to preservation technology requirements? There is a lot of work to be done before we can assure the public that we can preserve digital government information!

(Hat tip to Mike Masnick who, in a highly critical review of the report, says, with tongue in cheek, that the reason the report “seems to do away with the grammatical icon known as ‘the period’ at the end of sentences” might be that the Intellectual Property Enforcement Coordinator found it “too expensive to license.”)

Still no public access to the Constitution Annotated

Sunlight reports that “Seven months ago, the order was given for the legal treatise, known as the Constitution Annotated (or CONAN), to be published online, but so far without result.”

Speaker Boehner and Majority Leader Cantor on Legislative Data Release

Speaker Boehner and Majority Leader Cantor on Legislative Data Release, by John Wonderlich, Sunlight Foundation (April 29, 2011).

Speaker Boehner and Majority Leader Cantor today sent a letter to the Clerk of the House calling for better access to the House’s electronic data.

The letter asks that the House release its “legislative data” in “machine-readable formats” and establish standards for the House and its committees to include open data formats such as XML.

Big week for open access to government information

You almost certainly have seen at least one story in the past week about “Open Government” and the release of new data. Reporters have slowly been picking up on a massive release of information spurred by President Obama’s Open Government Directive. (See: New ‘high value’ data posted to data.gov.)

Below are a few announcements and stories that you may find of interest.

But in addition to all the data released this week, there was a new policy that could affect the usability of government information in the future. In the December 8, 2009 memo (Open Government Directive [pdf] Memorandum For The Heads Of Executive Departments And Agencies, M-10-06, Peter R. Orszag, Director, Office of Management and Budget) that implemented the President’s Open Government Initiative, OMB specifically mandates open file formats.

To increase accountability, promote informed participation by the public, and create economic opportunity, each agency shall take prompt steps to expand access to information by making it available online in open formats.

And OMB defines open formats as:

An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.

This is big news for two reasons. First, it should lead the government away from proprietary formats, which are hard to preserve, hard to re-use, and typically require proprietary software, operate only on specific platforms, or both. Think: documents in ODF format rather than Microsoft Word. Second, the directive mandates formats “without restrictions [on] re-use.” Think: no DRM (and no licensing restrictions!).

As the ODF Alliance noted back in December when the OMB memo was released, much government information is still released in “documents,” which are not ideal for re-use of information even when the document formats are open. But this is still an important, essential step:

Like it or not, government bureaucracies are still very document-centric and there is a lot of government “data” stored in documents, the challenge being how to provide easy access to this data.

…With today’s announcement, the Obama Administration has taken an important step on open government data and acknowledged the role open formats play in this regard. For document-centric governments, an open document format remains essential to delivering on this promise.
Obama Administration To Require Government Agencies to Make Information Available in Open Formats. ODF Alliance, December 08, 2009.

Open formats will help libraries that want to preserve digital government information by making it easier and less costly to do so.

Here are some of the announcements about releases of new government data: