open formats
Big week for open access to government information
Submitted by jajacobs on Mon, 2010-01-25 08:52.
You almost certainly have seen at least one story in the past week about "Open Government" and the release of new data. Reporters have slowly been picking up on a massive release of information spurred by President Obama's Open Government Directive. (See: New 'high value' data posted to data.gov.)
Below are a few announcements and stories that you may find of interest.
But in addition to all the data released this week, there is a new policy that could affect the usability of government information in the future. In the December 8, 2009 memo that implemented the President's Open Government Initiative (Open Government Directive [pdf], Memorandum for the Heads of Executive Departments and Agencies, M-10-06, Peter R. Orszag, Director, Office of Management and Budget), OMB specifically mandates open file formats:
To increase accountability, promote informed participation by the public, and create economic opportunity, each agency shall take prompt steps to expand access to information by making it available online in open formats.
And, OMB defines open formats as:
An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.
This is big news for two reasons. First, it should lead the government away from proprietary formats, which are hard to preserve, hard to re-use, and typically require proprietary software, run only on specific platforms, or both. Think: documents in ODF format rather than Microsoft Word. Second, the directive mandates formats "without restrictions [on] re-use." Think: no DRM (and no licensing restrictions!).
As the ODF Alliance noted back in December when the OMB memo was released, much government information is still released as "documents," which are not ideal for re-use of information even when the document formats are open. But this is still an important, essential step:
Like it or not, government bureaucracies are still very document-centric and there is a lot of government “data” stored in documents, the challenge being how to provide easy access to this data.
...With today's announcement, the Obama Administration has taken an important step on open government data and acknowledged the role open formats play in this regard. For document-centric governments, an open document format remains essential to delivering on this promise.
-- Obama Administration To Require Government Agencies to Make Information Available in Open Formats. ODF Alliance, December 08, 2009.
Open formats will help libraries that want to preserve digital government information by making it easier and less costly to do so.
Here are some of the announcements about releases of new government data:
- Open Government Initiative. White House.
- Another Milestone In Making Government More Accessible and Accountable. White House.
- U.S. Government, OSTP, Open New Troves of Data to the Public
- Justice Department Announces Release of New Information Online as Part of President’s Open Government Initiative
- How "Open Gov" Datasets Affect Parents and Consumers. White House.
- Open Government Initiative. White House.
The good and the bad of PDFs
Submitted by jajacobs on Wed, 2009-11-04 05:48.
Following up on Can Proprietary Formats make Government More Open?:
Josh Tauberer of govtrack.us points us to The good and the bad of PDFs (OpenGovData.org wiki), in which Kevin Lyons, who works for the Nebraska legislature, wrote up some guidelines for PDF in government.
Lyons reminds us that not all PDF files are equal and he enumerates some of the advantages and disadvantages of encapsulating government information in PDFs.
Given how popular the PDF standard itself is, it shouldn't be a surprise that the term PDF actually covers a wide variety of different types of files. While all PDF files fit the PDF standard, there are several different subtypes of PDF that are helpful in the government world.
FEC makes data available in multiple formats
Submitted by jajacobs on Thu, 2009-10-29 12:46.
Disclosure Data Catalog, Federal Election Commission
"Each of the files listed here can be downloaded in either csv or xml formats. Each also has a metadata page that describes the information included and the structure of the file itself. There is a pdf version of each file if you need to print the information. You can also subscribe to RSS feeds for each of the files so you're notified whenever new data is available or a change is made."
Also see the Commission's Disclosure Data Blog where the FEC will post information about the files and its future plans. And: they say that "you can get help with any questions about the data we're providing here."
Can Proprietary Formats make Government More Open?
Submitted by jajacobs on Thu, 2009-10-29 11:41.
Three interesting comments!
- Adobe is Bad for Open Government, by Clay Johnson, Sunlight Labs.
"Here at Sunlight we want the government to STOP publishing bills, and data in PDFs and Flash and start publish them in open, machine readable formats like XML and XSLT. What's most frustrating is, Government seems to transform documents that are in XML into PDF to release them to the public, thinking that that's a good thing for citizens. Government: We can turn XML into PDFs. We can't turn PDFs into XML."
- Adobe is Bad for Open Government, comment, by kathy and ernie brandon, Open House Project mailing list, (10/28/2009).
"Once private ownership is established in something like this we aren't getting it out again, not without another draining fight. Most people start out saying "I don't really have a problem with this" until I take them down the road this leads - private ownership means 'trade secrets', 'trade secrets' is the opposite of transparency."
- Adobe is Bad for Open Government, comment, by Josh Tauberer, Open House Project mailing list, (10/28/2009).
"Government has a responsibility to issue print-ready documents in many cases, so PDFs are an important part of an open government. I would rather have PDFs over nothing electronic, over electronic image files, and over other formats suitable for printing --- PDF is an open standard (albeit proprietary).
"I wouldn't want to get rid of PDFs. Docs need to either be published in a second format, or --- more interesting --- we could get Adobe to revise the PDF format so that it can encode the document in structured form as well. That means govt publishes a single file that makes everyone happy."
W3C Draft: Publishing Open Government Data
Submitted by PGarvin on Sun, 2009-09-13 13:13.
The World Wide Web Consortium (W3C) has posted a first draft of their eGovernment Working Group's guidelines for governments putting data on the Web, Publishing Open Government Data. (And hey! It's not in PDF format.)
The W3C posted this notice on their website on September 9:
Today, the World Wide Web Consortium (W3C) announces a draft work plan for the eGovernment Interest Group, whose mission is to document, advocate, coordinate and communicate best practices, solutions and approaches to improve the interface between citizens and government through effective use of Web standards. The draft charter, in review by the W3C community until the end of September, focuses on two topics: Open Government Data (OGD), and Education and Outreach. In line with its anticipated focus on Open Government Data, the group also announces today a first draft of Publishing Open Government Data, which provides step-by-step guidelines for putting government data on the Web. Sharing data according to these guidelines enables greater transparency; delivers more efficient public services; and encourages greater public and commercial use and re-use of government information. Learn more about the W3C eGovernment Activity.
[hat tip DB/eCitizen]
Need for Open Standards Video on the Web
Submitted by jajacobs on Sun, 2009-09-06 09:40.
An article in Technology Review reports on the current state of video on the web, its drawbacks and limitations, and what the future may bring.
- OurTube, By David Talbot, Technology Review (September/October 2009). (3400 words)
The article summarizes the story of Michael Dale and Abram Stern, who wanted to use speeches in the U.S. Congress and discovered that they could not get the videos. "There was no online repository for download." Their efforts led to the development of http://metavid.org/ which offered legislative videos for free download, a copyright battle with C-SPAN, and a change in C-SPAN policy to make some of its videos freely available for some uses. (See also Who Owns What C-Span Airs?, and C-SPAN provides more access, but wants to retain control, etc.)
Dale and Stern's difficulties offer one small glimpse into a larger problem with online video: unlike much of the rest of the Web, it is accessed through a collection of closed, proprietary formats, such as Adobe's Flash and Microsoft's Silverlight. (Try a video search engine such as Blinkx; you'll get plenty of videos pulled from around the Web, but to watch them you may need to download or update software.) Certain websites, led by YouTube, convert uploaded content to Flash for ease of viewing. Today, however, a growing number of technologists and video artists want to see Web video adopt the kind of open standards that fueled the growth of the Web at large. HTML, the markup language that describes Web pages; JavaScript, the programming language that allows forms, graphics, and various special effects to be added to them; JPEG, the standard for images--all these building blocks of the Web can be used by anyone, without paying fees or asking permission. This openness was indispensable to the creation and then the explosion of blogs, search engines, social networks, and more.
Talbot quotes Chris Blizzard, director of technical evangelism at Mozilla, as he explains why open standards are so important:
Open standards create low friction. Low friction creates innovation. Innovation makes people want to pick it up and use it. But it's not something where we can guess what 'it' is. We just create the environment that lets 'it' emerge.
Too much government information (not just video) suffers from being locked in to proprietary formats and proprietary means of delivering that information. (See: What is wrong with this picture? and lots more at the open formats tag here at FGI.)
Blizzard says that we need to take "video out of the plug-in prison." Talbot says, "The goal isn't to make any one application possible but to bring about the next Internet revolution--one whose specific form is hard to foresee, except that it's likely to be televised."
Job losses in recessions visualized
Submitted by jajacobs on Mon, 2009-02-16 08:48.
Recently this graphic was posted on "The Gavel" blog of the Speaker of the House (What 3.6 Million Jobs Lost Over 13 Months Looks Like, by "Karina," February 6th, 2009). It shows the number of jobs lost (and recovered) in the recessions of 1990, 2001, and 2008. By overlaying the three time periods, each starting with its peak job month, and showing employment change by month, it gives a startling comparison that highlights the severity of the current situation. It also implies that we will have a long time to wait before we reach our previous peak month.
I was curious about this graph and did a little follow up that I share below. Data librarians may find this a bit tedious, but for those who have never used raw data, it may be useful as an illustration of the difference between "data" (the raw numbers that you put into statistical software) and "statistics" (the human-viewable tables and graphs that we see in publications).
Unfortunately, as is often the case with statistical information, the source given for the graphic is incomplete: simply "Bureau of Labor Statistics." I could not find the graphic itself on the bls.gov site, so I assume that the chart was constructed from BLS data, specifically, the Current Population Survey or the Current Employment Statistics Survey. These two surveys count employment differently -- one is a survey of individuals and the other is a survey of employers.
There is a similar, but not identical, chart ("Percent change in total nonfarm employment, from beginning of recession") in the January 2009 (released February 6, 2009) Current Employment Statistics Highlights, Monthly (Bureau of Labor Statistics), so my guess is that someone at the Speaker's office built the chart from the raw CES data.
Just out of curiosity, I went to the CES "Most Requested Statistics" webpage and downloaded "Total Nonfarm Employment - CES0000000001" for 1990 through the end of 2008. Raw data suitable for analysis even look "raw"; they do not resemble a statistical table:
1990,Jan,109151
1990,Feb,109396
1990,Mar,109611
1990,Apr,109651
1990,May,109800
1990,Jun,109817
1990,Jul,109775
1990,Aug,109567
1990,Sep,109485
1990,Oct,109324
1990,Nov,109180
1990,Dec,109120
1991,Jan,109001
1991,Feb,108695
1991,Mar,108535
1991,Apr,108324
1991,May,108196
1991,Jun,108283
...
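To give a feel for how little work lines like these take to read, here is a minimal Python sketch. It is not what I actually used (that was Excel and Stata); the function name read_series is made up for illustration, and the sample is just the first eight rows shown above.

```python
import csv
import io

# First eight rows of the series as shown above: year, month, employment level.
RAW = """\
1990,Jan,109151
1990,Feb,109396
1990,Mar,109611
1990,Apr,109651
1990,May,109800
1990,Jun,109817
1990,Jul,109775
1990,Aug,109567
"""


def read_series(text):
    """Parse 'year,month,value' lines into (year, month, value) tuples."""
    rows = []
    for year, month, value in csv.reader(io.StringIO(text)):
        rows.append((int(year), month, int(value)))
    return rows


series = read_series(RAW)

# The peak month is simply the row with the largest employment level.
peak = max(series, key=lambda row: row[2])
print(peak)
```

For this sample the peak row is June 1990, which is the starting month used for the 1990 recession further down.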
Of course, it is relatively easy, using statistical software, to construct tables and graphs from raw data. Here, for example, is a published statistical table with essentially the same raw information (but from CPS, not CES) that I downloaded. (See the full table from Employment from the BLS household and payroll surveys: summary of recent trends, February 6, 2009).
By using the raw data to create a graph, one can tell a story that has more impact than just a table of numbers. It is relatively easy to get these data into statistical software. I used Excel and Stata to create a small time-series data file. I organized it by month (from month "1" to month "48") with each row of the data file having data for the three recessions. The first row has data for the first month of the three recessions, the second row has data for the second month, etc. The CES data give employment totals in thousands. For example, the employment for 2008:
138152 138080 137936 137814 137654 137517 137356 137228 137053 136732 136352 135755 135178
I had to compute a new variable for each recession: the cumulative number of jobs lost. So, for example, 2008:
138152      0
138080    -72
137936   -216
137814   -338
137654   -498
137517   -635
137356   -796
137228   -924
137053  -1099
136732  -1420
136352  -1800
135755  -2397
135178  -2974
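The computed variable is just each month's level minus the level in the peak month. As a rough sketch of that step in Python (again, I did this in Excel and Stata; the names LEVELS_2008 and change_from_peak are invented here), using the 2008 figures listed above:

```python
# Monthly employment levels for the 2008 recession, starting at the peak month,
# copied from the listing above.
LEVELS_2008 = [
    138152, 138080, 137936, 137814, 137654, 137517, 137356,
    137228, 137053, 136732, 136352, 135755, 135178,
]


def change_from_peak(levels):
    """Return each month's level minus the first (peak) month's level."""
    peak = levels[0]
    return [level - peak for level in levels]


changes = change_from_peak(LEVELS_2008)

# Print level and cumulative change side by side, one month per line.
for level, change in zip(LEVELS_2008, changes):
    print(level, change)
```

The same function can be applied to the 1990 and 2001 series, as long as each list starts at its own peak month.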
The first 12 months with all three recessions (v1, v2, v3) and the computed variables (1990, 2001, 2008) look like this:
month  v1      1990   v2      2001   v3      2008
1      109817      0  132530      0  138152      0
2      109775    -42  132500    -30  138080    -72
3      109567   -250  132219   -311  137936   -216
4      109485   -332  132175   -355  137814   -338
5      109324   -493  132047   -483  137654   -498
6      109180   -637  131922   -608  137517   -635
7      109120   -697  131762   -768  137356   -796
8      109001   -816  131518  -1012  137228   -924
9      108695  -1122  131193  -1337  137053  -1099
10     108535  -1282  130901  -1629  136732  -1420
11     108324  -1493  130723  -1807  136352  -1800
12     108196  -1621  130591  -1939  135755  -2397
Here is a complete tab-separated-values version of the data file I constructed. Then I used Stata to build a graph and it looks very much like the one at the Speaker's Blog.
Of course, when one tells one story, one leaves out other stories. This graphic doesn't show that the starting points of the recessions were different:
1990: 109 million
2001: 132 million
2008: 138 million
Open re-usable government information
One could use the raw data to tell a lot of different stories and analyze the data in many different ways. And that brings me to the connection between all this and why we need to be sure that government information is not just "free as in beer" but also "free as in open."
It is important for statistical agencies to publish statistics to help us understand their raw data. But, it is also essential that they provide us with the raw data so that we can better understand their statistics and do our own analyses. Most of the statistical agencies of the U.S. government do an excellent job of making their raw data easily available. In fact, the rest of government would do well to use statistical agencies as a model for instantiating their information in usable and re-usable formats (in addition to any publishing and presentation of their data/information) so that the information, whether it is text or images or video or sound or numbers, can be used, reused, analyzed, stored, and preserved.
Recommendations on Open Government from computer scientists and computing practitioners
Submitted by jajacobs on Thu, 2009-02-05 13:24.
The Washington policy committee of the Association for Computing Machinery, the professional association that represents computer scientists and computing practitioners, released its Policy Recommendations on Open Government today.
The really important thing about these recommendations is that they reflect an understanding that there is a difference between "presenting" information, usually in the form of a web site, and providing the underlying data that anyone can then "present" or use or re-use. As David Robinson said in an earlier proposal, "[I]s a government monopoly on 'presentations' of the data the best way...? Probably not. If Congress orders the federal bureaucracy to provide a web site for end users, then we will all have to live with the one web site they cook up" (The (Ironic) Best Way to Make the Bailout Transparent, By David Robinson, Freedom to Tinker, January 27th, 2009).
Among the recommendations:
- Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
- Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.
- Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).
See also: New USACM Policy Recommendations on Open Government, By David Robinson, Freedom to Tinker, February 5th, 2009.
Today's statement puts the weight of America's computing professionals behind the push for machine-readable government data.
Open formats
Submitted by jajacobs on Wed, 2009-01-28 07:54.
Over at the Open House Project mailing list there has been a long discussion recently of how to make sure we have long-term, open, free, public access to Congressional and Presidential videos. (See also: Congress on YouTube and Should Obama ditch YouTube?) Today, Clay Shirky posted a clear, concise, and persuasive summary of why we need open formats. (Re: Open video, Clay Shirky, Jan. 28, 2009.)
I was the head of the technical work on the Library of Congress's digitial preservation network (NDIIPP) in the early part of the decade, and when we undertook a survey of threats to long-term preservation of digital material, we assumed, dumb bunnies that we were, that we were dealing with issues of storage, redundancy and migration costs.
It turned out, though, that the biggest threat to long-term preservation isn't hardware or cost, it's format; proprietary formats don't just happen to hamper unexpected future uses, that's what they are _designed_ to do. If you had an ASCII file and a WordStar file, both from 1987, and had to open and read each one as quickly as you could, what would the difference in elapsed seconds be?
There has been some back and forth on this list about possible threats to putting USG-created materials into the hands of commercial entities. I'm on record as worrying about that precisely because path dependence on commercial motivation can easily come a cropper, and I recently had an experience that seemed to highlight that risk.
Last Friday, I heard a talk at the Smithsonian from the awesome George Oates, who was instrumental in getting the Flickr Image Commons going. It was an inspiring talk, and when I went to introduce myself afterwards and we were talking about what might be next, she told me Yahoo had cut her job in the December round of layoffs.
Of course. Why shouldn't Yahoo have done that -- the Commons may have been good PR for them in 2008, but its hard to argue that the cost is going to be recouped in revenues somehow. When we are talking about public goods, we are talking about goods that don't flourish in the commercial market *by definition.* We need to make that part of all the systemic thinking we do about open data; Ogg for video would be a great addition to the arsenal.
There have been some excellent discussions going on over at the Open House Project Google Group lately. If you don't usually monitor that list, you might want to browse through the last couple of weeks' posts.
SEC mandates open standard for financial records
Submitted by jajacobs on Mon, 2008-12-22 10:56.
SEC mandates open standard for financial records, By Gautham Nagesh, NextGov.com, (12/19/2008).
The Securities and Exchange Commission passed a rule on Thursday requiring public companies and mutual funds to use a standard electronic format to publish financial information, bringing more transparency, and presumably oversight, to corporate balance sheets and earnings.
SEC commissioners voted 4-1 to require companies to use extensible business reporting language, or XBRL, when filing financial disclosures. The commission also required them to publish information on the SEC's Web site as well as their own.
XBRL is based on software programming called extensible mark-up language, or XML, which uses tags attached to documents so users can easily find and share electronic data....
Support OpenDocument
Submitted by jajacobs on Fri, 2008-11-14 14:03.
OpenDocument campaign page gets a revamp: Support OpenDocument. This is where you can find out what it is, what you can do, who else is using it and more.
The OpenDocument format (ODF) is a format for electronic office documents, such as spreadsheets, charts, presentations and word-processing documents. The OpenDocument format is supported by free software applications such as OpenOffice.org, AbiWord and KOffice.
Senate Emergency Economic Stabilization Act of 2008
Submitted by jajacobs on Wed, 2008-10-01 14:16.
Sunlight Labs is posting the new economic bill at PublicMarkup.org.
Gabriela Schneider of the Sunlight Foundation says:
In order to facilitate a conversation about the specifics of the Senate's new proposed bailout legislation (included within a 451-page document filled with additional provisions), we're parsing the bill's text, and have completed the first part, "Division A: the Emergency Economic Stabilization Act of 2008," posted on PublicMarkup.org at http://www.publicmarkup.org/bill/senate-emergency-economic-stabilization... for your review and commentary.
And the PublicMarkup site says:
Unfortunately, because Congress has yet to enter the 21st century by publishing legislative data -- instead of PDF files of bills -- we cannot present the entire bill section by section, for you to review and comment on. (However, if you notice, the file name on the top of the PDF they published shows that the bill originally was an XML file.)
Why we need open documents, a real-life example
Submitted by jajacobs on Mon, 2008-09-29 16:35.
Here is an excellent example of why we need government information distributed in truly open, re-usable formats.
The very important and much discussed $700 billion economic "bail-out" bill, the "Emergency Economic Stabilization Act of 2008" (H.R. 3997), has gone through at least four versions between September 25th and today.
Josh Tauberer, the tireless open-government advocate and programmer who developed GovTrack.us, has adapted a bill comparison tool that he developed for GovTrack and posted the resulting analysis of changes between the different versions here:
- Special Feature: Economic Stimulus Bill Text Tracker, GovTrack. Sept 29, 2008.
Josh says that he had to use the PDF versions of these documents because that is all that is publicly available. He goes on to identify the problem of relying on PDF documents for re-use:
It's not very pretty because while House bill writers have been posting the PDFs, PDFs don't make it easy to make comparisons. They *are* composing the bills in XML, and if they made those available we the public would have an easier time. Maybe we wouldn't complain to our reps so much either because we could actually understand what is going on better! [source]
...Why is this so ugly? This is based on converting the PDF drafts into text, which doesn't always work right. If you think the public should be able to do this better, tell your representative to support The Open House Project report recommendations. [source]
What we need is not just easy-to-read human-consumable documents (e.g., PDF, HTML, word-processing documents), but also machine-processable documents that can be analyzed, parsed, re-formatted, and re-used (e.g., XML).
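To make the contrast concrete, here is a small Python sketch of what structured markup makes easy. Everything in it is made up for illustration: the element names (bill, section, text), the id attribute, and the placeholder wording are hypothetical and are not the schema Congress uses, and this is not Josh's comparison tool.

```python
import xml.etree.ElementTree as ET

# Two invented drafts of a bill. The markup is hypothetical; only the idea
# (sections are individually addressable) matters here.
DRAFT_1 = """
<bill>
  <section id="1"><text>Placeholder wording for section one.</text></section>
  <section id="2"><text>Placeholder wording for section two.</text></section>
</bill>
"""

DRAFT_2 = """
<bill>
  <section id="1"><text>Placeholder wording for section one.</text></section>
  <section id="2"><text>Revised wording for section two.</text></section>
  <section id="3"><text>Placeholder wording for a new section.</text></section>
</bill>
"""


def sections(xml_text):
    """Map each section id to its text."""
    root = ET.fromstring(xml_text)
    return {
        section.get("id"): (section.findtext("text") or "").strip()
        for section in root.iter("section")
    }


def changed_sections(old_xml, new_xml):
    """Return ids of sections that were added, removed, or reworded."""
    old, new = sections(old_xml), sections(new_xml)
    return sorted(key for key in old.keys() | new.keys() if old.get(key) != new.get(key))


print(changed_sections(DRAFT_1, DRAFT_2))
```

With text extracted from a PDF there are no section boundaries to key on, which is why a comparison built that way has to guess at them.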
Microsoft Loses Vote on File Standards
Submitted by jajacobs on Tue, 2007-09-04 15:23.
Microsoft Loses Vote on File Standards, by The Associated Press, New York Times (September 4, 2007): "Microsoft Corp. has failed in a first step to win enough support to make the data format behind its flagship Office software a global standard, the International Standards Organization said Tuesday."
There is an ongoing battle between the proprietary "open" format that Microsoft is proposing and the Open Document Format standard created by open source developers. The result of the battle will help determine how governments and libraries will archive digital documents.
See also: Open Document Standards (2007-08-10)
Open Document Standards
Submitted by jajacobs on Fri, 2007-08-10 09:25.
Open standards for document formats have been in the news recently.
- Massachusetts Falls to OOXML as ITD Punts, ConsortiumInfo.org August 01 2007
- Microsoft's Open Format Criticized In China, by Winter Casey, National Journal's Technology Daily InternationalRoundup, August 8, 2007 [subscription required].
A very useful report from JISC (the Joint Information Systems Committee, an independent advisory body in the UK that provides leadership in the innovative use of Information and Communications Technology to support education and research) helps sort out the issues and says there is "an urgent need for co-ordinated, strategically informed action over the next five years, if the higher education community is to facilitate a cost effective approach to the switch to XML-based office document formats." This is important for government information specialists as well, whether in higher education or not.
- XML-based Office Document Standards (TSW0702) by Walter Ditch, JISC, August 2007.
The tacit acceptance of proprietary office file formats as a way of achieving interoperability is becoming less acceptable. Government agencies, in particular, are becoming increasingly conscious of the need to provide easy access to electronic documents to all stakeholders, while not requiring them to purchase a particular software product in order to view or edit these documents. The requirement to provide long term availability and archiving of documents is also encouraging a move away from proprietary file formats. JISC and the wider HE/FE community, as part of the public sector, will be required to address these issues with respect to how they deal with the publication of electronic documents, and the internal/external transference of document files.
The drive to move to open file formats has been ongoing for several years, but as Microsoft, the market leader in office document formats, has arguably been slow to move from its proprietary, binary file formats, the net effect, for most people, has been the continued use of proprietary, de facto standards. However, due to pressure to move towards more open document file formats, Microsoft has implemented a staged introduction of XML-based formats for its Office suite. With the release of Office 2007, the transition has been formalised, and existing users who upgrade to Office 2007 will, essentially, be upgrading to a form of XML.
However, this has not been without considerable controversy, and recent developments related to XML-based office document standards have reached a watershed. This TechWatch report explains these issues and some of the standards involved. It proposes that although the UK higher and further education sector has, for a long time, understood the interoperability benefits of open standards, it has been slow to translate this into easily understandable guidelines for implementation at the level of everyday applications such as office document formats. As far as education is concerned, the use of modifiable office document formats has now reached a crucial stage. There is an urgent need for co-ordinated, strategically informed action over the next five years, if the higher education community is to facilitate a cost effective approach to the switch to XML-based office document formats.
The report itself is available in several formats including Open Office (OpenDocument) format. If you've never used it, you should try it! Check out this page on Wikipedia for a list of tools for reading and using OpenDocument format. You can use stand-alone applications on your Mac or PC, or web-based applications like Google Docs.


