Ari Hershowitz, who runs Tabulaw (software tools for legal work), has converted California’s statutes into structured HTML, with most internal references now hyperlinked (calaw.tabulaw.com), and has written about the process on his blog:
- How to Convert All Files in a Directory: CA Legislation.
- How to Convert Citations to Hyperlinks: CA Laws.
- How to: Convert Sections Into Hyperlink Targets.
- How to convert Text to HTML: Using txt2html Perl Module.
- California Laws: Converting Plain Text to HTML.
- California Law: Recovering Meaning and Metadata with RegEx.
As he notes in his blog posts, California at least makes all of its codes available for FTP download (not all states do even this), but there are a lot of “challenges in recovering meaningful structural information (titles, paragraphs).” It takes many steps to add structure to plain-text documents and to restore the metadata that each section’s original drafters intended, which helps a reader understand and navigate the law. He wonders why governments don’t distribute documents with this structural and semantic information included. PDFs may look good to the eye, but they are not easily “understood” by software. If governments distributed documents that were “machine actionable” (that is, marked up with tags that denote the structure of the documents and the meaning of the text), it would be easier to create indexes, link documents to one another, and so forth.
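To make the idea of adding structure back concrete, here is a minimal Python sketch of two of the steps named in the post titles above: turning sections into hyperlink targets and turning citations into hyperlinks. It is not Ari's code (his posts describe a Perl and txt2html workflow). The input layout is an assumption made for illustration: sections are separated by blank lines, each begins with its number and a period, and citations take the form "Section 123.4". The `sec-` anchor scheme and the function names are hypothetical as well.

```python
import html
import re

# Assumption: a section starts with its number and a period, e.g. "200.5.  Text..."
SECTION_START = re.compile(r"^(\d+(?:\.\d+)*)\.\s+")
# Assumption: internal citations look like "Section 200.5"
CITATION = re.compile(r"\bSection\s+(\d+(?:\.\d+)*)")


def section_anchor(number: str) -> str:
    """Build a hypothetical anchor id such as 'sec-200-5' from '200.5'."""
    return "sec-" + number.replace(".", "-")


def link_citations(text: str) -> str:
    """Wrap each 'Section N' citation in a link to that section's anchor."""
    def repl(match: re.Match) -> str:
        anchor = section_anchor(match.group(1))
        return f'<a href="#{anchor}">{match.group(0)}</a>'
    return CITATION.sub(repl, text)


def paragraph_to_html(paragraph: str) -> str:
    """Convert one plain-text paragraph into a <p>, adding an id if it opens a section."""
    escaped = html.escape(paragraph.strip(), quote=False)
    start = SECTION_START.match(escaped)
    body = link_citations(escaped)
    if start:
        return f'<p id="{section_anchor(start.group(1))}">{body}</p>'
    return f"<p>{body}</p>"


def text_to_html(text: str) -> str:
    """Split on blank lines and convert each paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(paragraph_to_html(p) for p in paragraphs)


if __name__ == "__main__":
    sample = (
        "100.  A person shall comply with Section 200.5.\n\n"
        "200.5.  See Section 100 for details."
    )
    print(text_to_html(sample))
```

Real statutory text is much less regular than this sample (ranges of sections, references to other codes, subdivisions), which is exactly why recovering structure after the fact takes so many steps.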
In an email yesterday, Ari asked whether others would like to work with him on making California law more usable:
…tracking of bills or proposed legislation becomes more meaningful if it can be compared to existing legislation. Please get in touch if you are interested in brainstorming or working with me to connect proposed legislation to the existing statutes to create a “legislative diff”, or related improvements. Even better would be thoughts on how we can get California’s legislature to include this metadata in the original drafts of bills.
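As a rough picture of what a “legislative diff” might look like, the sketch below compares the existing text of a section with a proposed replacement using Python's standard `difflib` module. This is only an illustration, not anything Ari has built: the function name, the section label, and the sample wording are all hypothetical, and a real tool would first need the structured, per-section text discussed above.

```python
import difflib


def legislative_diff(existing: str, proposed: str, label: str) -> str:
    """Return a unified diff between the existing and proposed text of one section."""
    lines = difflib.unified_diff(
        existing.splitlines(),
        proposed.splitlines(),
        fromfile=f"{label} (existing)",
        tofile=f"{label} (proposed)",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical statute text, invented for the example.
    existing = "The fee is ten dollars.\nThe fee is due annually."
    proposed = "The fee is twenty dollars.\nThe fee is due annually."
    print(legislative_diff(existing, proposed, "Section 100"))
```

Lines prefixed with `-` appear only in the existing statute and lines prefixed with `+` only in the proposal, so a reader can see at a glance what a bill would change.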
See www.tabulaw.com if you’d like to get in touch with Ari.
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.