
Lunchtime listen: Razzle dazzle WWI ship camouflage

99% Invisible is one of my favorite podcasts. Roman Mars talks about architecture and design in a really thoughtful and compelling way. He had a recent episode about razzle dazzle ships' camouflage in which he included images from the Rhode Island School of Design (RISD)'s Fleet Library (which is NOT connected to the Navy in any way :-)). Check out this fascinating listen about ships and camouflage.



(Erik Gould, courtesy of the Fleet Library at RISD, Providence, RI.)


Blending into your surroundings is only one type of camouflage. Camoufleurs call this high-similarity or blending camouflage. But camouflage can also take the opposite approach.

(Erik Gould, courtesy of the Fleet Library at RISD, Providence, RI.)

State Agency Databases Activity Report 3/17/2013

With the advent of the March updating season, the past two weeks have been very busy for volunteers at the State Agency Databases project at http://wikis.ala.org/godort/index.php/State_Agency_Databases.

That is, except for Hawaii, Minnesota and Oklahoma. These pages have languished for over a year without a volunteer documents specialist to care for them.

If you would like to brighten the day of one of these pages by adopting it, read through the project's volunteer guide, then contact Daniel Cornwall at danielcornwall@gmail.com.

On to our extensive activities. For a full blow-by-blow report of updates and other activity for the past two weeks, visit http://tinyurl.com/statedbs14d. Here are the highlights:

DATABASES ADDED

DISTRICT OF COLUMBIA (Susan Paterson)

Recreation Centers and Pools - Find swimming pools, basketball courts, before- and after-school care programs, and other programs at parks and recreation centers in the District of Columbia. You can also find the status of improvement projects and facilities available by permit.

KANSAS (Pam Crawford)

Photo Library - From the web site: "The KGS Photo Library is a collection of photos of different subjects from around the state. These photos may be used for any non-commercial purpose; if you use a photo, please credit the Kansas Geological Survey. To select photos to view: (1) click on one of the terms under the Subjects menu, (2) enter a term (or terms) in the Keywords search, or (3) use one of the three maps to select photos by county, physiographic region, or highway."

NEVADA (Kathy Edwards)

Teacher Licensure - Search for the credentials of a public school teacher by entering the teacher's name or license number.

NEW HAMPSHIRE (Linda Johnson)

New Hampshire Vital Records Information Network Web Query - Search for birth, death, marriage, and divorce events. Data are very current for most statistics. Password required.

OHIO (Audrey Hall)

Utility Information - Find utility information by street address.

NEW MATERIAL ON "NOT DATABASES" PAGE

In the course of searching for databases, our project volunteers come across interesting and/or useful resources that fall outside of our project scope. Rather than just forgetting these resources, we post them to our "not databases" page.

In the past two weeks, two resources have been added to this page:

Smart Consumer - From the Connecticut Department of Consumer Protection, separate pages directed toward parents and children, teens, young adults, adults, and older adults raise awareness of potential scams. Recommendations about what to know and what to do are given for various scam topics.

New York State Vital Statistics - The tables listed here provide information on vital statistics in New York State, such as mortality, birth rate, marriage and population.

The Digital-Surrogate Seal of Approval

James and I are happy to announce that our new article appears in the current edition of D-Lib Magazine:

In the last few years, there have been a series of articles, reports and proposals that rely on the promises of digitization to address issues of physical space, cost control, access, and collection management for FDLP libraries. One of the reasons we created this Seal of Approval standard is to provide a clear, consistent way to help evaluate some of these promises of digitization.

There are those who continue to insist that we have too many copies of federal documents, that preserving those copies is too expensive, that GPO is being unreasonable when it does not allow libraries to discard materials, and so forth. Although proposals to digitize FDLP collections are often couched in terms of enhancing access, libraries can digitize and enhance access without discarding paper copies. The underlying motivation of such proposals is often explicitly to weed the paper collections and, when not explicit, it is always implied. These proposals raise many questions in our minds. For example:

  • Will digitizations include digital text as well as images and will the text be accurate and complete and (re)usable?
  • Will the digitizations be readable and usable on modern e-book devices?
  • Will digitizations create digital objects that are as good as the originals, or worse, or better?
  • Will digitizations be deposited into Trusted Digital Repositories to ensure their long-term preservation and access?
  • Will the library that contributes the original be in control of the digital copy, or will control be ceded to large mega-libraries?
  • Will the digitizations include adequate metadata for management, preservation, and discovery?
  • Will libraries develop and maintain discovery and delivery mechanisms that address the special requirements of federal documents?
  • Will libraries provide adequate digital services for the digital collections?
  • Will any cost savings be applied to collection management and services for these collections or will the cost savings be redirected to other collections and services?
  • Will there actually be cost savings if we adequately address the above questions?

But there is one other question that is more important than all of the above. The question we must ask first is: Are the digitizations accurate and complete? If they are not, the other questions become moot or irrelevant. The DS-SOA is intended to help us answer that question. The DS-SOA denotes that a digitization accurately and completely replicates the content and presentation of the original.

The standard is designed to be easily understood and usable, not just by digitization-specialists, but also by library administrators, collection managers, service providers, preservation officers, business managers, and others who are responsible for library collections and services. It is also meant to help communicate clearly to end users the accuracy and completeness of the digitizations libraries provide to them.

We believe that libraries fulfill a unique role in society, one that is different from that of producers, agencies, publishers, authors, and vendors. We believe that the value of libraries is dependent upon the collections we select, acquire, preserve, and maintain and the services that we provide for those collections. The FDLP collections are unique; they provide a primary-source, historical record of our democracy. The FDLP print collections are not "legacy collections," as they are often called by those who wish to discard them (used as an adjective, "legacy" means "outdated" and "unwanted"). They are, however, our legacy: used as a noun, "legacy" means bequest, heritage, endowment, gift, and birthright. The DS-SOA is a simple tool that libraries can use to ensure the value of their digital collections and communicate that value to library users. We believe that failure to ensure completeness and accuracy of our digital collections will reduce the value of libraries. We believe that replacing paper-and-ink books with digital copies without first ensuring and documenting that those copies are complete and accurate representations of the original would be tantamount to redacting the historical record of our democracy.

Aaron Swartz to be awarded ALA’s James Madison Award. Ceremony streamed live

[Update 3/15/13: Here's the ALA announcement.]

It was just announced that Aaron Swartz will be awarded the American Library Association's James Madison Award, given annually to "honor individuals or groups who have championed, protected and promoted public access to government information and the public’s 'right to know' on the national level." It is fitting that Aaron win the award -- and that it be presented by Rep. Zoe Lofgren (D-CA), a strong advocate for digital rights in Congress who won the award last year and who introduced Aaron's Law to try to amend the Computer Fraud and Abuse Act (CFAA).

The ceremony will be webcast live tomorrow (Friday, March 15, 2013) at 8:30am Eastern time. We'll post the video as soon as it's made available.

Searching for a savior: who will serve as steward of Canadian government information?

[Editor's note: This is a guest post from Amanda Wakaruk, Government Information Librarian at the University of Alberta Libraries.]

Over the past week, the British Columbia Freedom of Information and Privacy Association (FIPA) wrote about and then provided the public with access to documentation outlining a Web Renewal Action Plan that calls for the reduction of Government of Canada (GoC) websites from roughly 1500 down to 1 (see FIPA’s blog entries, linked below). This plan appears to exacerbate the problems I noted in an FGI blog post last year: Government of Canada Publications -- It’s About Access, Not Format. For example, there is no publicly available evidence that the GoC has implemented or plans to implement a comprehensive web archiving plan before reducing its web footprint.

As a practitioner, I run into the problem of missing (i.e., unarchived) born digital content on a regular basis. (And no, Library and Archives Canada is not collecting websites for public consumption – these programs stopped in 2009.) The question I lost sleep over last year is more pressing than ever: who is archiving the web content of the GoC?

A group of institutions is working hard to set up a LOCKSS network that will help preserve the content of the Depository Services Program’s (DSP) e-archive (see the nascent CGI-PLN Wiki – email me if you would like to become a member or can help with funding to try to make this content accessible in the event that we lose access to the DSP website). Our first collection -- as important and impressive as it is at over 110,000 PDFs -- represents only a fraction of the content produced by the GoC. (As you might recall, the DSP does not collect HTML, only PDFs… and the latter format is discouraged by current GoC web protocols.)

I am proud of the fact that the University of Alberta Libraries, my home institution, was able to capture select GoC websites using a fee-based (and US-based) Archive-It account, but no single academic institution can afford to act as steward for the output of the federal government. Happily, we have a colleague in the University of Toronto Libraries who started capturing GoC web content using Archive-It a few weeks ago as part of a joint “rescue mission” to save the contents of the Aboriginal Portal of Canada before it was deleted from government servers (the results of these crawls are accessible here and here).

The bigger question, of course, is this: If not the government, then who is responsible for collecting and preserving the born digital content of the GoC? If it *is* the academic sector’s responsibility then where will the funding come from? Recent provincial budget cuts in Ontario and Alberta have been hard on this sector, to say the least. If there is a White Knight out there, now would be a great time to step forward!

The elimination of print publications coupled with a lack of web archiving and a directive to make only ‘current’ information available online marks an incalculable loss. Countless students describe the sessional papers as “life changing” and scholars from all walks of life routinely draw on statistical information produced by their governments to help make sense of our place in the world and inform ways to improve it (as an aside, Statistics Canada plans to remove publications more than a few years old from their website). It is unthinkable that future generations will not have access to information produced by their government today… information that should be informing our cultural narrative.

Reaction to Web Renewal Action Plan

Quigley and Lance re-introduce H.Res 110: public access to CRS reports

Man, this week is Sunshine-week-alicious! The Sunlight Foundation has long advocated -- along with FGI and library and open government organizations -- for free public access to Congressional Research Service (CRS) reports. CRS reports are generally not available to the public because CRS has this arcane and outdated rule that its reports are privileged communication between Congress and CRS. But CRS reports ARE available randomly online, and ProQuest, Penny Hill Press and other commercial publishers have long published them for a fee (I've even heard that CRS subscribes to ProQuest to get access to their own reports historically!).

But this all may change. According to the Sunlight Blog, Representatives Leonard Lance (R-NJ) and Mike Quigley (D-IL) have reintroduced the bipartisan House Resolution 110, "Public Access to Congressional Research Service Reports Resolution of 2013" (text not received by GPO yet so not publicly available on Thomas). The Resolution would direct the Clerk of the House of Representatives to provide members of the public with Internet access to certain Congressional Research Service publications. Easy-peasy, right?!

More than 30 organizations -- including Sunlight Foundation and FGI -- have signed on to a letter supporting the resolution. Please consider contacting your Representative and asking them to support H.Res. 110!

Open CRS Resolution Support Letter


CASSANDRA writes letter to Public Printer regarding the NAPA report

Last month the National Academy of Public Administration (NAPA) released a report entitled "Rebooting the Government Printing Office: Keeping America Informed in the Digital Age." FGI responded with an analysis of the report and was particularly disturbed by recommendation #4, which said that GPO should consider "cost recovery" for access to FDsys.

A group of long-time government information librarians writing under the moniker of CASSANDRA (Concerned Government Information Professionals) has co-written a letter to Public Printer Davita Vance-Cooks offering their strong support for NAPA's conclusion that "free access to government information is both an important tenet of a democracy and a critical responsibility" while calling into question the same recommendation #4.

With CASSANDRA's permission (FYI, both Jim Jacobs and James Jacobs are signatories to this letter), we've posted the letter here for public knowledge and so that others may also write letters to the Public Printer and cite this letter in support of free permanent public access to authentic government information now and in the long-term.



Sunlight ranks state legislatures on openness. CA gets a D

The Sunlight Foundation just put out their Open Legislative Data Report Card. California received a D grade :-| Find out how your state is doing. Below is the methodology that they used to grade state legislatures.


Methodology

Each state was evaluated in six categories based largely on the Ten Principles For Opening Up Government Information. Each score is based on evaluations by at least two members of staff and a volunteer during our state survey. Additionally, state legislatures were contacted (unless noted in their score) to ensure that our information on bulk data availability and timeliness was as accurate as possible.

The specific criteria for each category are as follows:

COMPLETENESS

We evaluated each state on the data collected by Open States: bills, legislators, committees, votes and events. We also took note if a state went above and beyond to provide this information and other relevant contextual information such as supporting documents, legislative journals and schedules. Points were deducted for missing data, often roll call votes.

  • 0 State provides full breadth of legislative artifacts Open States collects: bills, legislators, votes, and committees.
  • -1 State does not provide stand-alone roll call votes.

TIMELINESS

Legislative information is most relevant when it happens, and many states are publishing information in real time. Unfortunately, there are also states where updates are less frequent and show up days after a legislative action took place. States were dinged if data took more than 48 hours to go online.

  • 1 Multiple updates throughout the day, real time or as close to it as systems will allow.
  • 0 Site updates once or twice daily, typically at the end of the legislative day.
  • -1 Updates take longer than 24 hours to appear on the site, often up to a week.

EASE OF ACCESS

Common web technologies such as Flash or JavaScript can cause problems when reviewing legislative data. We found that the majority of sites work fairly well without JavaScript, but some received lower scores due to being extremely difficult to navigate, impossible to bookmark bills, and in extreme cases, completely unusable.

  • 1 Site was considered exceptionally well laid out by multiple evaluators, no issues with JavaScript.
  • 0 Site was deemed average by those that evaluated it and/or had minor JavaScript dependencies.
  • -1 Site was considered more difficult than average to use by members of staff or volunteers or had more severe JavaScript dependencies.
  • -2 Site was considered extremely difficult to use with a heavy reliance on irregular browser behavior and JavaScript.

MACHINE READABILITY

For many sites, the Open States team wrote scrapers to collect legislative information from the website code, a slow, tedious and error-prone process. We collected data faster and more reliably when data was provided in a machine-readable format such as XML, JSON, CSV or via bulk downloads. If a state posted PDF image files or scanned documents, it received the lowest score possible.

  • 2 Essentially all data can be found in machine-readable formats.
  • 1 Lots of data in machine-readable format but substantial portions that still required scraping HTML.
  • 0 No machine-readable data but standard screen scraping techniques applied.
  • -1 Site had information that was much more difficult than average to collect. (Data only accessible via PDF or that required screen scraper to emulate JavaScript.)
  • -2 Site had information that was inaccessible to Open States due to use of scanned PDFs.

USE OF COMMONLY OWNED STANDARDS

Because our ability to access most of a state’s data is represented by the above “Machine Readability” metric, we decided to use this provision to measure how a state made their bill text available. Making text available in HTML or PDF is the norm, and was considered an acceptable commonly owned standard (PDFs are a commonly owned standard, but it would certainly be nice to see alternative options where bill text is only available via PDF). States that only make documents available in Microsoft Word or WordPerfect formats require an individual to purchase expensive software or rely on free alternatives that may not preserve the correct formatting. It is worth noting that all states except two met the common criterion of providing HTML and/or PDF only: one state (Kansas) went above and beyond, and another (Kentucky) did not even meet this threshold.

  • 1 State made an effort to go above and beyond.
  • 0 State provided bills in PDF and/or HTML format and nothing better (plaintext, ODT, etc.).
  • -1 State only provided bills in a proprietary format.

PERMANENCE

Many states move or remove information when a new session starts, much to the dismay of citizens seeking information on old proposals and researchers that may have cited a link (e.g. http://somelegislature.gov/HB1 vs http://somelegislature.gov/2011/HB1) only to see it point to a different bill in the following session. Tim Berners-Lee, inventor of the World Wide Web, wrote an article declaring Cool URIs Don’t Change and we agree.

This poses a particular challenge to us since every page on OpenStates.org points to the page we collected data from, but if a state changes their site then users lose the ability to check us against the original source. Most (but not all) states are good about at least preserving bill information, but few were equally good about preserving information about out-of-office legislators and historical committees, equally important parts of the legislative process.

  • 2 All information is available in a permanent location and data goes back a reasonable amount of time (a decade or so).
  • 1 Almost all information has a permanent location but a single data set doesn't. (Or a recent change to the site has wiped out historical links but information appears to be preservable going forward.)
  • 0 Legislator and committee information lacks a permanent location, but most other information is acceptable.
  • -1 Ability to link to old information is badly damaged and/or there is less than a decade of historical information.
  • -2 Vital information like bills or versions lack a permanent location.
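For readers who want to tally a state's marks across the six categories above, here is a minimal sketch in Python. The score ranges are read off the bullet lists in this post; the category keys, the function names, and the idea of simply summing the six scores are our own assumptions for illustration, and nothing here reproduces how Sunlight converts scores into letter grades, since the methodology quoted above does not say.

```python
# Allowed score range per category, taken from the bullet lists above.
# Assumption: COMPLETENESS only lists 0 and -1, so we treat that as its full range.
# The dictionary keys are hypothetical labels, not names used by Sunlight.
SCORE_RANGES = {
    "completeness": (-1, 0),
    "timeliness": (-1, 1),
    "ease_of_access": (-2, 1),
    "machine_readability": (-2, 2),
    "commonly_owned_standards": (-1, 1),
    "permanence": (-2, 2),
}


def score_bounds():
    """Return the (lowest, highest) possible total across all six categories."""
    lowest = sum(low for low, _ in SCORE_RANGES.values())
    highest = sum(high for _, high in SCORE_RANGES.values())
    return lowest, highest


def total_score(scores):
    """Sum one state's category scores after checking each against its range.

    Summing is an assumption made for this sketch; the post does not
    describe how the category scores are combined.
    """
    missing = set(SCORE_RANGES) - set(scores)
    extra = set(scores) - set(SCORE_RANGES)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    for category, value in scores.items():
        low, high = SCORE_RANGES[category]
        if not low <= value <= high:
            raise ValueError(f"{category} must be between {low} and {high}")
    return sum(scores.values())


# A made-up state, not real survey data.
example_state = {
    "completeness": -1,
    "timeliness": 0,
    "ease_of_access": 0,
    "machine_readability": 1,
    "commonly_owned_standards": 0,
    "permanence": 2,
}

if __name__ == "__main__":
    print(total_score(example_state), score_bounds())
```

The range check is there mostly to catch data-entry slips, such as giving a state a 2 for timeliness when that category tops out at 1.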

Archive-It publishes Web archiving life cycle model

The Archive-It team announced today the publication of their white paper, Web Archiving Life Cycle Model. The model offers a thorough description of the entire process of Web archiving. Whether you've been Web archiving for 7 years or are mulling over jumping into the fray, this model will put you in a good headspace to do this critical work. Thanks Molly Bragg, Kristine Hanna, Lori Donovan, Graham Hukill, and Anna Peterson!

The Archive-It team is excited to publish our first white paper: The Web Archiving Life Cycle Model. With this paper we hope to share web archiving best practices and processes with organizations interested in developing and/or expanding their web archiving initiatives.

This white paper is the product of a collaboration between members of the Archive-It team as well as the larger Archive-It partner community. Several partners took part in in-depth interviews regarding their experiences using Archive-It and web archiving in general, and others helped with the design iteration phase of the model and read preliminary drafts of the paper.

The Web Archiving Life Cycle Model encompasses the following web archiving processes:

• Vision and Objectives
• Resources and Workflow
• Access/Use/Reuse
• Preservation
• Risk Management
• Appraisal and Selection
• Scoping
• Data Capture
• Storage and Organization
• Quality Assurance and Analysis

C-SPAN's "American Artifacts" to air history of GPO on March 17

C-SPAN's American Artifacts series will air a segment on the Government Printing Office (GPO) on March 17 at 8am and 7pm Eastern time. Check out the preview below, with GPO Historian George Barnum.
