James Staub's blog
NetSquared $100K mashup competition
Submitted by James Staub on Wed, 2008-03-19 06:26. TechSoup's NetSquared mashup challenge pays out a total of $100K this year to the best of 122 submitted projects.
I'm going to have way too much fun browsing through the project proposals. Casting my votes, though, is already proving difficult... Using user downtime? Massive mashup calendars? Warnings for hidden corporate abuses every time I make an online purchase?
Ohboyohboyohboyohboy.
And of course, many of these envisioned projects would not even be imaginable were it not for widely available and mostly reliable government information data sources.
2007 Fall FDLP/DLC proceedings, audio recordings, and unedited transcripts available!
Submitted by James Staub on Thu, 2007-11-15 07:32. The GPO has posted a recap of the 2007 Fall Federal Depository Library Conference and Fall Depository Library Council Meeting that includes audio files, photos, and unedited transcripts. Of particular note: Ric Davis, who has been serving as Interim Superintendent of Documents, has accepted the job of Interim Superintendent of Documents; the FDSys has been renamed FDSys; and the Depository Library Council is now directly asking "What does FDSys mean for libraries?" Also of note: there were excellent discussions of official and authentic online legal materials and of shared models for regional FDLP libraries.
GPO's FDLP Podcasts available!
Only two weeks after the Spring FDLP/Depository Library Council meeting, GPO has released audio files via podcasts! Download them through iTunes, share them with friends and family, and use Audacity to couple them with phat beats.
Fall 2006 Depository Library Council audio and notes on the way
Attendees of the 2006 Fall Federal Depository Library Conference and Council Meeting heard about a number of developments in government information and the Federal Depository Library Program, including:
- NTIS will investigate providing access to its digital content for federal depository libraries
- a celebration of Judith Russell's tenure as Superintendent of Documents
- All Depository Library Council sessions from this meeting will be released as podcasts!
To learn about these developments and to hear low-quality audio before the official podcasts come out, take a stroll through the FGI Fall 2006 FDLP/DLC roundup page. Virginia Rigby has graciously offered her detailed notes from the sessions she attended; I'll be posting these and my own notes over the next couple of days. When listening to the audio files, wear headphones!
And, as always, we'd love for you to share your own notes and experiences from the conference!
Not just blogs : Preservation issues
Welcome to another exciting episode of FGI's Not Just Blogs! When last we left our daring document do-gooders, they were examining clues pointing to the problem of Authenticity. Collective "Jinkies" rang out when they realized that their online copies of Distinguishing Bolts from Screws were only as authentic as the trust they could invest in the Web sites hosting them, and that no amount of pure technology could replace the security that comes from trust. Luckily, libraries, with a primary mission to deliver authentic information and a long history of success doing it, are well-positioned to continue the work we trust them to do in delivering government information.
Today we find our gutsy govdockers cracking into the Vault of Preservation.
To join along in the adventure, click on the "Issues" link at the top of FGI's pages.

Then click on "Preservation" in the Issues box.

Once inside the Vault of Preservation, stand in awe of the treasures therein. Catch a glimpse of the problems preserving digital data in the issue brief -- and shriek in terror!
Digital decay -- Yikes!
Obsolete file formats -- Zoinks!
Intentional, malicious removal of information from centralized repositories -- Oh, no!
Thank goodness libraries are working with projects like LOCKSS to secure permanent public access to electronic government information. Whew!
(Oh, and, by the way, if you happen to have the answer to these troubling preservation questions, could you please drop us a line in the comments? Thanks!)
Join us next week when Shinjoung leads us through government information issues of privacy, and you'll hear Daniel say:
And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them.
Toward estimating file sizes of online federal documents
Earlier this year, Daniel estimated the average size of an online federal document as between 5MB and 10MB. Libraries investigating digital deposit and provision of permanent public access to these resources need to estimate the cost of storage for these documents.
For the past week, I've played around with an entirely nonrandom sample of online docs to try to get an accurate estimate. Although I'm not close to a reliable estimate, I'd still like to share what I've done...
The process:
- grabbed all (1,234) MARC records with 856 fields from DDM2 for the GPO timestamp range 2006-06-01 to 2006-06-30
- used wget to retrieve all URLs listed in those 856 fields
- slapped the wget logs into a vaguely useful Excel spreadsheet (thanks to liberal regexp-ing in jEdit)
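The regexp pass over the wget logs could also be scripted. The following is a minimal Python sketch, not the process actually used here: it assumes the logs contain wget's usual `Length:` lines (for example `Length: 2508 (2.4K) [text/html]`), and it strips commas in case a wget version prints byte counts with thousands separators. The function name `sizes_from_wget_log` is hypothetical.

```python
import re

# Matches wget's "Length:" line, e.g. "Length: 2508 (2.4K) [text/html]".
# The byte count may contain thousands separators; the "(2.4K)" part and the
# "[mime/type]" part are both optional. "Length: unspecified ..." does not match.
LENGTH_RE = re.compile(r"^Length:\s+(\d[\d,]*)(?:\s+\([^)]*\))?(?:\s+\[([^\]]+)\])?")


def sizes_from_wget_log(lines):
    """Return (size_in_bytes, mime_type) for every logged download with a known length."""
    results = []
    for line in lines:
        match = LENGTH_RE.match(line.strip())
        if match:
            size = int(match.group(1).replace(",", ""))
            mime_type = match.group(2) or "unknown"
            results.append((size, mime_type))
    return results
```

Passing an open log file works too, since iterating a file yields lines. Downloads whose length the server didn't report (`Length: unspecified`) are skipped, so any totals built from the result will undercount those files.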
The basic results:
| FILETYPE | ALL |
|---|---|
| TOTAL URLS | 1342 |
| TOTAL SIZE (MB) | 2004.7 |
| AVG SIZE (KB) | 1530 |
'Course, these numbers don't hold up to much scrutiny. The 856 field often points to table of contents pages (when it points to the document at all...), and that single page is all that gets counted in this simple investigation.
PDF files might offer a better estimate than HTML files. Although publishers can split up documents into multiple PDF files and have a "Table of Contents" PDF file point to these multiple resources composing a single bibliographic unit, this doesn't appear to be too common. When 856 fields point to PDF files, they tend to be self-sufficient, whole bibliographic units. So here are the numbers for PDF files retrieved using the 856 fields:
| FILETYPE | PDF |
|---|---|
| TOTAL URLS | 815 |
| TOTAL SIZE (MB) | 1961 |
| AVG SIZE (KB) | 2464 |
| STD DEV SIZE (KB) | 7605 |
| MAX SIZE (KB) | 148902 |
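The averages in both tables are consistent with 1 MB = 1,024 KB: 2004.7 × 1024 / 1342 ≈ 1530 KB and 1961 × 1024 / 815 ≈ 2464 KB. Here is a minimal Python sketch that reproduces these summary rows from a list of byte counts. It assumes the standard deviation is the sample standard deviation (what Excel's `STDEV` computes), since the post doesn't say which one was used; `summarize` is a hypothetical helper name.

```python
import statistics

KB = 1024  # bytes per KB
MB = 1024 * KB  # bytes per MB; the tables' averages are consistent with 1 MB = 1,024 KB


def summarize(sizes_in_bytes):
    """Return the tables' summary rows: totals in MB, per-file figures in KB.

    Needs at least two sizes, because the sample standard deviation does.
    """
    return {
        "total_urls": len(sizes_in_bytes),
        "total_size_mb": sum(sizes_in_bytes) / MB,
        "avg_size_kb": statistics.mean(sizes_in_bytes) / KB,
        # Sample standard deviation (n - 1 denominator), like Excel's STDEV.
        "std_dev_size_kb": statistics.stdev(sizes_in_bytes) / KB,
        "max_size_kb": max(sizes_in_bytes) / KB,
    }
```

For example, `summarize([1024, 2048, 3072])` reports an average of 2.0 KB and a standard deviation of 1.0 KB. A standard deviation roughly three times the mean, as in the PDF table, suggests a long tail of very large files.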
In a true demonstration of futility, I looked at 124 of the HTML files (of the 525 in the June 2006 DDM2 sample) that are stopping points for the 856 pointers. Most of these totally-non-random-sample HTML pages do not constitute the entire document described in the MARC record. I developed various wget capture strategies for 84 of these online documents, and the average size of the "cluster" of files captured per 856 pointer was 8.17 MB (median: 3.19 MB, std dev: 13.09 MB).
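The cluster figures come from summing every file captured for a given 856 pointer and then computing statistics across pointers. Here is a minimal Python sketch of that bookkeeping. It assumes each captured file has already been tagged with the 856 URL it was captured from, using a hypothetical `(pointer_url, size_in_bytes)` input format, and `cluster_stats_mb` is likewise a hypothetical name.

```python
import statistics
from collections import defaultdict

MB = 1024 * 1024  # bytes per MB


def cluster_stats_mb(captured_files):
    """captured_files: iterable of (pointer_url, size_in_bytes), one entry per captured file.

    Sums each 856 pointer's files into one "cluster" size, then reports the
    mean, median, and sample standard deviation of those cluster sizes in MB.
    """
    totals = defaultdict(int)
    for pointer_url, size in captured_files:
        totals[pointer_url] += size
    cluster_sizes = [total / MB for total in totals.values()]
    return {
        "clusters": len(cluster_sizes),
        "mean_mb": statistics.mean(cluster_sizes),
        "median_mb": statistics.median(cluster_sizes),
        "std_dev_mb": statistics.stdev(cluster_sizes),
    }
```

Reporting the median next to the mean matters here: a mean of 8.17 MB against a median of 3.19 MB suggests a few very large clusters are pulling the average up.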
In a vaguely related exercise, I grabbed the various files composing Foreign Relations of the United States, vols. E-1, E-5, and E-7. Sure, they're outliers w/r/t size, but I thought I'd mention them anyway...
| VOL | MB | FILES |
|---|---|---|
| E-1 | 318 | 880 |
| E-5 | 143 | 687 |
| E-7 | 618 | 892 |
CONCLUSION:
I don't have one yet. At the end of the week, though, 5-10 MB seems like a pretty good estimate to me.
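To turn a per-document estimate into a storage figure, multiply it by a document count. The sketch below is a hypothetical illustration. It assumes that each MARC record corresponds to one document and that the June 2006 count of 1,234 records is typical of a month. Neither assumption is tested here, and `storage_range_gb` is a hypothetical name.

```python
def storage_range_gb(documents, low_mb=5.0, high_mb=10.0):
    """Return (low, high) storage in GB for a document count, using 1 GB = 1,024 MB.

    low_mb and high_mb default to the 5-10 MB per-document estimate.
    """
    return documents * low_mb / 1024, documents * high_mb / 1024


# Hypothetical: one month at the June 2006 rate, then twelve such months.
monthly_low, monthly_high = storage_range_gb(1234)
yearly_low, yearly_high = storage_range_gb(1234 * 12)
print(f"per month: {monthly_low:.1f}-{monthly_high:.1f} GB")
print(f"per year:  {yearly_low:.1f}-{yearly_high:.1f} GB")
```

Under those assumptions, this prints about 6.0-12.1 GB per month and 72.3-144.6 GB per year, before any redundancy (for example, multiple LOCKSS-style copies) is factored in.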
An open question on the open document format
What exactly are the preservation concerns about open source standards expressed by Alan Cote, the Massachusetts state supervisor of public records?
Are any of them legitimate concerns?
from ZDNet:
"The rigid policy, such as the initiative before you that excludes any vendor or any process and relies on questionable, untested and unreliable practices or tools, does not suit the commonwealth well," Cote said in prepared remarks. "It may very well result in many electronic records being lost or destroyed."
Cote added that the state's records management system renders the format a document is saved in moot.
200510161030 - Depository Library Council - GPO Update
[Anybody want to help me with timestamps on this "transcript"? - JS]
[I will. - Daniel]
A helpful list of abbreviations for personal names used in this posting
Download audio file of this session
The Basic Timestamps
[Audio recording begins after Ric Davis' reading of Judy Russell's letter]
0:00:00 Mike Wash describes progress on GPO's Future Digital System (FDSys).
0:34:00 Questions from Depository Library Council.
0:44:00 Questions from the audience.
The Detailed Version
JR has emergency family business and will not be joining the conference.
RD is sitting in for JR and delivers her prepared remarks. [And somehow James S. managed to not record this…]
RD offers a nod to the "unofficial" bloggers in the audience :)
The mechanics of Council business for this meeting: The Depository Library Council will develop a vision document to deliver to the Public Printer Wednesday morning.
200510 - Depository Library Council - Personal name abbreviations James S. is using in his DLC posts
GPO STAFF
BJ: Bruce James: Public Printer of the United States: US Government Printing Office
JR: Judith (Judy) Russell: Managing Director: Information Dissemination, US Government Printing Office
MW: Mike Wash: Chief Technical Officer and co-Director of the Office of Innovation and New Technology.
RD: Richard (Ric) Davis: Acting Director: Library Services and Content Management, US Government Printing Office
TE: Thomas (TC) Evans: Assistant Chief of Staff for Strategic Initiatives, US Government Printing Office
DEPOSITORY LIBRARY COUNCIL
AM: Ann Miller: Head, Public Documents and Maps Department: Duke University
BS: Barbara S. Selby: Government Information Librarian: University of Virginia
CE: Charles D. Eckman: Principal Government Documents Librarian: Stanford University
CM: Cheryl Knott Malone: Associate Professor: University of Arizona
DA: Duncan M. Aldrich: DataWorks Coordinator: University of Nevada, Reno
200510191030 – Depository Library Council - Plenary Session with Public Printer of US part 2
Download an audio file of this session
A helpful list of abbreviations for personal names used in this posting
0:00:15 MP: Law libraries want paper because the issuing agencies recognize paper as the only official version.
0:00:50 DA: Horse and buggy, paper and electronic.
0:02:25 MS: Publishers anguish over the loss of control of content as libraries seek to customize it. We want platform-neutral content.
0:05:10 BJ: responds. Carnegie-Mellon is developing automated language translation. We need to work with big agencies to help them prepare documents in a fashion that will produce accurate automated translations.
0:07:00 CM: asks for an update on the mass digitization of the legacy collection.
BJ: …
70% of the FDLP collection will be online by the end of 2007.
We need to determine what other documents need to go into the collection. E.g., the Federalist Papers are not FDLP, but are undeniably important.
200510190830 – Depository Library Council - Plenary Session with Public Printer of US
Download an audio file of this session
A helpful list of abbreviations for personal names used in this posting
0:00:00 BS: Council Aerobics
0:03:45 BJ: The GPO troops reported that they felt this was the best meeting so far.
0:05:10 This is the best job I've ever had.
0:06:45 Tales of Thomas Benedict, a previous Public Printer.
In 1895 he moved from steam to electric power.
A few years later, he purchased electric trucks and got rid of the horses.
GPO still has the stables in the basement.
0:09:05 We fear that letting go of the past will lead to worse constituent service.
BJ describes the physical GPO building and its antiquated-ness.
0:11:00 Describes the plan to move GPO facilities
0:11:40 "a marvelous scheme" [referring to the plan to deal with the GPO building]
0:12:20 GPO's $35 million budget is mostly devoted to overhead costs associated with the building
200510171930 - Depository Library Council - GODORT Meeting
200510171930 FDLP DLC GODORT
Secrecy, Privacy, and FOIA: Conflicts and Consistencies
Speakers: Bob Gellman (BG), Meredith Fuchs (MF), and Patrice McDermott (PM).
Download an audio file of this session
0:00:00 Arlene Weible introduces program
0:01:30 Vicki Phillips, Chair Nominating Committee: Call to service
0:05:00 John Phillips, Chair Awards Committee: All nominations due in December
0:08:30 Aimee Quinn introduces speakers
0:13:45 Bob Gellman (BG)
"Conflicts" between privacy and FOIA
Enumerates "Principles of fair information practices" and notes that none of them conflicts with FOIA, except "finality," which is only partly in conflict
0:22:30 2. Libraries adamant about protecting use records, but demand openness in other agency arenas. Bookstores also demonstrate cognitive dissonance toward commercial ends.
0:30:00 Meredith Fuchs (MF), National Security Archive
Fall 2005 Depository Library Council audio and notes on the way [update]
I'm getting audio files and their descriptions from the Fall 2005 DLC up and running. You can download the following PRETTY ROUGH (i.e., wear headphones, turn up the volume a little, and be prepared to block out some background noise), PRETTY LARGE (3-11 MB), VERY UNOFFICIAL MP3 files: