statistics

Newsweek Article on the Statistical Abstract

I was pleasantly surprised to see an article about the wonders of the Statistical Abstract in the Jan. 18, 2010 issue of Newsweek, entitled "Suicide, Sex, and SUVs: This book covers them all—and more." The author, Robert J. Samuelson, states:

I confess to being an avid fan of the Statistical Abstract, published annually by the Census Bureau, because it tells so much so quickly. The just-published 2010 edition, as always, bulges with information. For me, the Stat Abstract is often the first go-to source for a story, because it substitutes evidence for speculation.

Of course, the print version of the article doesn't mention that you can find this document at your local library or Federal Depository Library, nor that it is available online for free from the federal government. Worse, the online article links to a copy of the Statistical Abstract that you can purchase at Amazon. Huh?! I left a comment with a link to the free online version and a link to the FDLP Directory for those who want to look at a print copy at their local FDL.

But it is good to see government documents in the news!

Free ICPSR Data Conference on the Web

ICPSR 2009: Real Data in a Virtual World

ICPSR (the Inter-University Consortium for Political and Social Research) is the large social science data archive at the University of Michigan. Every second year, ICPSR hosts a meeting (in Ann Arbor, Michigan) for its "Official Representatives" -- one person at each ICPSR member institution. This year, the meeting (October 5-9) is open to all and is being held on the web instead of in Ann Arbor.

At the link above, you can find the week-long program and sign up for individual sessions.

Government information specialists, particularly those with responsibilities for data and statistics, should set aside time for this! Sessions are interesting and informative. Some examples from the program:

  • Census 2010 & American Community Survey
  • Delivering Research Opportunities to Undergraduates
  • Online Data Analysis Tools

US Census Bureau's DataFerrett

DataFerrett (Federated Electronic Research, Review, Extraction, and Tabulation Tool) is a free data mining and extraction tool developed by the U.S. Census Bureau that allows users to search, browse, combine, tabulate, recode, and analyze statistical data from a network of online data libraries. The DataFerrett software can be downloaded from the website or run in the browser via a Java applet.

Some material to read before getting started:

  1. DataFerrett Brochure
  2. Getting Started with DataFerrett Tour
  3. DataFerrett User Guide

Available data sets include:

  • American Community Survey (ACS)
  • American Housing Survey (AHS)
  • Behavioral Risk Factor Surveillance System (BRFSS)
  • Consumer Expenditure Survey (CES)
  • County Business Patterns (CBP)
  • Current Population Survey (CPS)
  • Decennial Census of Population and Housing
  • Harvard-MIT Data Center Collection
  • Home Mortgage Disclosure Act (HMDA)
  • Local Employment Dynamics (LED)
  • National Ambulatory Medical Care Survey (NAMCS)
  • National Center for Health Statistics Mortality (MORT)
  • National Health and Nutrition Examination Survey (HANES)
  • National Health Interview Survey (NHIS)
  • National Hospital Ambulatory Medical Care Survey (NHAMCS)
  • National Survey of Fishing, Hunting, and Wildlife (FHWAR)
  • Small Area Income and Poverty Estimates (SAIPE)
  • Social Security Administration (SSA)
  • Survey of Income and Program Participation (SIPP)
  • Survey of Program Dynamics (SPD)

DataFerrett is a wonderful tool for exploring and analyzing data. Enjoy!

(found via Open Access News)

EBRI Databook on Employee Benefits

Sometimes, the best source of government statistics may be published by someone other than the government. A case in point:

Drawing from the March Current Population Survey, the National Health Expenditures data from the Centers for Medicare & Medicaid Services, the Bureau of Labor Statistics' Employee Benefit Survey and National Compensation Survey, and the National Income and Product Accounts (NIPA) data from the Bureau of Economic Analysis, as well as the William M. Mercer National Survey of Employer-sponsored Health Plans and other sources, the EBRI Databook seeks to provide "a comprehensive analysis of how the employee benefits system works, who and what its various functions affect, and its relationship with the U.S. economy." It includes over 400 tables and charts presenting vital statistics on the employee benefit system.

EBRI is a nonprofit, nonpartisan organization established in 1978, whose mission is "to contribute to, to encourage, and to enhance the development of sound employee benefit programs and sound public policy through objective research and education."

Hat tip to Stuart Basefsky! (IWS Weekly Bulletin, 8 April 2009).

Transboundary issues are needful things

There is an obvious transboundary need for free flowing, current foreign / international government information. This transboundary need reflects the nature of our most critical 21st Century challenges -- climate change, crime, trade, labour rights, poverty, hunger, etc. -- they know few hard geo-political boundaries.

So how can we know what's going on in our extended community of nations, better known as the Western Hemisphere?

Supranational sources of information like the United Nations and its subsidiary regional commission ECLAC (Economic Commission for Latin America and the Caribbean) do what some of our finest U.S. government agency publications do -- they track the statistical universe of nations.

One of my favorite sources of free flowing, current foreign / international government information is UNPULSE, "Connecting to UN Information" (A Service of the UN Library).

UNPULSE links to the 2008 edition of the ECLAC's Statistical Yearbook for Latin America and the Caribbean, "... one of the main sources of statistical information of the region."

The full text of the report, published in English and Spanish, is divided into four chapters: "(1) Demographic and social areas, with special attention to gender; (2) Economical statistics such as prices, international trade, balance of payments and national accounts; (3) Information on natural resources and the environment; and, (4) Methodological aspects and other data on sources, definitions and coverage of the statistics cited."

About ECLAC: "ECLAC is one of the five regional commissions of the United Nations. It was founded with the purpose of contributing to the economic development of Latin America, coordinating actions directed towards this end, and reinforcing economic ties among countries and with other nations of the world. The promotion of the region's social development was later included among its primary objectives..."

~ Free government information flowing south to north.

Why is the 1957 Census of Govts deemed under copyright?

I just talked with a researcher who was interested in getting his hands on a digital copy of the 1957 Census of Governments. My momentary joy at finding a copy at the University of Michigan (my go-to library to find digital govt documents!) quickly turned to disappointment on seeing the message:

Page images and full text of this item are not available due to copyright restrictions.

There ought to be a way for people/librarians to check the document for copyrighted bits and then quickly flip a switch to release it into the public domain and make it accessible to everyone. Is that too much to ask? Over time, we could lessen the impact that Google's scorched earth copyright policy has on documents that should rightfully be in the public domain. And another thing, why didn't they scan statistical resources to .csv files?!

That is all.

A non-partisan U.S. Census Bureau

For fascinating, provocative reading about where the Census Bureau will reside in the federal government organization, read the Feb 10, 2009 Wall Street Journal: "Why Obama Wants Control of the Census: Counting Citizens is a Powerful Political Tool" (author: John Fund). http://online.wsj.com/article/SB123423384887066377.html.
Serious changes may be happening, and more quickly than imagined. Even the seven former Census directors who support turning the Census into an independent agency recommended doing so after the 2010 Census.

To quote the article, "[S]tatisticians at the Commerce Department didn't think [Obama's changes] would mean having the director of next year's Census report directly to the White House rather than to the Commerce secretary."

Lots of food for thought.

Good News from the Federal Bureau of Investigation (FBI)

The FBI's Preliminary Semiannual Uniform Crime Report reveals that crime rates declined in 2008. The report shows that from January to June 2008, violent crimes decreased by 3.5 percent and property crimes by 2.5 percent. The FBI's press release states that a full report for 2008 will be released in the fall. The main points are:

"Violent crimes in all population groups declined: murder by 4.4 percent, aggravated assault by 4.1 percent, forcible rape by 3.3 percent, and robbery by 2.2 percent.

On a regional basis, law enforcement in all four parts of the country reported a drop in violent crimes: 6.0 percent in the Midwest, 5.0 percent in the West, 2.9 percent in the Northeast, and 1.5 percent in the South.

Overall, property crimes fell in the Midwest (4.7 percent), the West (6.1 percent), and the South (0.4 percent).

Among population groupings, each category of property crime was down: motor vehicle thefts by 12.6 percent, larceny-theft by 1.2 percent, and burglaries by 0.8 percent.

Arsons dropped in all four regions of the country and among all population groups (with the exception of cities with populations of 250,000-499,999, where it actually increased 2.0 percent)."

- Main points quoted from the FBI's Press Release

Calls for Comment on Proposed Federal Data Collections

I'm forwarding this heads-up from the Association of Public Data Users (APDU) list. Over the last two weeks, there have been quite a few calls for comment on proposed data collections published in the Federal Register (see below with due date).

Maybe it's because I haven't had my coffee yet this morning, but I was a little peeved by my failed information search. I found the Census Bureau's FR posting (Federal Register: January 7, 2009 (Volume 74, Number 4) Page 672) (btw, I tried in FDsys.gpo.gov but they've not loaded Volume 74 yet) but at first was stymied because the summary page has no link to the Census Bureau's Web site, and does not have contact information or any link to more information. The full listing has the information, but comments must be written, no Web submissions :-|

Ok fine, I go to www.census.gov and after more than 5 minutes of search/browse, give up on finding exactly *how* to submit comments on the proposed "Quarterly Financial Report" or "Survey of Local Government Finances." What's even worse, census.gov does not have a "contact us" link on its first page. I finally found it in the footer of a second level page, but could find nothing in the Question and Answer Center about RFCs, proposed data sets etc. *sigh*

I guess this is turning into an FDsys comment. I noticed in FDsys that you can sign up to receive the daily Federal Register Table of Contents which is cool. But there needs to be a way to browse only the requests for comments (or rules changes, notices...) of specific agencies in the FR as well as receive email or RSS of requests for comments. There also needs to be a link in the FR to the agency in question and not just their top level site but to the place on the site with information on the RFC and directions for how to submit comments. And lastly (this is not an FDsys comment but a general agency comment) RFCs should be submitted online.

So go ahead and submit comments for the proposed data collections below, I dare you.

  • Census Bureau
    Quarterly Financial Report (February 9, 2009)
    Survey of Local Government Finances (School Systems) (March 13, 2009)
  • Office of Management and Budget
    2007 North American Industry Classification System (NAICS)-Updates for 2012 (April 7, 2009)
  • Bureau of Labor Statistics
    Labor Market Information (LMI) Cooperative Agreement (March 3, 2009)
  • Employment and Training Administration, Department of Labor
    O*Net Data Collection Program (January 30, 2009)
  • Science Resources Statistics, National Science Foundation
    Survey of Research and Development Expenditures at Universities and Colleges (March 10, 2009)
  • National Institutes of Health
    Information Program on Clinical Trials: Maintaining a Registry and Results Databank (February 5, 2009)
  • Administration for Children and Families, Department of Health and Human Services
    Feasibility Test for Design Phase of National Study of Child Care Supply and Demand (February 5, 2009)
  • Agency for Healthcare Research and Quality, Department of Health and Human Services
    The AHRQ Data Inventory (January 30, 2009)
  • Centers for Medicare & Medicaid Services, Department of Health and Human Services
    CAHPS Home Health Care Survey (March 10, 2009)
  • Surface Transportation Board, Department of Transportation
    Class I Railroad Annual Report (February 9, 2009)
    Quarterly Report of Freight Commodity Statistics (February 9, 2009)

Statistical Abstract of the United States, 2009 edition

The 2009 edition of the Statistical Abstract of the United States is now available from two sites:

It mystifies me that, with the problems of relying on proprietary formats and distributing data on CD-ROMs that became effectively unusable after a relatively short time, the Census Bureau is not making this essential publication available in a software neutral format such as CSV.

New UN statistical database

A posting on Slashdot has information about the new UN data access system called UNdata, which contains information from all major UN databases and those of several other international organizations. UNdata is intended to improve the dissemination of statistics by the UN's Statistics Division (UNSD) to the widest possible audience. It is an easy-to-use data access system that was developed to meet UNSD’s vision of providing an integrated information resource with current, relevant and reliable statistics free of charge to the global community. The design allows a user to access a large number of UN databases either by browsing the data series or through a keyword search.

Thanksgiving Statistics

Last year, Linda Zellmer from the University of Indiana sent out an update to a Thanksgiving poster that details statistics for the various crops served during a Thanksgiving meal.  I immediately printed it out and it is currently on a wall in the Maps Area.  The information comes from the Economic Census and it aroused a great deal of curiosity among patrons.  I am sure Linda will update it once the 2007 Economic Census statistics are available in a couple of years.

The Census Bureau also publishes statistics about Thanksgiving Day every year.  Here's the information for 2007.

 

Thanksgiving Day
Nov. 22, 2007

In the fall of 1621, the religious separatist Pilgrims held a three-day feast to celebrate a bountiful harvest, an event many regard as the nation’s first Thanksgiving. It eventually became a national holiday in 1863 when President Abraham Lincoln proclaimed the last Thursday of November as a national day of thanksgiving. Later, President Franklin Roosevelt clarified that Thanksgiving should always be celebrated on the fourth Thursday of the month to encourage earlier holiday shopping, never on the occasional fifth Thursday.

272 million
The preliminary estimate of turkeys raised in the United States in 2007. That’s up 4 percent from 2006. The turkeys produced in 2005 together weighed 7.2 billion pounds and were valued at $3.2 billion.
Source: USDA National Agricultural Statistics Service
http://www.nass.usda.gov/

 

Weighing in With a Menu of Culinary Delights

46 million
The preliminary estimate of turkeys Minnesota expects to raise in 2007. The Gopher State is tops in turkey production. It is followed by North Carolina (39 million), Arkansas (31 million), Virginia (21.5 million), Missouri (21 million) and California (16.8 million). These six states together will probably account for about two-thirds of U.S. turkeys produced in 2007.

690 million pounds
The forecast for U.S. cranberry production in 2007, essentially unchanged from 2006 and 11 percent more than 2005. Wisconsin is expected to lead all states in the production of cranberries, with 390 million pounds, followed by Massachusetts (180 million). New Jersey, Oregon and Washington are also expected to have substantial production, ranging from 18 million to 52 million pounds.

1.6 billion pounds
The total weight of sweet potatoes — another popular Thanksgiving side dish — produced by major sweet potato producing states in 2006. North Carolina (702 million pounds) produced more sweet potatoes than any other state. It was followed by California (381 million pounds). Mississippi and Louisiana also produced large amounts: at least 200 million pounds each.

1 billion pounds
Total pumpkin production of major pumpkin-producing states in 2006. Illinois led the country by producing 492 million pounds of the vined orange gourd. Pumpkin patches in California, Ohio and Pennsylvania also provided plenty of pumpkins: Each state produced at least 100 million pounds. The value of all the pumpkins produced by major pumpkin-producing states was $101 million.

If you prefer cherry pie, you will be pleased to learn that the nation’s forecasted tart cherry production for 2007 totals 294 million pounds. Of this total, the overwhelming majority (230 million) will be produced in Michigan.

1.8 billion bushels
The total volume of wheat — the essential ingredient of bread, rolls and pie crust — produced in the United States in 2006. Kansas and North Dakota accounted for 30 percent of the nation’s wheat production.

841,280 tons
The 2007 contracted production of snap (green) beans in major snap (green) bean-producing states. Of this total, Wisconsin led all states (310,200 tons). Many Americans consider green bean casserole a traditional Thanksgiving dish.
Source: The previous data come from the USDA National Agricultural Statistics Service http://www.nass.usda.gov/

$9.5 million
The value of U.S. imports of live turkeys during the first half of 2007 — 99.5 percent from Canada. Our northern neighbor accounted for all of the cranberries the United States imported ($2.2 million). When it comes to sweet potatoes, however, the Dominican Republic was the source of 63 percent ($1.7 million) of total imports ($2.7 million). The United States ran a $4.9 million trade deficit in live turkeys during the period but had surpluses of $9.4 million in cranberries and $15.3 million in sweet potatoes.
Source: Foreign Trade Statistics http://www.census.gov/foreign-trade/www

13.1 pounds
The quantity of turkeys consumed by the typical American in 2005, with a hearty helping devoured at Thanksgiving time. Per capita sweet potato consumption was 4.5 pounds.
Source: Upcoming Statistical Abstract of the United States: 2008, Tables 205-206 http://www.census.gov/compendia/statab/

 

An Organic Feast

144,086
Number of certified organic turkeys on the nation’s farmland, as of 2005. Most of these turkeys were in Michigan (56,729) or Pennsylvania (48,815).
Source: USDA Economic Research Service
http://www.ers.usda.gov/data/organic/

 

The Turkey Industry

$3.6 billion
The value of turkeys shipped in 2002. Arkansas led the way in turkey shipments, with $581.5 million, followed by Virginia ($544.2 million) and North Carolina ($453 million). In 2002, poultry businesses whose primary product was turkey totaled 35 establishments, employing about 17,000 people.
Source: Poultry Processing: 2002 http://www.census.gov/prod/ec02/ec0231i311615.pdf

$3.86 billion
Forecast 2007 receipts to farmers from turkey sales. This exceeds the total receipts from sales of products such as rice, peanuts and tobacco.
Source: USDA Economic Research Service http://www.ers.usda.gov/Data/farmincome/finfidmu.htm

 

The Price is Right

99 cents
Cost per pound of a frozen whole turkey in December 2006.
Source: Upcoming Statistical Abstract of the United States: 2008, Table 709 http://www.census.gov/compendia/statab/

 

Where to Feast

3
Number of places in the United States named after the holiday’s traditional main course. Turkey, Texas, was the most populous in 2006, with 489 residents; followed by Turkey Creek, La. (363); and Turkey, N.C. (270). There also are nine townships around the country named Turkey, three in Kansas.
Source: Population estimates http://www.census.gov/Press-Release/www/releases/archives/population/010315.html, http://factfinder.census.gov/servlet/BasicFactsServlet

8
Number of places and townships in the United States that are named Cranberry or some spelling variation of the red, acidic berry (e.g., Cranbury, N.J.), a popular side dish at Thanksgiving. Cranberry township (Butler County), Pa., was the most populous of these places in 2006, with 27,509 residents. Cranberry township (Venango County), Pa., was next (6,900).
Source: Population estimates http://factfinder.census.gov/servlet/BasicFactsServlet
http://www.census.gov/Press-Release/www/releases/archives/population/010315.html

28
Number of places in the United States named Plymouth, as in Plymouth Rock, the landing site of the first Pilgrims. Plymouth, Minn., is the most populous, with 70,102 residents in 2006; Plymouth, Mass., had 55,516. Speaking of Plymouth Rock, there is just one township in the United States named “Pilgrim.” Located in Dade County, Mo., its population was 135.
Source: Population estimates http://www.census.gov/Press-Release/www/releases/archives/population/010315.html, http://factfinder.census.gov/servlet/BasicFactsServlet

114.4 million
Number of households across the nation — all potential gathering places for people to celebrate the holiday.
Source: Families and Living Arrangements: 2006 http://www.census.gov/Press-Release/www/releases/archives/families_households/009842.html

 

Editor’s note: The preceding data were collected from a variety of sources and may be subject to sampling variability and other sources of error. Facts for Features are customarily released about two months before an observance in order to accommodate magazine production timelines. Questions or comments should be directed to the Census Bureau’s Public Information Office: telephone: 301-763-3030; fax: 301-763-3762; or e-mail: <pio@census.gov>.

 

Happy Thanksgiving!! 


 

Google Books/Fed Docs: Google Books Statistics--The Bigger Picture

Now that I had some statistics, it dawned on me that I had no idea whether or not this was a lot of documents.  So I was off to the FDLP desktop and the Catalog of U.S. Government Publications.

I looked around the desktop to see if GPO listed any statistics.  On the "about" page for the CGP, GPO says merely that there are more than 500,000 records in the database.  So I gave some thought to how I might get a better figure, and off I went to OCLC and the GPO database in FirstSearch.  On the database info page, OCLC lists 507,000+ as the number of records and that the database had its last monthly update on August 8, 2007.

So I went back to the CGP and its advanced search page.  Searching for GPO in the publisher field is not terribly effective.  Of course, in this database everything is a government document so that is not a problem. 

But how to get a real number out of the database?  I tried using the most common of words -- "a" and "the" -- but to no great effect.  "A" brings up 359,875 records and "the" brings up 411,493.  Neither result comes close enough to the supposed 507,000.

I had another realization that the CGP now includes records for electronic titles--titles that would not be fodder for the Google Book Project. Using the New Electronic Titles page is not really an option to count them as it only goes back to April of 2005 and since early 2006 the monthly lists are not numbered (leaving me to do a lot of counting).

So back to the advanced searching page in the CGP.  Happily here you can search for terms in the URL/PURL. I proceeded to search for every record that listed .gov, .mil, .us, .org, and .com. I came up with a total of 64,504 records.  So approximately 13% of the records in the CGP are electronic titles or are titles with an electronic counterpart.
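For anyone who wants to check my math, here is a minimal sketch of the percentage calculation in Python. The variable names are my own, and it assumes OCLC's "507,000+" can be treated as an exact total, which it is not.

```python
# Figures quoted in the post above.
electronic_records = 64_504  # CGP records whose URL/PURL matched .gov, .mil, .us, .org, or .com
total_records = 507_000      # OCLC's "507,000+" record count, treated as exact (assumption)

# Share of CGP records that are electronic or have an electronic counterpart.
share = electronic_records / total_records
print(f"Electronic share of CGP records: {share:.1%}")
```

Because the true total is somewhat above 507,000, the real share is probably a touch lower than this figure.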

Unfortunately I had another realization that these figures really only represent documents published from 1976 on. This is a really big problem in that most of the documents I found in Google Books dated to before 1923.  My only hope to get good numbers was to askGPO.  So late on August 8th I shot off a query to GPO asking for statistics on the number of documents GPO has distributed both before and after 1976.

Surprisingly enough, GPO called me first thing the next morning. askGPO is notoriously slow in providing answers to queries so I was very surprised!  I spoke with Nancy Faget at GPO and she was very pleasant though not exactly forthcoming with numbers.  It struck me that I got the quick call back as GPO viewed my query as the first step in getting out of the program.  As far as I know my director has no real intentions of doing that, but I don't think I convinced her on that point.  But aside from that she told me that GPO really didn't know how many documents went to depositories since the beginning.  Alas!!

I honestly don't know if it would be fair to take the view that probably as much was distributed from 1813 to 1976 as was published after 1976.  But if you did, that would lead one to believe that over one million documents have been produced.

So the bigger picture suggests that the 167,878 titles in Google Books are only about 17% of all the documents that could be digitized.  At a guess...
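The guess can be written out as a short Python sketch. It rests entirely on my assumption that pre-1976 output roughly equals post-1976 output, and the variable names are only illustrative.

```python
# Figures quoted in these posts.
cgp_records_post_1976 = 507_000  # CGP/OCLC record count (covers roughly 1976 onward)
google_books_titles = 167_878    # total from my Google Books publisher searches

# Assumption: as much was distributed from 1813 to 1976 as after 1976,
# so the all-time total is about double the post-1976 count.
estimated_total = 2 * cgp_records_post_1976

coverage = google_books_titles / estimated_total
print(f"Estimated total documents: {estimated_total:,}")
print(f"Google Books coverage: {coverage:.1%}")
```

If the pre-1976 output was larger or smaller than the post-1976 output, the coverage figure moves accordingly, so treat it as a rough order of magnitude.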

So I put a call out to everyone in GovDoc Land.  If you are a full depository and have been one since 1813 and have kept really good records, could you please send me the statistics?  Thank you very kindly in advance!

 

 

Google Books/Fed Docs: Google Statistics from Scratch

Having found no published statistics for numbers of digitized books in Google Books, and especially nothing about digitized government publications, I was left with coming up with them on my own.

So I went to the Advanced Book Search screen for Google Books.  Looking at the search options provided there I decided that the only way I could get reasonably useful statistics was to search for books published by GPO.  As you are all aware not all government documents are actually published by GPO. Many are merely distributed by them.  So I knew that my numbers would not be exact.  Another problem was that over the years GPO listed themselves as publishers using a variety of abbreviations and phrases.

My first try was to use GPO  in publisher and on August 8th I retrieved 141,600 hits. However just now when I ran it again, I only got 117,600. Hmmm.

Next search was for Government Printing Office, which retrieved both today and on the 8th, 43,600 hits.  This was followed by gov't, which on the 8th retrieved 2,322 titles but today only retrieved 2,258.

The grand total for using these three searches on August 8th was 187,522.

Today as I was double checking my results, I also tried gov. print. off. and got 4,420 hits.  So as of this morning the grand total is 167,878.  I find it rather disconcerting that the number has dropped so much in nine days!
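Here is a small Python sketch of the totals, using only the hit counts reported above. The dictionary and variable names are mine. Note that it simply adds the searches together, so any title matched by more than one publisher phrase is counted more than once.

```python
# Hit counts by publisher search term, as reported above.
aug_8 = {
    "GPO": 141_600,
    "Government Printing Office": 43_600,
    "gov't": 2_322,
}
today = {
    "GPO": 117_600,
    "Government Printing Office": 43_600,
    "gov't": 2_258,
    "gov. print. off.": 4_420,  # only searched today
}

total_aug_8 = sum(aug_8.values())
total_today = sum(today.values())

# Like-for-like comparison: only the searches that were run on both days.
common_terms = aug_8.keys() & today.keys()
like_for_like_today = sum(today[term] for term in common_terms)
drop = total_aug_8 - like_for_like_today

print(f"August 8 total: {total_aug_8:,}")
print(f"Today's total (four searches): {total_today:,}")
print(f"Today's total (same three searches): {like_for_like_today:,}")
print(f"Like-for-like drop: {drop:,} ({drop / total_aug_8:.1%})")
```

Comparing only the three searches run on both days makes the drop look even larger, since today's grand total includes a fourth search.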

 

 

Google Books/Fed Docs: Google Books and Documents Coverage--From the Beginning

I have a hard time trying to figure out where to begin this blog, so I have decided to start at the beginning even though I have written a bit about this in a message posted to GOVDOC-L on August 8th. So here goes...

I was asked to find out how much government information is available in the various Googles.  Over the past few months I had saved posts from GOVDOC-L that had Google in the subject line, so I thought this would be an easy assignment.  It turns out that the messages did not give statistics; instead, they were questions about Google's practice of making the full text of all books published after 1923 unavailable.

Well I was a bit disappointed but I still thought that I would find the information on Google's website. I figured that Google would be tooting its own horn about the growth of this infamous project.  Not so.  There are no statistics anywhere, and there was very little that described the scope of the project.

Next I went into research mode.  I checked for articles in EBSCO's Academic Search Premier and Lexis-Nexis.  I found some interesting news articles on the project but again no statistics. I then tried searching the web pages of a few library partners. I looked at the University  I had a little luck on Stanford's web site on Robotic Book Scanning.  There was a page with a few statistics listed there, but alas they dated from June 2004.

I even Googled keywords that I hoped would bring up statistics. But considering how many different ways one might refer to statistical information, it was frustrating to do.  I didn't find any statistics this way but I did find some interesting blog entries about the full-text copyright issue.

So I was on my own.
