Note: The project period was January 18, 2008 through April 18, 2008.
The participation period for this project has closed.
Please see below for unique tags assigned to documents. To view the tagging directly on del.icio.us, please see http://del.icio.us/tag/epapilotproject
Update 4/23/2008 - Data has been compiled into a spreadsheet available at http://spreadsheets.google.com/pub?key=pybymZBlZ80PVat2ggty2GA
Interpretation to follow.
Update 5/7/2008 - The results report is finished and may be read and commented on at http://freegovinfo.info/node/1825.
Below is the original project announcement:
============================
Free Government Information needs your help to investigate whether social tagging of government documents is a viable idea.
We have stashed 32 documents from the Government Printing Office's EPA Web Harvesting Pilot Project in the Internet Archive. We would like as many people as possible to bookmark, tag and provide brief descriptions of all 32 of these test documents using the del.icio.us bookmarking service.
If you would like to join this effort and have a del.icio.us account, please follow this procedure:
1) Visit http://www.archive.org/search.php?query=epapilotproject and go to a document on the list. Open the PDF file in a separate browser window.
2) After examining the PDF file, bookmark the Internet Archive record page (i.e., not the PDF file) in del.icio.us.
3) In the del.icio.us "notes" field, write a one or two sentence description of what the document is about.
4) In the tags field, please enter epapilotproject and for:freegovinfo, followed by any other tags that you feel describe this document.
Please do steps 2-4 for as many documents as you can, ideally all 32.
We are going to run this project for three months, after which the FGI volunteers will compile data on the following:
A) How many people participated in the project.
B) How many documents were tagged.
C) How many documents were described.
D) The average number of tags per document.
We will also examine how much agreement on tags exists for a given document.
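Once the bookmarks are exported, the tallies in A) through D), plus a rough measure of tag agreement, take only a few lines to compute. The following is a minimal Python sketch, not the FGI volunteers' actual method. It assumes each bookmark has been exported as a simple record with a user, a document URL, a list of tags and the notes text (a hypothetical structure, not del.icio.us's own export format). It also assumes that tags are compared case-insensitively, that "average number of tags per document" means distinct descriptive tags (excluding the two required project tags), and that agreement for a document is the share of its bookmarkers who used its most common descriptive tag.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


# Hypothetical record for one del.icio.us bookmark; field names are illustrative.
@dataclass
class Bookmark:
    user: str
    doc: str          # URL of the Internet Archive record that was bookmarked
    tags: list[str]
    notes: str = ""   # the one- or two-sentence description, if any


# Tags every participant was asked to add; assumed to carry no descriptive meaning.
REQUIRED_TAGS = {"epapilotproject", "for:freegovinfo"}


def summarize(bookmarks):
    participants = {b.user for b in bookmarks}                      # A)
    tagged_docs = {b.doc for b in bookmarks}                        # B)
    described_docs = {b.doc for b in bookmarks if b.notes.strip()}  # C)

    distinct_tags = defaultdict(set)    # doc -> distinct descriptive tags
    tag_counts = defaultdict(Counter)   # doc -> how many bookmarks used each tag
    bookmarkers = Counter()             # doc -> number of bookmarks
    for b in bookmarks:
        # Lowercase so "Air" and "air" count as the same tag (an assumption).
        own = {t.lower() for t in b.tags} - REQUIRED_TAGS
        distinct_tags[b.doc] |= own
        tag_counts[b.doc].update(own)
        bookmarkers[b.doc] += 1

    # D) average number of distinct descriptive tags per tagged document.
    avg_tags = (sum(len(t) for t in distinct_tags.values()) / len(distinct_tags)
                if distinct_tags else 0.0)

    # Agreement: share of a document's bookmarks that used its most common tag.
    agreement = {}
    for doc, counts in tag_counts.items():
        top = counts.most_common(1)
        agreement[doc] = top[0][1] / bookmarkers[doc] if top else 0.0

    return {
        "participants": len(participants),
        "documents_tagged": len(tagged_docs),
        "documents_described": len(described_docs),
        "avg_tags_per_document": avg_tags,
        "agreement": agreement,
    }


# Made-up sample data for illustration only.
sample = [
    Bookmark("alice", "doc-A", ["epapilotproject", "for:freegovinfo", "pesticides", "herbicides"],
             "EPA reregistration fact sheet for an herbicide."),
    Bookmark("bob", "doc-A", ["epapilotproject", "Pesticides"]),
    Bookmark("carol", "doc-B", ["epapilotproject", "for:freegovinfo", "air", "emissions"]),
    Bookmark("dave", "doc-B", ["epapilotproject", "transportation"]),
]
stats = summarize(sample)
print(stats)
```

In this made-up sample, both bookmarkers of doc-A used "pesticides" (agreement 1.0), while the two bookmarkers of doc-B shared no descriptive tag (agreement 0.5, since the most common tag appears on only one of the two bookmarks). Other agreement measures, such as pairwise overlap between users' tag sets, would be equally reasonable choices.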
We have a belief, based on projects like NASA Clickworkers, GalaxyZoo and the Library of Congress' Flickr project, that the community of government documents users can improve the findability of government information and provide a valuable adjunct to traditional cataloging. We also believe that a successful tagging environment will provide better access than GPO's newly declared brief bibliographic records process. Time will tell. Help us find out!
=====================================================
List of harvested EPA test titles for this project:
Aerosol Test Facility at Research Triangle Park - Aerosol-propellants, Environmental-health-Research, Research-Triangle-North-Carolina, Terrorism-Prevention
Air Quality Data Analysis Technical Support Document for the Proposed Interstate Air Quality Rule - air, pollution, quality, data, ambient, monitoring
Air Sealing: Building Envelope Improvements - Air, air-sealing, airsealing, building-insulation, efficient, energy, energy-efficiency, Energy-Star-Branding, energyconservation, energystar, epa, EPA-advertising, globalwarming, greenhousegases, home-building, home-building-techniques, home-construction, home-improvement, homes, hvac, indoor, leakage, money-saving, quality, sealing, ventilation
Analysis of Atmospheric Deposition of Mercury to the Savannah River Watershed - water-quality, mercury-levels, water-pollutants, Clean-Water-Act, water-testing-results
Approval of Urban Bus Retrofit/ Rebuild Equipment - Air, Air-quality, Air-toxins, buses, emissions, engine, engines, engines-retrofit-and-rebuild-equipment, matter, particulate, pollution, retrofit, transit-buses-emissions, Urban-transportation
Approval of Urban Bus Retrofit/ Rebuild Equipment (Oct 1997) - Air, buses, Clean-Air-Act, emissions, engine, engine-retrofit-rebuild-kits, engines, matter, Particulate, particulate-matter, pollution, retrofit, transportation, Urban-transit
Are You One of the Top 20? - 2005, benefits, Best-Workplaces-For-Commuters, business, commuters, commuting, companies, emission, employers, epa, flexible-scheduling, fortune500, govdocs, incentives, misspellings, private-transportation, public-transportation, reduction, telecommuting
Are You Ready to Take Advantage of the New Commercial Tax Incentives - Energy-Star, Energy-saving-tax-decuction, Commercial-buildings, Commercial-building-improvements
Arsenic Rule Benefits Analysis: an SAB Review - Arsenic, Arsenic-levels, Cancer-causing-agents, costbenefitanalysis, drinking, environment, Environmental-health-Research, exposure, exposurelevels, Freegovinfo, health, public, standards, water, Water-quality, Water-treatment-costs
Best Workplaces for Commuters Application Form - Applicationforms, applications, audience:hr, benefits, bestworkplaces, carpool, carpools, communting, commuter, commuter.benefits, commuters, commutersepa, commuting, employerincentives, employers, environment, environmental, environmentalimpactofcommuting, epa, etc., forms, impact, incentives, program, publictransportation, telecommuting, telework, transportation, vanpool, vanpools
Best Workplaces for Commuters Graphic Standards and Usage Guide - EPA-branding, Best-Workplaces-For-Commuters, Government-agencies-public-relations
Boxed In? - 2004, airpollution, cleanair, emissions, environmentally-friendly-shipping, epa, fleet, gases, global, globalwarming, govdocs, greenhouse, greenhousegases, management, money-saving, posters, shipping, smartway, transportation, vehicle, warming
Business Case for Information Services: EPA's Regional Libraries and Centers - Environmental-libraries-United-States, Environmental-libraries-United-States-Costs-and-benefits
Calculation and Use of First-Order Rate Constants for Monitored Natural Attenuation Studies - attenuation, attenuation-rates, biodegration-rates, contaminants, contamination, epa, govdocs, ground, groundwater, mna, monitored, natural, plume-concentrations, remediation, research, water
Carpool Incentive Programs: Implementing Commuter Benefits as One of the Nation's Best Workplaces - air, carpools, commuters, commuting, employers, Employers-and-employees, EPA-advertising, EPA-branding, incentives, pollution, transportation, Workplace-conditions
Chloroneb - Chloroneb, pesticides, pesticides-safety, Cotton-crop-management-and-control, ornamental-plants-and-grasses-pesticide-treatment
Community Involvement Plan for the Copper Basin Mining District Site, Polk County, Tennessee - Freegovinfo, Copper, Basin, mining, community, involvement, cleanup, environmental-cleanup, community-involvement-in-environmental-programs-details
Conformity SIP guidance - transportation-regulations-states, Conformity-state-improvement-plans-SIPs, transportation-federal-regulation, transportation, conformity, SIP, air, quality, standards
Development of a Performance-based Industrial Energy Efficiency Indicator for Automobile Assembly Plants - vehicle-assembly-plants-energy-use, assembly-plants-energy-efficiency, automobile-assembly-energy-use-studies, manufacturing-process-energy-used
Diclofop-Methyl - 2000, bioaccumulation, cancer, carcinogens, Commercial-use-of-pesticides, golf-courses, diclofop-methyl, epa, Freegovinfo, golf, govdocs, herbicides, pesticides, reregistration, toxicity, toxicology, wild-oats-control
Diesel Retrofits: Quantifying and Using Their Benefits in SIPs and Conformity - emissions, Diesel-engines, engines, engine-rebuild-retrofit-kits, environmental-state-implementation-plans-SIPs, environmental-regulation-states, environmental-regulation-federal
Difenzoquat - Difenzoquat, pesticides, wild-oats-control, barley-crop-yields, wheat-crop-yields, agriculture-crops-and-yields, difenzoquat, herbicides, pesticides
Energy Star Wins the Bid - 2006, bottled_water, efficiency, electricity, energy, energy_star, energy-efficiency, energy-efficient-water-coolers, energy.conservation, energy.efficiency, energy.star, energyconservation, energystar, environmental-benefits, epa, EPA-branding, Energy-Star, freegovinfo, govdocs, money-saving, pressreleases, purchasing, umaine, water
ENERGYSTAR Building Upgrade Manual - Energy-Star, Buildings-energy-saving-improvements, Energy-savings-plans, Energy-costs
Environmental Economics Research Strategy - Environmental-economics-influences, Behavioral-science-economic-impacts, Behavioral-science-effect-on-policy-development, Business-and-human-behavior
Environmental Results Under EPA Assistance Agreements - Tagged with (gmp), 2005, assistance.agreements, assistance, agreements, compliance, environment, environmental, environmental.protection, epa, epa.policy, epa.strategic.plan, evidence-based, funding, goals, objectives, governmentagreements, grant, grantee, grants, management, outcome-based, plan, policies, programs, regulations, research, results, results-oriented, strategic
EPA's Diesel Retrofit SIP and Conformity Guidance - emissions, Diesel-engines, engines, engine-rebuild-retrofit-kits, environmental-state-implementation-plans-SIPs, environmental-regulation-states, environmental-regulation-federal
Final Emission Standards for 2004 and Later Model Year Highway Heavy-Duty Vehicles and Engines
Guidance for Quality Assurance Project Plans - Quality, assurance, environmental, data, EPA-quality-assurance-project-plans, EPA-QA-project-plans, Organizational-quality, product-quality
Guide to Technology Commercialization Assistance for EPA Small Business Innovation Research - Small-Business-Innovation-Research, Small-business-finances, small, business, innovation, technology, commercialization
Heavy-Duty Engine Emission Standards for Highway Trucks and Buses - trucks, transportation, Air-quality-history, trucking-industry, emissions, NOx-standard, Nitrogen-oxides, Global-warming, greenhouse-effect
Preliminary Risk-Based Screening Approach for Air Toxics Monitoring Data Sets - Air, Air-quality, air-toxics, Air-toxins, assessments, Clean-Air-Act, data, Data-analysis, data-screening, dqo, freegovinfo, methodology, monitoring, pollution, r4-slt, risk-based, Screening, sets, toxics, Biomarkers
Comments
Why in the Internet Archive? Why not the pages themselves?
As someone who has worked with EPA since pre-web days, I understand the desire to make their content more easily found. But it's not clear to me why you'd do that via copies stashed at the Internet Archive rather than via the document URL (on the EPA site) itself.
After all, EPA has some 750,000 documents, many of which are of transient value at best. Many of the documents become superseded by new regulations, new interpretation of the rules, new scientific data -- and while that obsolete information probably has SOME historical value, it is not the first thing someone ought to find when they are looking for information on, for instance, the Clean Air Act.
I'm a big believer in social tagging, and look forward to seeing what this experiment accomplishes -- but I'm also a big believer in the idea that the URL IS the document, and should be treated as -- well, perhaps not sacred, but at least with great respect for what it represents, which is the authoritative location of a given document.
In the case of the EPA, where documents may contain legal interpretations and rulings that directly impact how businesses and individuals behave, this is especially true -- pointing to the "wrong" document can have significant negative consequences for those trying to comply with the regulations.
IA is a testbed for tagging
Hi Scott,
I understand and sympathize with your concern for authenticity and up-to-date information. These are concerns we at FGI share.
For the purposes of this test, we needed URLs we could guarantee would not be deactivated during our pilot project. That's why we saved the 32 documents to the Internet Archive.
If as a result of this project, social tagging is accepted as a way of improving findability for government documents, I would expect either the agency or some other suitably authenticated copy (GPO, LOCKSS-Distributed FDLP copy, etc) would be the copy of the document tagged.
The agency URL might not be the best place to go because of the volatile nature of information on the web. Some research has suggested that the average web document has an active life of between 77 days and 4 years. Even the upper limit isn't very long for people wishing to document government information for longer than a presidential term. See this article for some reasonably up-to-date information on the topic of web volatility:
The Australian Library Journal (2005)
Still lost in cyberspace? Preservation challenges of Australian internet resources
Wendy Smith
http://www.alia.org.au/publishing/alj/54.3/full.text/smith.html
If other people know of more recent studies, please add them to comments.
Finally, let me assure you that FGI has no long term plan to post large volumes of government documents to the Internet Archive. That would be too much effort for any one organization.
------------------------------------
"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.
Can't afford to rely on agencies alone
Social tagging is but one way to collect and give access to govt information. Carl Malamud at public.resource.org has been putting large amounts of govt information out on the open Web as a way to diffuse that information. That includes putting video and text at the internet archive, Smithsonian images on flickr, video on YouTube, and case law up on his own servers (read press release (PDF)).
I'm really heartened by Carl's work to get government information out to the public. Federal agency CIOs, GPO and all government information producers would do well to follow these 8 Open Government Data Principles. Agencies, libraries and non-profit organizations will need to work together in order to assure easy access to and long-term preservation of government information. As Daniel points out, the public can't afford to rely on govt agencies alone in this endeavor.
2008 Study on Web Stability
I just found a brand new article about the persistence of web documents:
Casserly, M., & Bird, J. (2008, January). Web Citation Availability: A Follow-up Study. Library Resources & Technical Services, 52(1), 42-53. Retrieved February 19, 2008, from Professional Development Collection database.
If you have a subscription to EBSCOhost, you should be able to access the full text of the article at:
http://search.ebscohost.com/login.aspx?direct=true&db=tfh&AN=29379006&site=ehost-live
Overall, it looks like the persistence of URLs is not improving.
All the more reason to include libraries as custodians of digital content. They have a long term view to access and preservation that for some good reasons is not shared by commercial vendors and government agencies.
------------------------------------
"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.