Tag Archives: cgp
Reject GPO’s proposal to drop metadata from CGP
March 3, 2019
The Government Publishing Office has issued a brief proposal to omit some metadata in the Catalog of Government Publications (CGP). As written in the announcement, the proposed change in policy seems simple and obvious. GPO says that, for publications in govinfo, “historic URLs” (the original, source URL) and PURLs (the Persistent Uniform Resource Locators) are “identical.” GPO reasons that, since the two fields are “redundant,” the historic URLs are unnecessary:
“LSCM proposes to cease the inclusion of historic URLs only in catalog records for resources in govinfo.”
The announcement asks for feedback on how the policy change would affect “cataloging/metadata and other operations and processes.”
The proposal should be rejected for three reasons.
1. The premise is wrong
GPO’s premise is flat out wrong. The information recorded in PURLs and “historic URLs” is neither identical nor redundant. The location a PURL points to changes over time, but the “historic URL” stays the same. The two fields record different information even when their values happen to match: a PURL records the current location of a resource, while the “historic URL” records its original location. PURLs exist because URLs change.
(If GPO could guarantee that the public URL of govinfo items will never change, then it could just as easily advocate not including PURLs in CGP records. But that would, obviously, be crazy. PURLs are needed to account for the problem of “link rot.” Link rot is a well-documented problem, including for government information. GPO’s own digital repository has already had three base addresses over the years, each requiring PURL redirects: access.gpo.gov, gpo.gov/fdsys, and govinfo.gov.)
Because PURLs are necessary, the issue GPO should be addressing is whether or not the original URL is a valuable piece of metadata — even after it “rots” (changes). And the question GPO should be asking is not how omitting that information will affect “cataloging/metadata and other operations and processes,” but how it will affect users and long-term access to digital resources.
2. The historic URL is valuable to users
The original URL of govinfo resources is a valuable piece of metadata for users. They can use it to locate a copy of a document in other archives (e.g., the Internet Archive, the End of Term Archive). They can use it to compare copies when more than one institution has archived the same content (perhaps at different times, reflecting changes over time).
The historic URL is essential for retrieving copies from other archives. This would be vital if technical, administrative, legal, or budgetary changes interfered with GPO’s ability to keep PURLs up to date, keep its PURL server online, or keep its online copy available. It would also be vital if government information were intentionally withdrawn or privatized. There is no good reason to deny such essential information to users.
Some resources in govinfo (such as the Federal Register) are available in other versions from other government sources. The CGP records should record the historic URL so that users can understand which versions are described and to which version the PURL points.
The govinfo website provides users with the URLs of resources, but does not provide PURLs. Most users of resources in the TDR will have those public govinfo URLs and not PURLs. Users who cite those resources and provide links to them will likely use those URLs, not PURLs. Having those URLs in CGP would help users find the PURL when those URLs change (as they inevitably will).
3. The proposal ignores the evolution of govinfo
The proposed policy change is notable in what it leaves out: the future. At its inception, the policy would, apparently, affect only content in GPO’s Trusted Digital Repository (TDR).[1] The proposal should explicitly define the scope of the policy.
Currently, as far as we can tell[2], the TDR contains two kinds of resources with CGP records: born-digital items that GPO first published in either fdsys or govinfo (example CGP record: America’s water infrastructure needs and challenges); and digitized paper items that GPO has ingested into fdsys or govinfo (example CGP record: Internal revenue cumulative bulletin).
But CGP includes historic URLs for other kinds of resources. For example:
- Items harvested by GPO into permanent.access.gpo.gov and into wayback.archive-it.org. (Example CGP record: HealthCare.gov.)
- Digitized paper items not ingested into gpo.gov/fdsys or govinfo. (Example CGP record: Conservation and full utilization of water.)
This prompts several questions that the policy proposal does not address: Will GPO’s TDR ever contain other resources? Will it evolve and expand over time? Will the policy affect other kinds of content if they become “resources in govinfo”?
This matters because the historic URLs in CGP records for resources that apparently are not affected by the new policy today could be affected if those resources were ingested into the TDR (or if the policy is broader than we think it is, or if GPO interprets the policy differently in the future, etc.).
The “historic URLs” for that other content are even more important in some cases than they are for the content currently in the TDR.
The source URL of a harvested item establishes the provenance (origin) of the archived item. The standard for Trusted Digital Repositories requires TDRs to preserve information on the provenance of the resources they archive (TDR 4.2.6.3). If GPO ever ingests harvested content into its TDR, it will need the historic URL. Having it in CGP records could provide a useful link between CGP and GPO’s repository.
Digitized paper resources present a special problem for users. More than one digitization project may digitize the same paper document, or different editions or versions of the same title. Differences can arise from the use of different source copies and from the different standards and levels of accuracy of different digitization processes. This is further complicated in CGP by the fact that some CGP records point to non-government websites (example: Joan of Arc saved France), some point to the TDR (example: An account of the receipts and expenditures of the United States), and some point to harvested content (example: Pots and pans for your kitchen). There are probably other permutations and combinations that we have not discovered. The “historic” URL (i.e., the original location of the digitized copy) would help the user identify the source of the digitization and would enable the user to obtain a copy from that source or from an archive of that source.
Conclusion
The “historic URLs” in CGP provide information to users that PURLs do not. That information is useful to users because it will help them identify, understand, and locate copies of resources. “Historic URLs” may seem unnecessary to GPO today, but they will increase in value to users over time. Making a decision for “resources in govinfo” today fails to take into account what resources may be in GPO’s TDR in the future (including harvested content and digitizations). The proposal to drop historic URLs is short-sighted. Dropping historic URLs today would be a mistake that users would resent in the future.
GPO should clarify the scope of the policy and how it would be applied in the future, and it should evaluate the policy’s effects on users and long-term access.
Authors:
James A. Jacobs, University of California San Diego
James R. Jacobs, Stanford University
Endnotes
1. The announcement mentions getting TDR certification and refers to “resources in govinfo.” GPO does not make it clear, however, whether “govinfo” refers to its digital repository or to the website www.govinfo.gov. Without clarification, the scope of the policy proposal is ambiguous. The website runs on the Drupal content management system. It appears that everything in the TDR is available through the website, but not everything listed on the website is available in the TDR (e.g., see the browse page that points to GPO partner resources). For the most part, though, the website appears to be the public interface to the TDR. As far as we know, GPO has not said whether that will always be the case. Presumably, the website could be used to point at content in more than one repository (permanent.gpo.gov, for example), or GPO might want to ingest into the TDR the content it has harvested at wayback.archive-it, or GPO might replace Drupal with a different CMS. Any of these would result in a different scope of “resources in govinfo.”
2. Since GPO has not yet shared its TDR certification report publicly, our evaluation is based on what we can infer from using the govinfo website.
GPO to share metadata with EBSCO
December 1, 2010
On the both/and front, this is good news indeed. GPO will soon begin sharing its metadata from the Catalog of Government Publications (CGP) with the EBSCO Discovery Service. This will break down the govt documents silo, combine non-documents metadata with that from the federal govt, extend the findability of US govt publications to students and researchers, AND point them to depository libraries for access: all the things we’ve been advocating here at FGI! I hope GPO is talking with other database vendors about doing the same.
U.S. Government Printing Office content available through EBSCO Discovery Service
Metadata from the U.S. Government Printing Office (GPO) will soon be searchable through EBSCO Discovery Service (EDS) from EBSCO Publishing. EDS customers will be able to search for federal records from the Government Printing Office’s Catalog of U.S. Government Publications.
The U.S. Government Printing Office provides publishing & dissemination services for the official and authentic government publications to Congress, federal agencies, federal depository libraries, & the American public. GPO resources that will be available through EDS include federal publications from the following catalogues:
- Congressional Serial Set Catalog
- Congressional Publications
- GPO Access Publications
- Internet Publications
- Periodicals
- Serials
Three Splendid Govdocs
April 3, 2010
In the process of searching the March batch of publications reported to the Lost Docs Blog against the Catalog of Government Publications (CGP), I came across the amusing factoid that there appear to be three and only three federal government documents with the word “splendid” in their title, at least according to the CGP:
- Splendid vision, unswerving purpose : developing air power for the United States Air Force during the first century of powered flight. D 301.82/7:C 33
- Tearing up the ground with splendid results : historic mining on the Coronado National Forest / 1995 Farrell, Mary M. A 13.101/2:15
- The most splendid carpet / 1978 Anderson, Susan H. I 29.2:C 22
Hopefully soon, these three docs will be joined by a fourth which was reported to GPO in March 2010:
Three splendid little wars : the diary of Joseph K. Taussig, 1898-1901 D 208.210:16
Wouldn’t that be splendid? Does anyone know of other words rarely used in govdoc titles? Do you think there are more or fewer than seven “magnificent” feddoc titles?
Latest Posts
- FOIA Advisory Committee 2018-2020 term recommendations coming into focus
- GAO report on improving public access to research results
- Toward a Shared Agenda: Report on PEGI Project Activities for 2017-2019
- Advocates Call on Congress to pass legislation to make PACER free
- Do Not Assume PDF files are all permanent
Blogroll
- ASU Gov Docs
- beSpacific
- Best. Titles. Ever. (Tumblr)
- Center for Effective Government
- Every CRS Report New Reports RSS Feed
- FDLP Desktop
- FDLP News & Events
- FullTextReports
- GISIG UW-SLIS: Gov Info, Sources, Data & Docs
- Government Book Talk
- Government Information Network (Canada)
- Government Information News from Fondren Library, Rice University
- GPO [twitter]
- INFOdocket
- Information Observatory
- Libraries+ Network
- Library Babel Fish by Barbara Fister
- NARA records express
- Open The Government
- Secrecy News
- SLA GovInfo [twitter]
- StatFountain
- Sunlight Foundation
- University of Washington Gov Pubs Finds