Lancet investigation shows data manipulation in major U.S. health datasets

Well, this is disturbing: it could ripple through public health research and is sure to erode confidence in federal data.

A new study in the medical journal The Lancet, entitled “Data manipulation within the US Federal Government,” reports that more than 100 United States government health datasets were altered this spring without any public notice. The investigation found that nearly half of the files examined underwent wording changes while their official change logs were left blank. The findings were reported by PsyPost, a science news site that covers psychology, psychiatry, neuroscience, sociology, and similar fields.

To reach these findings, the researchers started by downloading the online catalogs, known as harvest sources, that federal agencies maintain under the 2019 Open Government Data Act (which was incorporated into the Foundations for Evidence-Based Policymaking Act of 2018). They gathered every entry from the Centers for Disease Control and Prevention (CDC), the Department of Health and Human Services, and the Department of Veterans Affairs that showed a modification date between January 20 and March 25, 2025.

After removing duplicates and files that are refreshed at least monthly, the team was left with 232 datasets. For each one, they located an archived copy that pre‑dated the study window, most often through the Internet Archive’s Wayback Machine.
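For readers who want to picture the selection step, here is a minimal Python sketch. It is not the researchers' code. It assumes each catalog entry is a dictionary with an `identifier`, an ISO-formatted `modified` date, and a hypothetical `refresh_days` field giving how often the file is refreshed; the 31-day cutoff for "at least monthly" is likewise an assumption.

```python
from datetime import date

# Study window reported in the article (treated here as inclusive).
WINDOW_START = date(2025, 1, 20)
WINDOW_END = date(2025, 3, 25)

# Assumption: "refreshed at least monthly" means an interval of 31 days or less.
MONTHLY_CUTOFF_DAYS = 31


def in_study_window(entry):
    """Return True if the entry's 'modified' date falls inside the window."""
    # Keep only the YYYY-MM-DD part so full timestamps also parse.
    modified = date.fromisoformat(entry["modified"][:10])
    return WINDOW_START <= modified <= WINDOW_END


def select_candidates(entries):
    """Drop duplicates, frequently refreshed files, and out-of-window entries."""
    seen = set()
    selected = []
    for entry in entries:
        key = entry["identifier"]  # hypothetical de-duplication key
        if key in seen:
            continue
        seen.add(key)
        refresh_days = entry.get("refresh_days")  # hypothetical field
        if refresh_days is not None and refresh_days <= MONTHLY_CUTOFF_DAYS:
            continue
        if in_study_window(entry):
            selected.append(entry)
    return selected
```

Each surviving entry would then be paired with an archived copy from before January 20, 2025, which the sketch leaves out.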

They then used the comparison feature in a word‑processing program to highlight every textual difference between the older and newer versions. Only wording was assessed; numeric tables were not rechecked. Finally, the investigators opened the public change log that sits at the bottom of each dataset’s web page to see whether the alteration had been declared.
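The researchers used a word processor's compare feature, but the same idea can be sketched in a few lines of Python with the standard library's `difflib`. This is a stand-in for illustration, not the tool used in the study; the function name `word_changes` is made up here.

```python
import difflib


def word_changes(old_text, new_text):
    """List (old wording, new wording) pairs where two texts differ."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side marks a pure insertion or deletion.
            changes.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Invented example text, not taken from any real dataset.
print(word_changes("Deaths by gender and age", "Deaths by sex and age"))
# prints [('gender', 'sex')]
```

Comparing word by word rather than character by character keeps the output readable when a single term has been swapped, which is the kind of change the study was looking for.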

Across the full sample, the pattern was strikingly consistent. One hundred fourteen of the 232 datasets (nearly 50 percent!) contained what the authors judged to be potentially substantive wording changes. Of these, 106 switched the term “gender” to “sex.” Four files replaced the phrase “social determinants of health” with “non‑medical factors,” one exchanged “socio‑economic status” for “socio‑economic characteristics,” and a single clinical trial listing rewrote its title so that “gender diverse” became “include men and women.”
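A quick check of the arithmetic behind those figures, using only the counts quoted above (the variable names are just for illustration):

```python
total_datasets = 232
changed = 114

# Counts for the wording changes itemized in this post.
itemized = {
    "gender -> sex": 106,
    "social determinants of health -> non-medical factors": 4,
    "socio-economic status -> socio-economic characteristics": 1,
    "gender diverse -> include men and women": 1,
}

share_changed = changed / total_datasets
share_gender_to_sex = itemized["gender -> sex"] / changed
itemized_total = sum(itemized.values())

print(f"{share_changed:.1%} of datasets had substantive wording changes")
print(f"{share_gender_to_sex:.1%} of those were gender -> sex")
print(f"{changed - itemized_total} changed datasets are not itemized here")
```

So "nearly 50 percent" is about 49.1 percent, and the gender-to-sex swap accounts for roughly 93 percent of the changed files. The four categories listed add up to 112, so two of the 114 are not broken out in this summary.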

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
