Medical AI Models Built on Questionable Data, Study Warns
A new analysis has uncovered a troubling practice in medical AI development: dozens of predictive models for conditions like stroke and diabetes were trained on datasets of dubious origin. The findings, detailed in a recent preprint, suggest these models may have already influenced clinical decisions.
Researchers identified 124 peer-reviewed papers that relied on two popular open-access health datasets. Statistician Adrian Barnett, who led the investigation, noted multiple statistical oddities inconsistent with real patient information, raising suspicions the data could be fabricated. "It was an enormous surprise," Barnett stated.
At least two models trained on this data are documented in hospital use in Indonesia and Spain. Another appears in a 2024 patent application, and two are live web tools where individuals can assess personal health risks.
"Prediction models trained on provenance-unknown data have no place in clinical decision-making. They are intrinsically unreliable," said public-health researcher Soumyadeep Bhaumik. He warned such tools could lead to incorrect predictions, resulting in inappropriate treatment decisions.
The study calls for immediate action: journals should reject papers that don't disclose data sources, and platforms should remove the flagged datasets. One dataset, downloaded over 288,000 times from Kaggle, is labeled as being for "educational purposes only" but lacks a verifiable source. Its creator declined to comment on its origin.
This incident underscores a critical vulnerability in AI-driven medicine. As Barnett's team concludes, the integrity of the data is the foundation of any reliable model. Without transparency, the promise of diagnostic AI remains dangerously unfulfilled.
Source: Nature