
AI tools show limitations in diagnosing atypical emergency room cases

Artificial intelligence tools can help emergency room physicians accurately predict disease, but only for patients with typical symptoms, West Virginia University scientists have found.

Gangqing "Michael" Hu, assistant professor in the WVU School of Medicine Department of Microbiology, Immunology and Cell Biology and director of the WVU Bioinformatics Core facility, led a study that compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning.

His findings, published in the journal Scientific Reports, demonstrate the need to incorporate more data, and more types of data, when training AI technology to assist in disease diagnosis.

More data can make the difference in whether AI gives patients the correct diagnoses for what are called "challenging cases," which don't exhibit classic symptoms. As an example, Hu pointed to a trio of scenarios from his study involving patients who had pneumonia without the typical fever.

"In these three cases, all of the GPT models failed to give an accurate diagnosis. That made us dive in to look at the physicians' notes, and we noticed the pattern of these being challenging cases. ChatGPT tends to get a lot of information from different resources on the internet, but these may not cover atypical disease presentation."

Gangqing "Michael" Hu, Assistant Professor, WVU School of Medicine Department of Microbiology, Immunology and Cell Biology 

The study analyzed data from 30 public emergency department cases, which for reasons of privacy did not include demographics.

Hu explained that in using ChatGPT to assist with diagnosis, physicians' notes are uploaded, and the tool is asked to provide its top three diagnoses. Results varied for the versions Hu tested: the GPT-3.5, GPT-4, GPT-4o and o1 series.

"When we looked at whether the AI models gave the correct diagnosis in any of their top three results, we didn't see a significant improvement between the new version and the older version," he said. "But when we look at each model's number one diagnosis, the new version is about 15% to 20% higher in accuracy than the older version."
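The distinction Hu draws is between top-1 accuracy (is the model's first-ranked diagnosis correct?) and top-3 accuracy (does the correct diagnosis appear anywhere among its three suggestions?). The minimal Python sketch below shows how the two measures can diverge. It assumes that a diagnosis can be matched by exact string comparison, whereas the study's graders would have judged matches clinically. The `top_k_accuracy` helper and the cases are hypothetical and are not data from the study.

```python
def top_k_accuracy(cases, k):
    """Fraction of cases whose reference diagnosis appears in the first k ranked predictions.

    Assumption (hypothetical): a prediction counts as correct only if it
    matches the reference diagnosis string exactly.
    """
    if not cases:
        raise ValueError("no cases to score")
    hits = sum(1 for truth, ranked in cases if truth in ranked[:k])
    return hits / len(cases)


# Made-up illustrative cases: (reference diagnosis, model's ranked top three).
cases = [
    ("pneumonia", ["pneumonia", "bronchitis", "influenza"]),
    ("appendicitis", ["gastroenteritis", "appendicitis", "ovarian cyst"]),
    ("pulmonary embolism", ["pneumonia", "pleurisy", "pulmonary embolism"]),
    ("myocardial infarction", ["myocardial infarction", "angina", "aortic dissection"]),
]

print(f"top-1 accuracy: {top_k_accuracy(cases, 1):.2f}")  # 2 of 4 cases -> 0.50
print(f"top-3 accuracy: {top_k_accuracy(cases, 3):.2f}")  # 4 of 4 cases -> 1.00
```

A correct answer ranked second or third counts toward top-3 accuracy but not toward top-1 accuracy. A newer model can therefore move correct diagnoses into first place without changing its top-3 score, which matches the pattern Hu describes.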

Given AI models' current low performance on complex and atypical cases, Hu said human oversight is a necessity for high-quality, patient-centered care when using AI as an assistive tool.

"We didn't do this study out of curiosity to see if the new model will give better results. We wanted to establish a basis for future studies that involve additional input," Hu said. "Currently, we input physician notes only. In the future we want to improve the accuracy by including images and findings from laboratory tests."

Hu also plans to expand on findings from one of his recent studies, in which he had the ChatGPT-4 model role-play a physiotherapist, psychologist, nutritionist, artificial intelligence expert and athlete in a simulated panel discussion about sports rehabilitation.

He said he believes a model like that can improve AI's diagnostic accuracy by taking a conversational approach in which multiple AI agents interact.

"From a position of trust, I think it's very important to see the reasoning steps," Hu said. "In this case, high-quality data including both typical and atypical cases helps build the trust."

Hu emphasized that while ChatGPT is promising, it is not a certified medical device. He said that if health care providers were to include images or other data in a clinical setting, the AI model would be an open-source system installed on a hospital computing cluster to comply with privacy laws.

Other contributors to the study were Jinge Wang, a postdoctoral fellow, and Kenneth Shue, a lab volunteer from Montgomery County, Maryland, both in the School of Medicine Department of Microbiology, Immunology and Cell Biology; as well as Li Liu, Arizona State University. The work was supported by funding from the National Institutes of Health and National Science Foundation.

Hu said future research on using ChatGPT in emergency departments could examine whether enhancing AI models' ability to explain their reasoning could contribute to triage or treatment decisions.

Source:

West Virginia University

Journal reference:

Wang, J., et al. (2025). Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Scientific Reports. doi.org/10.1038/s41598-025-95233-1.


Source: http://www.news-medical.net/news/20250523/AI-tools-show-limitations-in-diagnosing-atypical-emergency-room-cases.aspx
