Grand Rounds: Weekly Evidence Brief

Radiology

30-Second Takeaway

  • Most commercial radiology AI validations lack per‑subgroup performance reporting, limiting bias assessment.
  • A single non‑open VLM matched or exceeded radiologists for ED chest x‑ray report acceptability.
  • Radiomics ML for bladder cancer shows strong AUROCs but has high risk of bias and low certainty.

Week ending June 13, 2026

Selected AI, radiomics, and communication methods with immediate relevance to radiology practice

Most commercial radiology AI validations omit demographic subgroup performance.

EUROPEAN RADIOLOGY, Jun 13, 2026

This scoping review screened 545 validation studies of 252 commercial radiology AI products. Of these, 392 reported any demographic subgroup data, but just 77 presented subgroup performance results, which prevents robust assessment of algorithmic bias across sex, age, and race/ethnicity. Fourteen of 21 tuberculosis datasets were likely underpowered for post‑hoc subgroup meta‑analysis, limiting minority‑group inference. The authors conclude that fragmented reporting impedes clinician trust and call on stakeholders to make transparent subgroup performance reporting mandatory.
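
As a rough illustration of what per‑subgroup performance reporting can look like in a local evaluation, the sketch below tabulates sensitivity and specificity by subgroup. It is not the review's method: the function name `subgroup_performance` and the `(subgroup, truth, prediction)` record layout are assumptions made for this example.

```python
from collections import defaultdict


def subgroup_performance(records):
    """Tabulate sensitivity and specificity per subgroup.

    `records` is an iterable of (subgroup, truth, prediction) tuples, where
    truth and prediction are booleans (hypothetical layout for this sketch).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for subgroup, truth, prediction in records:
        if truth and prediction:
            key = "tp"
        elif truth and not prediction:
            key = "fn"
        elif not truth and prediction:
            key = "fp"
        else:
            key = "tn"
        counts[subgroup][key] += 1

    results = {}
    for subgroup, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[subgroup] = {
            "n": positives + negatives,
            # None marks a metric that cannot be computed for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


if __name__ == "__main__":
    # Toy records for illustration only; not data from the review.
    demo = [
        ("female", True, True),
        ("female", True, False),
        ("female", False, False),
        ("male", True, True),
        ("male", False, True),
    ]
    for name, metrics in subgroup_performance(demo).items():
        print(name, metrics)
```

Reporting `n` next to each metric makes it easier to see when a subgroup is too small to support a conclusion, which is the underpowering concern the review raises.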

One vision‑language model produced more acceptable ED chest x‑ray reports than radiologists.

EUROPEAN RADIOLOGY, Jun 10, 2026

In 478 ED patients with same‑day chest x‑ray (CXR) and CT, reports from the VLM “AIRead” had higher clinical acceptability than radiologist reports (84.5% vs 74.3%). AIRead also showed a lower RADPEER 3b rate (5.3% vs 13.9%), with hallucination rates comparable to those of radiologists. Other tested VLMs had higher disagreement, lower acceptability, and more hallucinations, with variable sensitivity for common thoracic findings. These results support piloting select VLMs for preliminary CXR reporting, but any pilot requires local validation against CT and radiologist performance.
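
For local validation, one possible way to compare AI and radiologist report acceptability on the same patients is an exact McNemar test on the discordant pairs. The sketch below is an assumption about how such a check could be run, not the analysis used in the study; the function name `paired_acceptability` and the input layout are hypothetical.

```python
from math import comb


def paired_acceptability(pairs):
    """Compare paired acceptability ratings with an exact McNemar test.

    `pairs` is an iterable of (ai_acceptable, radiologist_acceptable)
    booleans, one tuple per patient (hypothetical layout for this sketch).
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one rated pair is required")

    ai_rate = sum(1 for ai, _ in pairs if ai) / n
    rad_rate = sum(1 for _, rad in pairs if rad) / n

    # Only discordant pairs carry information about the difference.
    ai_only = sum(1 for ai, rad in pairs if ai and not rad)
    rad_only = sum(1 for ai, rad in pairs if rad and not ai)
    discordant = ai_only + rad_only

    if discordant == 0:
        p_value = 1.0
    else:
        # Two-sided exact binomial test with success probability 0.5.
        tail = sum(
            comb(discordant, k) for k in range(min(ai_only, rad_only) + 1)
        ) / 2 ** discordant
        p_value = min(1.0, 2 * tail)

    return {
        "n": n,
        "ai_rate": ai_rate,
        "radiologist_rate": rad_rate,
        "ai_only": ai_only,
        "radiologist_only": rad_only,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Toy ratings for illustration only; not data from the study.
    demo = [(True, False)] * 5 + [(False, True)] + [(True, True)] * 4
    print(paired_acceptability(demo))
```

A paired test fits here because each patient contributes both an AI report and a radiologist report, so the two acceptability rates are not independent samples.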

AI‑assisted DICOM label standardization reduced reading times and achieved high label accuracy.

EUROPEAN RADIOLOGY, Jun 10, 2026

A hybrid AI tool standardized DICOM labels across 422 CR images and 1503 CT series, with labeling accuracies of 83–100% for body part and 91–100% for plane. After implementation, mean reading times fell significantly for several CT protocols (eg, CT abdomen −2.9 minutes; temporal bone −2.5 minutes). Extrapolated savings equaled 270 hours annually, with relative efficiency gains of 8–22% for the affected protocols. Evaluate accuracy on your local study mix, and note that some protocols (CT chest, sinus) showed no time benefit.

Edition context

Clinical signal

  • Require subgroup performance data when evaluating commercial AI before clinical deployment.
  • When piloting report VLMs, monitor RADPEER disagreement and hallucinations against local radiologist benchmarks.
  • Treat radiomics bladder models as investigational until prospective, low‑bias validation exists.