We are moving our newsletter to Substack for a better experience!
In Week #187 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Breast Cancer Screening. European guidelines recommend double reading of screening mammograms for higher sensitivity, but double reading can be difficult to sustain because of a shortage of breast radiologists in many countries.
Lång et al. conducted the first randomized controlled trial, with 80K women, to assess the safety of an AI-supported mammography screen-reading process. The AI system triaged screening exams to single or double reading and identified suspicious calcifications and soft-tissue lesions with a regional risk score. Compared with standard double reading without AI, AI-supported screening detected 20% more cancers and reduced the screen-reading workload by 44.3%, without affecting the rates of recall, false positives, or consensus meetings. The AI-supported process was deemed safe because its cancer detection rate exceeded the prespecified lowest acceptable limit for safety. Its effect on screening outcomes, with interval cancer rate as the primary outcome, will be assessed in 100,000 participants after 2 years of follow-up.
The Lancet Oncology
2. Generalist Biomedical AI. A generalist biomedical AI system is a unified model that can interpret multiple biomedical data modalities and perform various downstream tasks using the same set of model weights.
Tu et al. introduced MultiMedBench, an open-source multimodal medical benchmark for evaluating generalist biomedical AI. The benchmark spans language, medical imaging, and genomics modalities, with 14 diverse biomedical tasks including question answering, visual question answering, medical image classification, radiology report generation and summarization, and genomic variant calling. Leveraging MultiMedBench, they developed Med-PaLM Multimodal (Med-PaLM M), a multimodal generative model based on pretrained PaLM (a large language model) and pretrained ViT (a vision encoder). Its sequence-to-sequence architecture incorporates and interleaves various types of multimodal biomedical information. Med-PaLM M exhibits evidence of zero-shot medical reasoning, generalization to novel medical concepts and tasks, and positive transfer across tasks. Additionally, in a blinded side-by-side ranking of 246 retrospective chest X-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those generated by radiologists in up to 40.50% of cases.
arXiv preprint
3. Generalist Medical AI. Another multimodal generative medical AI.
Moor et al. introduced Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Like Med-PaLM M, Med-Flamingo is a multimodal generative model, built on top of pretrained LLaMA-7B (a large language model) and pretrained CLIP ViT-L/14 (a vision encoder). The model was trained on an interleaved image-text dataset drawn from over 4K medical textbooks and on 1.6M image-caption pairs from PubMed Central's Open Access subset. To assess its performance, the authors created a challenging multimodal problem set of 618 USMLE-style questions augmented with images, case vignettes, and tables of laboratory measurements. They also designed an interactive human evaluation app in which clinical experts rated the quality of the generated answers for medical visual question answering (VQA). Med-Flamingo showed few-shot generative VQA abilities and achieved up to a 20% improvement in generating answers that clinicians preferred.
arXiv preprint
4. AI Adoption. Technology adoption generally follows an S-curve, starting with the development of solutions, then piloting, followed by scaling and adaptation, and finally reaching maturity.
In this review article, Sahni and Carrus present insights into the emerging use of AI in healthcare delivery, based on their conversations with dozens of healthcare leaders. Most organizations are currently in the pilot phase of AI adoption, aiming to validate its benefits. The authors identify nine domains of healthcare delivery in which AI can be used, with "back-office administrative functions" showing the highest level of adoption. In the reimbursement domain, AI adoption is common and advanced, with examples such as automating prior authorization and generating claims-specific root causes of denial. In the clinical operations domain, AI is in the piloting stage, with research concentrated on improving operating-room management, predicting usage, and enabling real-time analytics. In the quality and safety domain, AI's potential uses are still in the solution development stage. Efforts there are directed towards enhancing patient experience, by identifying the most dissatisfied patients so their concerns can be addressed, and towards improving patient safety, for example by using AI to monitor vital signs and nurse reports for sepsis prediction.
The New England Journal of Medicine
-- Emma Chen, Pranav Rajpurkar & Eric Topol