In Week #181 of the Doctor Penguin newsletter, the following papers caught our attention:
1. EEG. Introducing the first comprehensive and fully automated AI for interpreting routine electroencephalograms (EEGs) with an accuracy level comparable to that of human experts.
Using 30,000 expert-annotated EEGs, Tveit et al. developed SCORE-AI to separate normal from abnormal EEG recordings and then classify abnormal recordings into categories relevant to patient-care decisions (a minimal sketch of this two-stage pipeline follows the citation below). Evaluated on three independent multicenter test sets, SCORE-AI matched the diagnostic accuracy of human experts, achieving AUROCs of 0.89 to 0.96 across the different EEG abnormalities. An automated EEG-interpretation tool of this kind offers a potential remedy for the shortage of expertise in reading clinical EEGs.
JAMA Neurology
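As a rough illustration of the two-stage read summarized above, here is a minimal sketch using scikit-learn-style classifiers. The model objects, feature inputs, threshold, and category strings are illustrative placeholders, not SCORE-AI's actual architecture.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def interpret_eegs(features, normal_vs_abnormal, abnormality_model, threshold=0.5):
    """Stage 1: separate normal from abnormal; stage 2: categorize abnormal EEGs."""
    # Probability that each recording is abnormal (stage 1).
    p_abnormal = normal_vs_abnormal.predict_proba(features)[:, 1]
    flagged = p_abnormal >= threshold
    # Default every recording to "normal", then categorize only the flagged ones (stage 2).
    categories = np.full(len(features), "normal", dtype=object)
    if flagged.any():
        categories[flagged] = abnormality_model.predict(features[flagged])
    return p_abnormal, categories

# Per-abnormality discrimination can then be reported as AUROC, the metric
# behind the 0.89-0.96 range cited above:
# auroc = roc_auc_score(labels_for_one_abnormality, scores_for_that_abnormality)
```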
2. Self-Supervised Learning. How can we develop more robust medical-imaging models with less annotated data?
Azizi et al. proposed REMEDIS, a transfer-learning strategy for developing robust medical-imaging models with less annotated data. REMEDIS combines large-scale supervised pretraining on natural images with self-supervised learning on unlabelled, domain-specific medical data. The process begins with pretraining on large-scale natural images, continues by adapting the pretrained model to the medical domain via contrastive self-supervised learning without any labeled medical data, and ends with fine-tuning on annotated medical images for a specific task (a minimal sketch of the three stages follows the citation below). Across 6 imaging domains and 15 test datasets, REMEDIS improved in-distribution diagnostic accuracy by up to 11.5% over strong supervised baselines. In out-of-distribution settings, it matched supervised models retrained on all available data while requiring only 1-33% of that data for retraining.
Nature Biomedical Engineering
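For readers who want the shape of the recipe, here is a minimal PyTorch sketch of the three stages, assuming torchvision is available. ImageNet weights stand in for the paper's large-scale supervised pretraining, and a SimCLR-style NT-Xent loss stands in for its contrastive objective; data loading and training loops are omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Stage 1: start from large-scale supervised pretraining on natural images
# (ImageNet weights stand in for the paper's pretraining corpus).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()  # keep the encoder, drop the classification head

# Stage 2: contrastive self-supervised adaptation on unlabelled medical images.
# NT-Xent loss over two augmented views (z1, z2) of the same batch of images.
def nt_xent(z1, z2, temperature=0.1):
    z = nn.functional.normalize(torch.cat([z1, z2], dim=0), dim=1)
    sim = z @ z.t() / temperature                      # pairwise similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # The positive for view i is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return nn.functional.cross_entropy(sim, targets)

# Stage 3: supervised fine-tuning on a small annotated set for the target task.
num_classes = 5  # placeholder: task-specific label count
model = nn.Sequential(backbone, nn.Linear(2048, num_classes))
```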
3. LLM. How can we effectively adapt large language models to the medical domain?
Wang et al. developed ClinicalGPT, a language model designed specifically for medical and clinical applications. ClinicalGPT was trained on diverse medical datasets, including real medical records, patient consultations, medical knowledge bases, and exam data. To further enhance performance, the model underwent supervised fine-tuning and reinforcement learning that incorporated feedback from human experts on generated responses (a sketch of the reward-modeling step behind such feedback follows the citation below). The authors also proposed a comprehensive evaluation framework spanning medical-knowledge question answering, medical exams, patient consultations, and diagnostic analysis of medical records. Notably, ClinicalGPT significantly outperformed other large language models such as ChatGLM-6B, LLaMA-7B, and BLOOM-7B on these medical tasks.
arXiv preprint
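The paper's exact training setup isn't reproduced here, but the reward-modeling step common to this kind of expert-feedback fine-tuning can be sketched in a few lines of PyTorch. The pairwise loss below pushes a reward model to score expert-preferred responses above dispreferred ones; the tensors are illustrative, not ClinicalGPT's implementation.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor):
    """Encourage the reward model to score expert-preferred responses higher."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical scores from a reward model over (preferred, dispreferred) pairs.
r_good = torch.tensor([1.2, 0.7])
r_bad = torch.tensor([0.3, 0.9])
loss = reward_ranking_loss(r_good, r_bad)  # low when preferred responses score higher
```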
4. LLM. How well does GPT-4 perform in challenging medical cases?
Kanjee, Crowe, and Rodman evaluated the diagnostic capabilities of GPT-4 on 70 diagnostically challenging cases from the New England Journal of Medicine clinicopathologic conferences (NEJM CPCs). They used the first 7 case conferences of 2023 to develop a standard prompt instructing the model to return a ranked list of potential diagnoses. GPT-4 included the correct diagnosis in its differential for 64% of cases and ranked it first in 39% (a sketch of this scoring follows the citation below). These results are promising relative to existing differential-diagnosis (DDx) generators, which identified the correct diagnosis in 58% to 68% of NEJM CPC cases in a 2022 study.
JAMA
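To make the two reported metrics concrete, here is a minimal sketch of scoring a ranked differential against a reference diagnosis: the fraction of cases where the answer appears anywhere in the list (64% above) versus at rank 1 (39% above). Exact-string matching here is a stand-in for the study's physician adjudication.

```python
def score_cases(cases):
    """cases: list of (ranked_ddx: list[str], reference_dx: str) pairs."""
    in_differential = top_one = 0
    for ranked_ddx, reference_dx in cases:
        ranked = [dx.strip().lower() for dx in ranked_ddx]
        target = reference_dx.strip().lower()
        if target in ranked:
            in_differential += 1          # correct diagnosis anywhere in the list
            if ranked[0] == target:
                top_one += 1              # correct diagnosis ranked first
    n = len(cases)
    return in_differential / n, top_one / n

# Hypothetical example: the correct diagnosis is present and ranked first.
cases = [(["giant cell arteritis", "takayasu arteritis"], "giant cell arteritis")]
print(score_cases(cases))  # -> (1.0, 1.0)
```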
5. Foundation Model. How can medical foundation models transform the future of medicine and healthcare?
In this perspective, Zhang and Metaxas highlight the challenges, opportunities, and applications of medical foundation models. They place these models on a spectrum ranging from general vision models through modality-specific models to organ- and task-specific models. General vision models trained on massive natural-image corpora can serve as a fundamental building block for medical applications. Modality-specific models can start from such vision models and undergo further training on a single modality, learning image features relevant to that modality's intended use. Organ- and task-specific models can be tailored to a particular organ or diagnostic task, addressing the variability of organ appearance in medical images and the diverse range of clinical tasks that rely on image analysis.
arXiv preprint
-- Emma Chen, Pranav Rajpurkar & Eric Topol