Week 246
Real-Time Tumor Detection, LLMs for science, Foundation Model for DNA Methylation, Radiology Report Generation, Diagnostic Reasoning
We are moving our newsletter to Substack for a better experience!
In Week #246 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Real-Time Tumor Detection. A critical challenge in brain tumor surgery is finding and removing all of the tumor tissue that has spread into surrounding areas. When tumor tissue is left behind, as happens in the majority of patients with glioma, the cancer often returns quickly and reduces patient survival.
Kondepudi et al. developed FastGlioma, an open-source visual foundation model that can rapidly (within 10 seconds) detect and quantify brain tumor infiltration in fresh, unprocessed surgical tissue during glioma surgery. The model was pretrained using self-supervised learning on approximately 4 million images from stimulated Raman histology microscopy and fine-tuned to output a normalized score indicating tumor infiltration levels in whole-slide images. In a prospective, multicenter, international testing cohort of 220 patients with diffuse glioma, FastGlioma achieved an average AUROC of 92.1% for detecting tumor infiltration, significantly outperforming current surgical approaches such as image-guided and fluorescence-guided surgery. The model maintained high performance across diverse patient demographics, medical centers, and glioma subtypes, while also demonstrating zero-shot generalization to other adult and pediatric brain tumor diagnoses. FastGlioma represents a significant advance toward more complete tumor removal, as it can rapidly identify residual tumor at microscopic resolution without requiring tissue processing or staining, potentially improving patient outcomes by reducing the risk of recurrence.
Read Paper | Nature
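For readers curious how a headline number like FastGlioma's 92.1% AUROC is computed from the model's normalized infiltration scores, here is a minimal sketch (not the authors' code; the scores and labels are invented for illustration):

```python
# Illustrative sketch: computing a slide-level AUROC from normalized
# tumor-infiltration scores against ground-truth labels (1 = infiltrated).
# Scores and labels below are made-up example data, not study data.

def auroc(scores, labels):
    """Rank-based AUROC: the probability that a randomly chosen positive
    slide outscores a randomly chosen negative slide (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Normalized infiltration scores in [0, 1] for eight hypothetical slides
scores = [0.92, 0.30, 0.35, 0.10, 0.77, 0.05, 0.64, 0.22]
labels = [1,    1,    0,    0,    1,    0,    1,    0]

print(f"AUROC = {auroc(scores, labels):.4f}")  # prints "AUROC = 0.9375"
```

An AUROC of 1.0 would mean every infiltrated slide scores above every tumor-free slide; 0.5 is chance level.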
2. LLMs for science. A multi-agent system for assisting interdisciplinary research.
Swanson et al. developed the Virtual Lab, an AI-human collaboration framework that enables interdisciplinary scientific research through a team of specialized AI agents. The system consists of a principal investigator (PI) agent who leads a team of domain-specific AI agents (e.g., chemists, computer scientists, and critics), with a human researcher providing high-level guidance. The Virtual Lab conducts research through a series of team meetings, where all the agents discuss a scientific agenda, and individual meetings, where one agent accomplishes a specific task. The different backgrounds of the scientist agents lead to discussions that approach complicated scientific questions from multiple angles, contributing to comprehensive answers; the PI agent guides the discussions, makes key decisions, and summarizes conversations for the human researcher; the Scientific Critic agent pushes the other agents to improve their answers to maximize scientific quality; and the human researcher provides high-level guidance where the agents lack relevant context, such as choosing readily available computational tools and introducing constraints on experimental validation. To demonstrate its effectiveness, the authors applied the Virtual Lab to design nanobody binders to recent variants of SARS-CoV-2. The Virtual Lab created a novel computational pipeline that designed 92 new nanobodies; experimental validation showed that over 90% were expressed and soluble, including two promising candidates with unique binding profiles to the recent JN.1 and KP.3 spike RBD variants. The Virtual Lab represents a shift from AI as a tool to AI as a partner in scientific research.
Read Paper | bioRxiv
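The team-meeting structure described above can be sketched as a simple control loop. This is an illustrative skeleton, not the authors' implementation: each agent here is a stub returning a canned remark where a real system would make an LLM call conditioned on the agenda and transcript.

```python
# Illustrative skeleton of a Virtual Lab-style team meeting: a PI agent
# poses an agenda, specialist agents respond in rounds, a Scientific
# Critic reviews each round, and the PI summarizes for the human.

class Agent:
    def __init__(self, role):
        self.role = role

    def respond(self, agenda, transcript):
        # Stand-in for an LLM call that would see the agenda + transcript.
        return f"[{self.role}] comment on: {agenda}"

def team_meeting(pi, specialists, critic, agenda, rounds=2):
    transcript = [f"[{pi.role}] agenda: {agenda}"]
    for _ in range(rounds):
        for agent in specialists:
            transcript.append(agent.respond(agenda, transcript))
        transcript.append(critic.respond(agenda, transcript))
    transcript.append(f"[{pi.role}] summary for the human researcher")
    return transcript

pi = Agent("PI")
team = [Agent("Chemist"), Agent("Computer Scientist")]
critic = Agent("Scientific Critic")
log = team_meeting(pi, team, critic, "design nanobody binders for JN.1")
print("\n".join(log))
```

The human researcher's role maps naturally onto choosing the agenda string and reading the final summary; individual meetings are the degenerate case with a single specialist.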
3. Foundation Model for DNA Methylation. Despite a decade of advances in aging research, most epigenetic aging clocks still use simple linear models to analyze DNA methylation data. These traditional approaches have two key limitations: they ignore the broader genomic context of methylation sites and cannot provide personalized interpretations of epigenetic patterns for individual samples.
Sehgal et al. developed CpGPT (Cytosine-phosphate-Guanine Pretrained Transformer), a foundation model for DNA methylation analysis. The model integrates three key types of information: sequence context, local and global positional information, and epigenetic state. It was pretrained on an extensive dataset of over 100,000 human DNA methylation samples from more than 1,500 studies spanning diverse tissues, developmental stages, and disease conditions. As a foundation model, CpGPT can perform a range of tasks both zero-shot and when fine-tuned. For instance, the model can impute missing methylation values within a dataset, convert between different methylation platforms by reconstructing unmeasured CpG sites, perform zero-shot reference mapping to label samples without fine-tuning, and rank the importance of different CpG sites on a per-sample basis. It also excels when fine-tuned for chronological age prediction and methylation-based mortality prediction. Additionally, CpGPT is designed to handle missing data and can provide sample-specific interpretation via its attention mechanism.
Read Paper | bioRxiv
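The imputation task above can be made concrete with a toy sketch. This is not the CpGPT code: the "model" here is a trivial per-site mean predictor standing in for the transformer, and the data are simulated beta values, but the masked-reconstruction setup is the same shape.

```python
# Illustrative sketch of masked methylation imputation: hide a fraction
# of each sample's CpG beta values, reconstruct them, and score the
# reconstruction error on the hidden sites only. Simulated data.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_cpgs = 50, 200
betas = rng.beta(2, 5, size=(n_samples, n_cpgs))  # methylation in [0, 1]

mask = rng.random(betas.shape) < 0.15             # hide ~15% of sites
observed = np.where(mask, np.nan, betas)

# Stand-in model: impute each masked site with that site's mean across
# samples (a real model would condition on sequence and genomic context)
site_means = np.nanmean(observed, axis=0)
imputed = np.where(mask, site_means, observed)

mae = np.abs(imputed[mask] - betas[mask]).mean()
print(f"MAE on masked CpG sites: {mae:.3f}")
```

A foundation model earns its keep by beating this per-site-mean baseline, precisely because it uses the sequence and positional context the summary describes.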
4. Radiology Report Generation. Evaluating an automatic report generation system that generates complete, free-text descriptions of medical images.
Tanno et al. developed Flamingo-CXR, a report generation system for chest radiographs, by fine-tuning the Flamingo vision-language foundation model. In a comprehensive evaluation involving 27 board-certified radiologists across the United States and India, they directly compared clinicians' preferences for AI reports versus human reports using both the MIMIC-CXR dataset from intensive care in the United States and the IND1 dataset from in/outpatient settings across India. The results showed a wide distribution of preferences across the panel and clinical settings: 56.1% of Flamingo-CXR intensive care reports were rated preferable or equivalent to clinician reports by half or more of the panel, rising to 77.7% for in/outpatient X-rays overall and to 94% for normal outpatient X-rays. The study revealed notable error patterns in both AI-generated and human-written reports, with 24.8% of in/outpatient cases containing clinically significant errors in both report types, 22.8% in Flamingo-CXR reports only, and 14% in human reports only. To address these limitations, the authors tested a collaborative setting in which radiologists could edit AI-generated reports, which significantly improved performance. This work highlights the importance of evaluation across different clinical contexts and geographic regions, as well as the complexity of assessing radiology report quality, as evidenced by high inter-rater variability.
Read Paper | Nature Medicine
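The study's headline metric, the share of cases where at least half the panel rated the AI report preferable or equivalent, is easy to misread, so here is a minimal sketch of the calculation with invented votes (not the study data):

```python
# Illustrative calculation of a panel-preference rate: for each case,
# count the panelists who rated the AI report preferable ("ai") or
# "equivalent"; the case counts as favorable if that is half or more
# of the panel. Votes below are made up for illustration.

def majority_pref_rate(cases):
    favorable_cases = 0
    for votes in cases:
        favorable = sum(v in ("ai", "equivalent") for v in votes)
        if favorable * 2 >= len(votes):   # half or more of the panel
            favorable_cases += 1
    return favorable_cases / len(cases)

cases = [
    ["ai", "equivalent", "human", "ai"],          # 3/4 -> favorable
    ["human", "human", "equivalent", "human"],    # 1/4 -> not favorable
    ["equivalent", "equivalent", "ai", "human"],  # 3/4 -> favorable
    ["human", "ai", "human", "human"],            # 1/4 -> not favorable
]
print(f"{majority_pref_rate(cases):.0%} of cases favorable")  # "50% of cases favorable"
```

Note that the metric aggregates twice, over panelists within a case and then over cases, which is one reason the reported rates can mask high inter-rater variability.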
5. Diagnostic Reasoning. Will the use of large language models (LLMs) improve physicians’ diagnostic reasoning?
Goh et al. conducted a randomized clinical trial to evaluate the impact of GPT-4 on the diagnostic reasoning of 50 physicians across general medical specialties (internal medicine, family medicine, or emergency medicine) compared to conventional resources such as UpToDate and Google. The study utilized clinical vignettes based on actual patients and a structured reflection approach, where physicians considered reasonable diagnoses and clinical features supporting or opposing their diagnoses. Participants provided free-text responses on their top three differential diagnoses, factors favoring or opposing each diagnosis, their final most likely diagnosis, and up to three next steps for further patient evaluation. The trial found that the use of GPT-4 did not improve diagnostic reasoning on challenging clinical cases, with similar results across subgroups of different training levels and chatbot experience. Surprisingly, the LLM alone performed significantly better than both physician groups. These findings suggest that mere access to LLMs will not enhance overall physician diagnostic reasoning in practice, which is particularly relevant given the increasing availability of HIPAA-compliant chatbots in healthcare settings, often with minimal or no training for physicians on their use.
Read Paper | JAMA Network Open
-- Emma Chen, Pranav Rajpurkar & Eric Topol