In Week #253 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Echocardiograms. Can AI fully automate echocardiogram interpretation?
Holste et al. developed PanEcho, a view-agnostic deep learning model that automatically interprets complete echocardiograms using multitask learning on over 1.2 million echocardiographic videos from 32,265 studies. The model first uses a convolutional neural network to encode video frames. These frame-wise embeddings are then treated as an ordered sequence and modeled with self-attention to learn time-varying associations across frames. Finally, a video-level embedding is formed and passed to task-specific output heads. Unlike previous single-task, single-view models, PanEcho can analyze any combination of 2D echocardiographic views simultaneously and perform the full range of reporting tasks, much as cardiologists do in clinical practice. The system achieved a median AUC of 0.91 across 18 diagnostic classification tasks and a median normalized mean absolute error of 0.13 across 21 parameter estimation tasks, demonstrating high accuracy across diverse cardiac conditions including left ventricular dysfunction, valvular disease, and structural abnormalities. PanEcho maintained excellent performance across multiple international validation cohorts and adapted well to limited imaging protocols, successfully analyzing studies ranging from complete protocols (median 17 videos per study) to simplified point-of-care acquisitions (median 6 videos per study). The system could improve diagnostic accessibility in resource-limited settings and support clinical workflows in established laboratories.
Read Paper | JAMA
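For readers curious about the architecture, here is a minimal PyTorch sketch of the frame-encoder-plus-self-attention design described above. It illustrates the general pattern, not the authors' implementation; the backbone choice, embedding size, pooling strategy, and head layout are all assumptions.

```python
# Sketch of a view-agnostic, multi-task echo video model in the spirit of
# PanEcho (CNN frame encoder -> self-attention over frames -> task heads).
# Module names, dimensions, and head counts are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as tvm

class VideoMultiTaskModel(nn.Module):
    def __init__(self, embed_dim=512, n_heads=8, n_layers=2,
                 n_diagnostic_tasks=18, n_regression_tasks=21):
        super().__init__()
        # 2D CNN backbone encodes each frame independently.
        backbone = tvm.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d frame embeddings
        self.frame_encoder = backbone
        # Self-attention treats frame embeddings as an ordered sequence,
        # capturing time-varying associations across the cardiac cycle.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Separate output heads, one per diagnostic / estimation task.
        self.cls_heads = nn.ModuleList(
            [nn.Linear(embed_dim, 1) for _ in range(n_diagnostic_tasks)])
        self.reg_heads = nn.ModuleList(
            [nn.Linear(embed_dim, 1) for _ in range(n_regression_tasks)])

    def forward(self, video):                # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)         # (B*T, 3, H, W)
        emb = self.frame_encoder(frames).view(b, t, -1)  # (B, T, 512)
        emb = self.temporal(emb)             # attend over frames
        video_emb = emb.mean(dim=1)          # pool to video-level embedding
        logits = torch.cat([h(video_emb) for h in self.cls_heads], dim=1)
        estimates = torch.cat([h(video_emb) for h in self.reg_heads], dim=1)
        return logits, estimates
```

Study-level predictions could then aggregate outputs over all available videos regardless of view, which is what makes such a model view-agnostic.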
2. Opportunistic Screening. Thoracic aortic aneurysms are rare but often "silent killers," with screening offered only to high-risk groups. Since breast MRI naturally covers the mediastinum, where the thoracic aorta is visible, can AI leverage routine breast imaging to detect these life-threatening aneurysms?
Bounias et al. developed and validated a fully automated convolutional neural network pipeline to screen for thoracic aortic aneurysms using routine breast MRI examinations across 5,057 patients from multiple hospitals and clinical trials. The system demonstrated high robustness (Dice coefficients 0.88-0.91) across three independent, multi-center datasets with varying vendors, field strengths, and acquisition protocols, spanning the full spectrum of breast MRI indications (including high-risk screening, screening of women with dense breasts, staging, and follow-ups). It improved aneurysm detection rates 3.5-fold compared to routine clinical readings: in the Erlangen cohort of over 2,000 breast MRI scans, radiologists initially detected 2 aneurysms while the AI detected 7 in total. The study also found a higher prevalence of thoracic aortic aneurysms and larger aortic diameters in breast cancer patients, though this finding may be affected by patient selection, and the low aneurysm prevalence leaves wide confidence intervals. This approach leverages existing imaging data without requiring additional patient visits, offering cost-effective opportunistic screening for a rare but potentially fatal condition that disproportionately affects women.
Read Paper | Nature Communications
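As a rough illustration of the opportunistic-screening idea, the sketch below measures an approximate maximal aortic diameter from a predicted segmentation mask and flags studies for radiologist review. The equivalent-circle diameter estimate, the 40 mm threshold, and the helper names are assumptions for illustration; the paper's pipeline details may differ.

```python
# Downstream screening logic one could run on top of a CNN aorta
# segmentation: estimate the maximal aortic diameter from the predicted
# mask and flag studies above a threshold. All specifics are assumptions.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary masks (the paper reports 0.88-0.91)."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def max_axial_diameter_mm(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Approximate max aortic diameter from a 3D binary mask (Z, Y, X) via
    the equivalent-circle diameter of the largest axial cross-section.
    This is a crude proxy: it is only reasonable where the aorta runs
    roughly perpendicular to the axial plane."""
    sy, sx = spacing_mm[1], spacing_mm[2]
    areas_mm2 = mask.sum(axis=(1, 2)) * sy * sx      # area per axial slice
    return 2.0 * np.sqrt(areas_mm2.max() / np.pi)    # d = 2 * sqrt(A / pi)

def flag_aneurysm(mask: np.ndarray, spacing_mm: tuple,
                  threshold_mm: float = 40.0) -> bool:
    """Flag a study for review if the estimated diameter exceeds the
    (assumed) screening threshold."""
    return max_axial_diameter_mm(mask, spacing_mm) >= threshold_mm
```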
3. Retinal Phenotyping. Inherited retinal diseases (IRDs) are rare genetic disorders that cause blindness in children and adults. Genetic diagnosis is crucial for treatment and counseling, but remains elusive in over 40% of cases in the UK (and diagnostic rates are potentially much lower in other parts of the world) due to limited testing access and a lack of specialists. While IRDs show distinct patterns on retinal images, few clinicians have the expertise to diagnose these rare diseases from images alone.
Pontikos et al. developed Eye2Gene, a model that predicts the causative gene for IRDs from retinal imaging scans. The system was trained on one of the world's largest datasets of genotyped individuals with IRDs (n = 2,451) and externally validated across five clinical centers, achieving 83.9% top-five accuracy for the 63 most common genetic causes and outperforming eight specialist ophthalmologists with 5-15 years of experience, who averaged 29.5% accuracy in direct comparison. Eye2Gene uses an ensemble of 15 convolutional neural networks that analyze three retinal imaging modalities: fundus autofluorescence, infrared reflectance, and spectral-domain optical coherence tomography. Although the model can make predictions from a single imaging modality, combining all three typically improves performance over any single modality. While Eye2Gene will not replace genetic testing or counseling at specialized centers, particularly when gene therapy decisions require a confirmed genetic diagnosis, its main application is facilitating more efficient genetic diagnosis by indicating when molecular testing is warranted and guiding test interpretation.
Read Paper | Nature Machine Intelligence
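The ensembling idea is simple to sketch: average per-model class probabilities over whatever modalities are available, then rank genes. The sketch below is a hypothetical illustration with assumed shapes and function names, not the Eye2Gene codebase.

```python
# Sketch of multi-modality ensembling and top-five gene prediction in the
# spirit of Eye2Gene. The averaging scheme and array shapes are assumptions;
# the actual system ensembles 15 CNNs over three modalities.
import numpy as np

N_GENES = 63  # the 63 most common genetic causes covered by the model

def ensemble_predict(prob_sets: list[np.ndarray]) -> np.ndarray:
    """Average per-model softmax outputs; each array is (n_models, N_GENES)
    for one modality (FAF, IR, or SD-OCT). Missing modalities are simply
    omitted, so a single-modality prediction still works."""
    stacked = np.concatenate(prob_sets, axis=0)      # all available models
    return stacked.mean(axis=0)                      # (N_GENES,)

def top_k(probs: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k most probable causative genes, best first."""
    return np.argsort(probs)[::-1][:k]

# Example: combine whatever modalities were acquired for one patient
# (synthetic probabilities stand in for real model outputs).
rng = np.random.default_rng(0)
faf = rng.dirichlet(np.ones(N_GENES), size=5)        # 5 FAF models
oct_ = rng.dirichlet(np.ones(N_GENES), size=5)       # 5 SD-OCT models
print(top_k(ensemble_predict([faf, oct_])))          # top-5 gene indices
```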
4. Cognitive Debt. Early AI reliance may result in shallow memory encoding.
Kosmyna et al. examined the cognitive impact of using LLMs versus traditional search engines or no tools during essay-writing tasks. They assigned 54 participants to three groups (an LLM group using ChatGPT, a Search Engine group, and a tool-free Brain-only group) across multiple writing sessions, using EEG to measure brain activity alongside natural language processing analysis of the essays and post-task interviews. EEG analysis revealed that brain connectivity (the coordinated activity and communication between different brain regions) systematically scaled down with external support: the Brain-only group exhibited the strongest, widest-ranging neural networks, the Search Engine group showed intermediate engagement, and LLM assistance produced the weakest neural coupling. Participants using LLMs demonstrated significantly impaired memory recall (83% reported difficulty quoting from their own essays), corresponding with reduced low-frequency connectivity in the theta and alpha bands, which are crucial for episodic memory consolidation and semantic encoding. This likely reflects a bypass of deep memory encoding, whereby participants read, selected, and transcribed tool-generated suggestions without integrating them into episodic memory networks. Search Engine and Brain-only participants did not display such impairments. The findings suggest that withholding LLM tools during early stages might support memory formation. When LLM tools were later introduced to the Brain-only group, participants appeared to mentally compare their past unaided efforts with tool-generated suggestions, engaging in self-reflection and elaborative rehearsal, processes linked to executive control and semantic integration, as evidenced in their EEG profiles.
Read Paper | arXiv
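For readers who want a concrete sense of what "band-limited connectivity" means, the sketch below computes mean pairwise spectral coherence in the theta and alpha bands using SciPy. This is one generic connectivity metric chosen for illustration; the study's actual measure, channel montage, and preprocessing may differ.

```python
# Generic illustration of band-limited EEG connectivity: mean spectral
# coherence between all channel pairs within the theta and alpha bands.
# Sampling rate, band edges, and window length are assumed values.
import numpy as np
from scipy.signal import coherence
from itertools import combinations

FS = 256.0                      # assumed sampling rate, Hz
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0)}

def band_connectivity(eeg: np.ndarray) -> dict:
    """eeg: (n_channels, n_samples). Returns mean pairwise coherence per band."""
    out = {name: [] for name in BANDS}
    for i, j in combinations(range(eeg.shape[0]), 2):
        f, cxy = coherence(eeg[i], eeg[j], fs=FS, nperseg=512)
        for name, (lo, hi) in BANDS.items():
            sel = (f >= lo) & (f < hi)
            out[name].append(cxy[sel].mean())
    return {name: float(np.mean(vals)) for name, vals in out.items()}

# Example on synthetic data: 8 channels, 30 seconds of noise.
rng = np.random.default_rng(1)
print(band_connectivity(rng.standard_normal((8, int(30 * FS)))))
```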
-- Emma Chen, Pranav Rajpurkar & Eric Topol