Week 255
Sepsis, Alzheimer’s Disease, Biological Age, Human-AI Collaboration
In Week #255 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Sepsis. Because the clinical diagnosis of sepsis relies on the presence of organ dysfunction, typically only patients in advanced stages of the syndrome are identified. Diagnosis is therefore delayed, even though mortality escalates with each hour of delayed treatment because of irreversible organ damage. Despite decades of research proposing over 250 potential biomarkers, no single marker has demonstrated outstanding sensitivity and specificity for early sepsis detection and mortality prediction.
In this prospective observational study, Seidlitz et al. collected hyperspectral imaging (HSI) data from the palms and fingers of more than 480 intensive care unit patients for sepsis prediction. HSI is an imaging technology that captures detailed spectral information across many wavelengths of light, far beyond what conventional cameras can see. Using the HSI data, they trained convolutional neural networks to detect microcirculatory dysfunction in the skin, which occurs early in sepsis and drives organ failure. These models achieved AUROCs of 0.80 for sepsis diagnosis and 0.72 for mortality prediction using imaging data alone. When combined with clinical data, the models achieved AUROCs of 0.94 for sepsis diagnosis and 0.83 for mortality prediction. The study revealed that patients with sepsis and non-survivors had significantly lower tissue oxygen saturation and higher tissue hemoglobin and water content compared to patients without sepsis and survivors, with palm measurements consistently outperforming finger measurements. The key strengths of this HSI-based approach include its objectivity, non-invasiveness, cost-effectiveness, and speed: predictions can be obtained from a single HSI cube acquired at the bedside within approximately 7 seconds. The models also outperformed widely used clinical biomarkers and scores, offering particular promise for resource-limited settings such as emergency departments, ambulances, and low- and middle-income countries.
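For readers less familiar with the AUROC figures quoted above, the metric can be read as the probability that a randomly chosen positive case (for example, a patient with sepsis) receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python. The function name and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def auroc(labels, scores):
    """AUROC as a pairwise win rate: the fraction of (positive, negative)
    pairs in which the positive case has the higher score. Ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sepsis, 0 = no sepsis (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 11 of 12 pairs are ordered correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives some context for the reported values between 0.72 and 0.94.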
Read Paper | Science Advances
2. Alzheimer’s Disease. The AMARANTH trial of lanabecestat, a BACE1 inhibitor designed to slow the progression of Alzheimer’s Disease (AD) by preventing amyloid plaque formation, was terminated early due to a lack of cognitive benefit. A recent study that used AI to re-stratify the individuals in the AMARANTH trial data, however, demonstrated significant treatment effects on the primary trial outcomes.
Vaghari et al. demonstrated how an AI tool called the Predictive Prognostic Model (PPM) can improve outcomes in AD clinical trials through better patient stratification. The model takes three types of baseline data as input: β-amyloid levels from PET scans, APOE4 genetic status, and medial temporal lobe gray matter density from MRI scans. From these inputs it creates prototype vectors that represent the "typical" characteristics of each class (in this case, "clinically stable" vs. "clinically declining" patients). Rather than simply classifying patients into binary categories, the PPM generates a continuous score, based on how different each patient is from the "clinically stable" pattern, that predicts how fast an individual patient will progress from mild cognitive impairment to AD. Using PPM to stratify patients from the failed AMARANTH trial into "slow progressive" versus "rapid progressive" groups, they discovered that slow progressive patients, who were at earlier disease stages, showed a significant 46% slowing of cognitive decline with treatment, while rapid progressive patients showed no cognitive benefit. Additionally, this AI-guided approach could reduce required sample sizes by 90%, making clinical trials faster, cheaper, and more likely to succeed by enrolling the right patients at the optimal disease stage.
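To make the idea of a prototype-based continuous score concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the plain squared Euclidean distance, the prototype values, the feature scaling, and the stratification threshold are all hypothetical, and the published model's actual distance measure and cutoff are not described in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Three baseline inputs named in the summary (the scaling is an assumption)."""
    amyloid: float  # standardized beta-amyloid PET burden
    apoe4: float    # 1.0 if APOE4 carrier, else 0.0
    mtl_gm: float   # standardized medial temporal lobe gray matter density

    def as_vector(self):
        return (self.amyloid, self.apoe4, self.mtl_gm)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical prototype for the "clinically stable" class (illustrative values).
STABLE_PROTOTYPE = (-0.5, 0.0, 0.5)


def prognostic_score(patient, stable=STABLE_PROTOTYPE):
    """Continuous score: the farther from the stable prototype, the higher the score."""
    return squared_distance(patient.as_vector(), stable)


def stratify(patient, threshold=1.0):
    """Split patients into two groups using a hypothetical score threshold."""
    if prognostic_score(patient) > threshold:
        return "rapid progressive"
    return "slow progressive"


near_stable = Baseline(amyloid=-0.4, apoe4=0.0, mtl_gm=0.4)
far_from_stable = Baseline(amyloid=1.0, apoe4=1.0, mtl_gm=-1.0)
```

In this toy setup, `stratify(near_stable)` falls in the slow progressive group and `stratify(far_from_stable)` in the rapid progressive group. The point of a continuous score is that the threshold can be chosen after the fact, rather than being fixed by a binary classifier.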
Read Paper | Nature Communications
3. Biological Age. In clinical practice, overall aging proxies provide a better assessment of comprehensive health, while organ-specific aging proxies offer insights into individual organ health. Neither can be adequately represented by chronological age alone.
Li et al. demonstrate that Large Language Models (LLMs) can assess individual overall and organ-specific aging by analyzing routine health examination data. Their approach converts tabular health data into textual examination reports, then uses prompts to guide LLMs (including the Llama and Qwen families) to predict overall age plus six organ-specific ages (cardiovascular, hepatic, pulmonary, renal, metabolic, and musculoskeletal) based on aging mechanisms and health patterns. Validation across six population-based cohorts with over 10 million participants showed that LLM-predicted overall age achieved a concordance index of 0.757 for all-cause mortality, outperforming traditional aging proxies such as telomere length, frailty index, and epigenetic ages. The derived age gaps (the difference between LLM-predicted and chronological age) showed a hazard ratio of 1.055 for all-cause mortality. The study also used LLM-derived age gaps to identify proteomic biomarkers associated with accelerated aging and to develop risk prediction models for 270 diseases. Their interpretability analysis primarily validated existing knowledge from routine health indicators rather than uncovering new prognostic factors, but the approach offers advantages in cost-effectiveness and scalability. The authors note, however, that the LLMs did not perform equally well on all cohorts and that performance may be limited in elderly populations.
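The pipeline described above (tabular record to text report, report to prompt, predicted age to age gap) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the field names, report layout, and prompt wording are hypothetical, and no LLM is actually called.

```python
ORGANS = ("cardiovascular", "hepatic", "pulmonary",
          "renal", "metabolic", "musculoskeletal")


def record_to_report(record):
    """Render a tabular record {name: (value, unit)} as a plain-text report."""
    lines = ["Health examination report"]
    for name, (value, unit) in record.items():
        # rstrip drops the trailing space when a field has no unit
        lines.append(f"- {name}: {value} {unit}".rstrip())
    return "\n".join(lines)


def build_prompt(report, organs=ORGANS):
    """Wrap the report in a hypothetical instruction asking for age estimates."""
    targets = ", ".join(("overall",) + tuple(organs))
    return (
        "Based on the report below, estimate the person's biological age "
        f"in years for each of: {targets}.\n\n{report}"
    )


def age_gap(predicted_age, chronological_age):
    """Age gap as defined in the summary: predicted minus chronological age."""
    return predicted_age - chronological_age


# Hypothetical example record (values are illustrative, not from the study).
example_record = {
    "Systolic blood pressure": (128, "mmHg"),
    "Fasting glucose": (5.4, "mmol/L"),
    "Smoking status": ("never", ""),
}
example_prompt = build_prompt(record_to_report(example_record))
```

A positive age gap, for instance `age_gap(58.5, 55.0)` giving 3.5 years, means the model judges the person to be older than their chronological age.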
Read Paper | Nature Medicine
4. Human-AI Collaboration. Current evidence suggests that neither fully integrated assistive AI approaches nor complete automation represents an optimal solution for radiology workflows.
In this perspective, Rajpurkar and Topol challenge a prevailing view in radiology that assistive AI, where radiologists work "with" AI systems, necessarily improves diagnostic outcomes. Instead, they propose a "role separation" framework that divides responsibilities between AI and radiologists into distinct, complementary tasks to avoid problems like automation bias (over-relying on AI) and automation neglect (dismissing correct AI suggestions). There are three models in this framework: AI-first sequential (where AI handles initial tasks like analyzing medical records before radiologist interpretation), doctor-first sequential (where radiologists perform primary interpretation while AI handles secondary tasks like generating structured reports), and case allocation (where cases are routed to AI or radiologists based on complexity and risk factors). They cite evidence from successful implementations, including AI systems that autonomously clear normal chest X-rays in Europe and risk-based mammography screening that increased cancer detection by 29% while reducing radiologist workload by 44%. They conclude that rather than pursuing full integration or complete automation, a measured approach to role separation offers the most pragmatic path forward for implementing AI in radiology workflows.
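As a rough illustration of the case allocation model, the routing decision might look something like the sketch below. The inputs (`p_normal`, `high_risk`), the 0.99 threshold, and the routing rule itself are hypothetical and chosen only to show the shape of the idea. They are not taken from the perspective or from any deployed system.

```python
def allocate_case(p_normal, high_risk, normal_threshold=0.99):
    """Route a case to 'ai' or 'radiologist'.

    p_normal: the AI system's estimated probability that the study is normal.
    high_risk: True if patient risk factors call for human review regardless.
    normal_threshold: hypothetical confidence required for autonomous clearance.
    """
    if not 0.0 <= p_normal <= 1.0:
        raise ValueError("p_normal must be a probability between 0 and 1")
    if high_risk:
        return "radiologist"  # risk factors override AI confidence
    if p_normal >= normal_threshold:
        return "ai"           # confidently normal, low-risk case
    return "radiologist"      # everything else gets human interpretation


routes = [
    allocate_case(0.995, high_risk=False),
    allocate_case(0.995, high_risk=True),
    allocate_case(0.80, high_risk=False),
]
```

Under these assumptions only the first case is cleared by the AI. The second is escalated because of its risk flag and the third because the AI is not confident enough, which is the kind of division of labor the role separation framework describes.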
Read Paper | Radiology
-- Emma Chen, Pranav Rajpurkar & Eric Topol

