In Week #185 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Human-AI Collaboration. Could AI recognize when its prediction might be off and defer to human judgment?
Dvijotham et al. developed CoDoC, a system that learns when to rely on an AI model's prediction and when to defer to a standard clinical workflow for evaluation. Notably, CoDoC can work with any pre-existing predictive AI model, even one that is accessible only as a black box. The authors demonstrated that this approach improves accuracy in clinical workflows for breast cancer and tuberculosis screening, outperforming both clinician-only and AI-only methods. For instance, in a UK breast cancer screening program, CoDoC reduced false positives by 25% and clinician workload by 66% while maintaining the same false-negative rate. Similarly, in tuberculosis triage, CoDoC decreased false positives by 5-15% without increasing false negatives when compared with three out of five commercially available AI systems.
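To make the deferral idea concrete, here is a minimal sketch of one simple way such a decision could be wired up. It is an illustration only, not the authors' method: CoDoC learns its deferral behavior from data, whereas this sketch assumes a fixed confidence band. The class name `DeferralRule`, the band edges, the decision threshold, and the `ask_clinician` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DeferralRule:
    """Hypothetical confidence-band rule; not the learned CoDoC deferral function."""

    lower: float  # assumed lower edge of the "uncertain" band
    upper: float  # assumed upper edge of the "uncertain" band
    decision_threshold: float = 0.5  # assumed cut-off for a positive AI call

    def decide(self, ai_score: float, ask_clinician: Callable[[], bool]) -> Tuple[bool, str]:
        """Return (is_positive, who_decided) for one case."""
        if self.lower <= ai_score <= self.upper:
            # The AI score is in the uncertain band: defer to the clinical workflow.
            return ask_clinician(), "clinician"
        # The AI score is confidently low or high: accept the AI call.
        return ai_score >= self.decision_threshold, "ai"


# Made-up scores from a black-box model; the clinician callback is a stand-in.
rule = DeferralRule(lower=0.3, upper=0.7)
scores = [0.05, 0.45, 0.92, 0.60]
decisions = [rule.decide(s, ask_clinician=lambda: True) for s in scores]
deferred = sum(1 for _, who in decisions if who == "clinician")
print(f"{deferred} of {len(scores)} cases deferred to a clinician")
```

The clinician is consulted only for cases that fall inside the band, which is the intuition behind a reduced workload; with these made-up scores, two of the four cases are deferred.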
Nature Medicine
2. Federated Evaluation. Can AI models be benchmarked on global medical data without sharing the data directly?
Karargyris et al. introduced MedPerf, an open-source platform for federated evaluation of AI models on local data held by different institutions. To preserve data privacy and protect model intellectual property, MedPerf securely distributes AI models to the participating facilities for evaluation and sends back only the evaluation metrics for aggregation. This approach makes it possible to quantify how well models generalize across institutions. Data owners can register their data on the platform (https://www.medperf.org) without sharing it, while healthcare stakeholders can establish benchmark committees to define specifications and oversee analyses. MedPerf has facilitated the first-ever federated learning challenge and conducted four academic pilot studies so far.
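The general pattern of federated evaluation can be sketched in a few lines. This is a toy illustration under stated assumptions, not the MedPerf API: the function names `evaluate_locally` and `aggregate`, the threshold model, and the site data are all hypothetical, and accuracy stands in for whatever metrics a benchmark committee would actually specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[float, int]     # (input feature, true label); hypothetical data format
Model = Callable[[float], int]  # any callable mapping an input to a predicted label


def evaluate_locally(model: Model, records: Sequence[Record]) -> Dict[str, float]:
    """Run at a site: the raw records never leave this function, only metrics do."""
    correct = sum(1 for x, y in records if model(x) == y)
    return {"n": float(len(records)), "accuracy": correct / len(records)}


def aggregate(site_metrics: List[Dict[str, float]]) -> float:
    """Run centrally: combine per-site accuracies, weighted by site sample count."""
    total = sum(m["n"] for m in site_metrics)
    return sum(m["accuracy"] * m["n"] for m in site_metrics) / total


def model(x: float) -> int:
    """Stand-in model: a simple threshold classifier."""
    return int(x >= 0.5)


# Made-up local datasets held by two institutions.
site_a = [(0.9, 1), (0.2, 0), (0.6, 0), (0.1, 0)]
site_b = [(0.8, 1), (0.4, 1)]

metrics = [evaluate_locally(model, site) for site in (site_a, site_b)]
overall = aggregate(metrics)
print(metrics, round(overall, 3))
```

Keeping the per-site metrics alongside the pooled figure is what lets a gap between institutions show up, which is one way to read generalizability from such a benchmark.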
Nature Machine Intelligence
3. Large Language Model. Curious about how Large Language Models (LLMs) are evolving and how their strengths and limitations are influencing medicine?
A comprehensive review by Thirunavukarasu et al. dives into this subject. They explain the process of building LLMs, explore their published and prospective applications in healthcare, discuss the technical challenges for implementing these applications, and suggest the future direction of research in this domain. While recent studies benchmark the efficacy of LLMs through medical exams, the authors point out that clinical practice is not the same as answering examination questions correctly, and finding appropriate benchmarks to gauge the clinical potential of LLMs remains a substantial challenge for the field.
Nature Medicine
4. Geriatric Care. What is the current state of digital health for aging populations?
In this review, Chen et al. summarize the current trends, challenges, and future potential of using digital technology for effective elderly care. They emphasize the growing significance of wearable devices in assisting seniors to track their health and live independently at home. Digital monitoring devices are sorted into four categories: wearable physical sensors (such as those that continuously track heart rate, ECG, respiration rate, body temperature, oxygen saturation, and blood pressure); wearable chemical sensors (such as those that monitor body fluids like sweat, tears, saliva, and interstitial fluid); hybrid wearables that concurrently track different chemical biomarkers and vital signs; and non-wearable sensors (such as cameras that observe the mobility and gait of Parkinson's disease patients). The authors underscore the importance of creating digital technologies with seniors' needs in mind, factoring in aspects like digital literacy and visual impairment, as well as providing proper training to both the user and the caregiver for the use of these technologies.
Nature Medicine
-- Emma Chen, Pranav Rajpurkar & Eric Topol