Week 172

Non-Small Cell Lung Cancer, Multimodal Benchmark, NASH, Proteomics, COPD, AI Evaluation Results

Apr 20, 2023

In Week #172 of the Doctor Penguin newsletter, the following papers caught our attention:

1. Non-Small Cell Lung Cancer. Understanding intratumor heterogeneity through genomes, transcriptomes and multi-region sampling.

In this paper, Martínez-Ruiz et al. examine 347 non-small cell lung cancer patients from the TRACERx study with paired whole-exome and RNA sequencing data from multiple primary and metastatic sites. They use several machine-learning approaches, including logistic regression, random forest, and a multilayer perceptron with a support vector machine terminal layer, to understand how the evolution of mutations relates to metastasis-seeding potential and increased proliferation. They also identify links between frequent copy number-independent allele-specific expression and epigenomic dysfunction.

Nature

Read paper

2. Multimodal Benchmark. The Era of multimodal datasets in medicine.

In this paper, Wantlin et al. present BenchMD, a benchmark that tests how modality-agnostic methods, including architectures and training techniques, perform on a diverse array of clinically relevant medical tasks. BenchMD combines 19 publicly available datasets for 7 medical modalities, including 1D sensor data, 2D images, and 3D volumetric scans. The benchmark evaluates methods across different dataset sizes and on out-of-distribution data. The authors conclude that no modality-agnostic technique achieves strong performance across all modalities.

arXiv preprint

Read paper

3. NASH. A multi-modal study finds a new gene signature for NASH.

In this study, Conway et al. use paired transcriptomics and histopathology data from NASH patients and identify a 5-gene signature for distinguishing stage F3 (pre-cirrhotic) from stage F4 (cirrhotic) fibrosis. They also find the gene signature to be associated with disease progression and risk of clinical events. The gene signature (JAG1, NOTCH1, NOTCH2, HEYL, HES1) is enriched for the Notch signaling pathway and for genes implicated in liver-related diseases.

Cell Reports Medicine

Read paper

4. Proteomics. Predicting proteomics from transcriptomics.

In this paper, Wu et al. develop a new deep-learning method, TransPro, to predict proteomics profiles and corresponding phenotypes for uncharacterized cell lines using transcriptomics data. The model explicitly represents the information transmission from RNAs to proteins. They assess TransPro’s predictions of anti-cancer drug sensitivity and drug adverse reactions and find that TransPro’s accuracy mirrors that of experimental data. They suggest that TransPro could facilitate the imputation of proteomics data and compound screening in systems pharmacology.

Cell Reports Methods

Read paper

5. COPD. ML framework for medical-record-based labels.

In this study, Cosentino et al. train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case–control status from high-dimensional raw spirograms. They use the model’s predictions as a liability score and find that this score, derived without any domain-specific knowledge, is associated with overall survival and exacerbation events and is predictive of COPD-related hospitalization. They also perform a GWAS on the ML-based liability score and find that it replicates existing COPD and lung function loci and identifies 67 new loci.

Nature Genetics

Read paper

6. AI Evaluation Results. Are aggregated results all-encompassing, or do they obscure darker truths?

In this perspective, Burnell et al. discuss the pitfalls of reporting only aggregate metrics for AI models: such metrics obscure key information about where the models fail and introduce unwarranted assumptions into the evaluation process. They also discuss why instance-by-instance evaluation matters in many settings, such as assessing bias against minority populations, and how rarely such results appear in published studies. Further, they make recommendations to improve the reporting of evaluation results of AI models.

Science

Read paper
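
To make the point about aggregate metrics concrete, here is a minimal Python sketch of how a single overall accuracy can hide a subgroup where a model fails. It is not taken from Burnell et al.; the function name `accuracy_by_group`, the group labels, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, per-group accuracy).

    `records` is an iterable of (group, y_true, y_pred) tuples.
    Hypothetical helper for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Toy data (made up): 90 majority-group instances, all predicted correctly,
# and 10 minority-group instances, of which only 4 are predicted correctly.
records = (
    [("majority", 1, 1)] * 90
    + [("minority", 1, 1)] * 4
    + [("minority", 1, 0)] * 6
)

overall, per_group = accuracy_by_group(records)
print(f"overall accuracy: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group} accuracy: {acc:.2f}")
```

In this toy setup the overall accuracy looks high while the minority-group accuracy is well below it, which is the kind of detail that only a breakdown below the aggregate level reveals.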

-- Shreya Johri, Pranav Rajpurkar & Eric Topol

Doctor Penguin Weekly
