In Week #173 of the Doctor Penguin newsletter, the following papers caught our attention:
1. Protein Design. Creating novel proteins that do not exist in nature has the potential to be extremely useful for a wide range of scientific and engineering purposes, but the cost of doing so has been too high.
Ni et al. proposed a generative AI approach to rapid, target-guided protein design. They developed attention-based diffusion models that generate amino acid sequences meeting a desired secondary-structure constraint, with a conditioning description of the desired structure supplied as model input. The team then used OmegaFold and AlphaFold to predict the 3D structures of the sequences and classify their secondary structures, comparing these outputs with the input conditions. They also analyzed the novelty of the designed sequences by checking them against known proteins. By training the model on a set of Protein Data Bank (PDB) proteins, they demonstrated the ability to generate stable de novo protein structures that follow the given secondary-structure conditions, bypassing the iterative search process used in previous optimization methods.
Chem
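To make the "compare predicted structure with the input condition" step concrete, here is a minimal sketch. It assumes both the target condition and the predicted structure are available as per-residue strings of DSSP-style labels (H for helix, E for strand, C for coil). That encoding and the function name `ss_agreement` are illustrative assumptions, not the format or code used by Ni et al.

```python
def ss_agreement(target: str, predicted: str) -> float:
    """Fraction of residues whose predicted label matches the target label.

    Both arguments are hypothetical per-residue secondary-structure strings,
    e.g. "HHHHCCEE"; this is an illustrative metric, not the paper's.
    """
    if not target or len(target) != len(predicted):
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(target, predicted))
    return matches / len(target)


# Example: one residue out of eight disagrees (position 4: H vs C).
print(ss_agreement("HHHHCCEE", "HHHCCCEE"))
```

A score near 1.0 would suggest the generated sequence folds as requested; in practice the labels would come from a structure predictor followed by a secondary-structure classifier, as described above.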
2. Protein Design. The "bottom-up" approach in de novo protein design, which assembles protein subunits, limits the properties of the assembly to what can be generated from the available oligomeric building blocks.
Lutz et al. proposed a "top-down" approach for designing protein complexes using reinforcement learning. This approach assembles monomeric subunits from protein fragments, directly optimizing for prespecified global structural properties. They used Monte Carlo tree search to sample protein conformers, applying the desired geometric constraints at each step in the search tree. They evaluated the completed backbones with score functions that assess how well the overall generated structure satisfies the protein design criteria. By providing different geometric constraints and score functions to guide the search, they were able to control properties such as shape, size, porosity, and termini position from the top down.
Science
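The search pattern described above (constraints applied at every step, a score applied to the finished structure) can be illustrated with a toy Monte Carlo tree search. This is a sketch under loud assumptions: "fragments" are unit steps on a 2D grid, the geometric constraint is "stay inside a circle", and the score rewards ending near a target point. None of these names or choices come from Lutz et al.; real backbone assembly is far richer.

```python
import math
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # toy stand-ins for protein fragments


def allowed_moves(path, radius):
    """Geometric constraint checked at every step: stay within `radius` of the origin."""
    x, y = path[-1]
    return [(dx, dy) for dx, dy in MOVES if math.hypot(x + dx, y + dy) <= radius]


def extend_randomly(path, length, radius, rng):
    """Complete a partial path with random constraint-satisfying moves."""
    path = list(path)
    while len(path) < length:
        dx, dy = rng.choice(allowed_moves(path, radius))
        path.append((path[-1][0] + dx, path[-1][1] + dy))
    return path


def score(path, target):
    """Score a completed path: higher when its end point is closer to `target`."""
    return 1.0 / (1.0 + math.dist(path[-1], target))


class Node:
    def __init__(self, path, parent=None):
        self.path, self.parent = path, parent
        self.children, self.visits, self.total = {}, 0, 0.0


def mcts(length, radius, target, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node([(0, 0)])
    for _ in range(iterations):
        node = root
        while len(node.path) < length:
            untried = [m for m in allowed_moves(node.path, radius)
                       if m not in node.children]
            if untried:  # expansion: add one new child and stop descending
                m = rng.choice(untried)
                x, y = node.path[-1]
                node.children[m] = Node(node.path + [(x + m[0], y + m[1])], node)
                node = node.children[m]
                break
            # selection: pick the child with the highest UCT value
            node = max(node.children.values(),
                       key=lambda c: c.total / c.visits
                       + math.sqrt(2 * math.log(node.visits) / c.visits))
        # simulation: finish the path at random, then score the whole thing
        reward = score(extend_randomly(node.path, length, radius, rng), target)
        while node is not None:  # backpropagation
            node.visits += 1
            node.total += reward
            node = node.parent
    # Follow the most-visited children, then fill any remaining steps at random.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: c.visits)
    return extend_randomly(node.path, length, radius, rng)


path = mcts(length=6, radius=3.0, target=(2, 2))
print(path, score(path, (2, 2)))
```

Changing `radius` or swapping in a different `score` function steers the search toward different end results without touching the search loop, which loosely mirrors the top-down control described above.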
3. Automated Machine Learning. Is developing high-performing medical AI models possible without machine learning or data science experts?
Wagner et al. developed a code-free deep learning classifier (via Google Cloud AutoML) and their own bespoke model for the classification of plus disease, a hallmark of retinopathy of prematurity, in an ethnically diverse population in London, UK. Both models achieved performance similar to that of senior pediatric ophthalmologists in discriminating healthy from plus or pre-plus disease, and they demonstrated generalizability on external validation test sets from the USA, Brazil, and Egypt. This study highlights the potential for poorly resourced regions lacking data science expertise to develop models optimized for their specific populations using an automated machine learning platform.
The Lancet Digital Health
4. Public Perception. What can social media posts tell us about public perceptions of statins, which are underused despite well-established benefits and safety in lowering cholesterol?
Somani et al. analyzed over 10,000 discussions about statins on Reddit spanning more than a decade, using machine learning and natural language processing techniques to uncover the predominantly neutral to negative sentiment surrounding statins. The study sheds light on public perceptions of statins and can inform strategies to address barriers to their use and adherence, while also showcasing the potential of AI to automate the extraction and analysis of social media data to complement manual qualitative analyses.
JAMA Network Open
-- Emma Chen, Pranav Rajpurkar & Eric Topol
