Zubair Khalid

Virologist/Molecular Biologist | Veterinarian | Bioinformatician

Conventional & Molecular Virology • Vaccine Development • Computational Biology

Dr. Zubair Khalid is a veterinarian and virologist specializing in conventional and molecular virology, vaccine development, and computational biology. He is dedicated to advancing animal health through innovative research and multi-omics approaches.

Section: Computational Biology

Deep Mutational Scanning and Computational Protein Design for Predicting Zoonotic Spillover Risk in SARS-like Coronaviruses

Introduction

Zoonotic spillover of SARS-like coronaviruses from animal reservoirs, particularly bats and pangolins, into susceptible intermediate or accidental hosts remains a critical concern in veterinary virology and public health preparedness. The primary molecular determinant of host range and transmissibility is the interaction between the viral spike protein receptor-binding domain (RBD) and the host angiotensin-converting enzyme 2 (ACE2) receptor [1]. Mutations within the RBD can alter binding affinity, shift host tropism, and enable immune evasion from pre-existing antibody responses [2, 3]. Predicting which mutations confer these properties is essential for risk assessment and surveillance in animal populations.

Deep mutational scanning (DMS) has emerged as a powerful experimental technique to systematically measure the functional effects of all possible single amino acid substitutions in a protein [2, 3]. When combined with computational protein design tools such as Rosetta and AlphaFold, and further integrated with machine learning models, DMS data can be used to construct fitness landscapes that predict viral evolution and zoonotic potential [4, 5, 6]. This article reviews the current state of these integrated approaches, focusing on their application to SARS-like coronavirus RBDs and their utility in veterinary spillover risk assessment.

Deep Mutational Scanning of Coronavirus RBDs

DMS involves generating a library of variants, typically through site-directed saturation mutagenesis or error-prone PCR, expressing the variants in a suitable system (e.g., yeast display or lentiviral pseudotypes), selecting for a phenotype of interest such as ACE2 binding affinity or antibody escape, and deep sequencing the library before and after selection to quantify the effect of each variant [2, 3]. For SARS-CoV-2 and related coronaviruses, DMS has been applied extensively to the RBD to map how each amino acid substitution affects receptor engagement and neutralization sensitivity [3, 7].
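To make the measurement step concrete, the sketch below computes one common form of DMS score: the log2 ratio of a variant's post-selection to pre-selection read counts, normalized to the same ratio for wild type. It assumes counts have already been tallied per variant (for example from sequenced barcodes). The function name, variant labels, and counts are hypothetical, and published pipelines add error models and replicate handling that this omits.

```python
def dms_enrichment_scores(pre_counts, post_counts, wildtype="WT", pseudocount=0.5):
    """Log2 enrichment of each variant relative to wild type.

    pre_counts / post_counts map variant labels to read counts before and
    after selection (e.g. sorting cells for ACE2 binding). Normalizing to the
    wild-type ratio cancels differences in sequencing depth between samples.
    """
    import math

    wt_ratio = (post_counts[wildtype] + pseudocount) / (pre_counts[wildtype] + pseudocount)
    scores = {}
    for variant, n_pre in pre_counts.items():
        if variant == wildtype:
            continue
        n_post = post_counts.get(variant, 0)  # variants lost after selection count as 0
        ratio = (n_post + pseudocount) / (n_pre + pseudocount)
        # Positive: enriched relative to wild type; negative: depleted.
        scores[variant] = math.log2(ratio / wt_ratio)
    return scores


# Hypothetical counts for two made-up variants.
pre = {"WT": 1000, "mutA": 500, "mutB": 500}
post = {"WT": 1000, "mutA": 1500, "mutB": 10}
print(dms_enrichment_scores(pre, post))
```

The pseudocount keeps the logarithm defined when a variant drops out entirely after selection.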

Recent DMS studies have revealed that epistatic interactions among residues within the RBD can alter the fitness effects of mutations over time [2]. For example, as the virus evolves, the preferred amino acid at a given position may shift because of compensatory mutations elsewhere in the spike protein [2]. Taylor and Starr describe this as changing amino acid preferences at epistatic hotspot residues, and it complicates simple additive models of fitness prediction [2]. DMS data from Omicron sublineages such as JN.1 and XEC show that antibody escape and infectivity landscapes are highly dynamic, with certain mutations conferring simultaneous benefits in both receptor binding and immune evasion [3].
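Epistasis in this sense can be quantified as the deviation of a double mutant from the additive expectation. The minimal sketch below assumes scores on an additive scale (such as log-enrichment or Δlog10 KD); the helper names and all numbers are hypothetical illustrations, not values from [2].

```python
def epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of a double mutant from the additive (no-epistasis) expectation.

    Scores must be on an additive scale, e.g. log-enrichment or delta log10 KD.
    A result of 0 means the two mutations act independently.
    """
    expected_ab = f_wt + (f_a - f_wt) + (f_b - f_wt)
    return f_ab - expected_ab


def preferred_residue(site_scores):
    """Amino acid with the highest score at a single site."""
    return max(site_scores, key=site_scores.get)


# Hypothetical scores at one site measured in two genetic backgrounds:
# the preferred residue flips, which a purely additive model cannot capture.
ancestral_background = {"Q": 0.0, "R": -0.4}
later_background = {"Q": 0.0, "R": 0.6}
print(preferred_residue(ancestral_background))  # Q
print(preferred_residue(later_background))      # R
```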

Importantly, DMS is not limited to SARS-CoV-2. Studies of the seasonal human coronavirus HCoV-229E have shown that mutations in the spike protein can have counterbalancing effects on serum antibody neutralization and receptor binding, highlighting the trade-offs that shape viral evolution in different host contexts [7]. These findings underscore the need for species-specific DMS libraries when assessing zoonotic risk from animal coronaviruses.

Computational Protein Design for Predicting ACE2 Binding and Stability

Computational protein design methods, particularly Rosetta and AlphaFold, are used to predict the structural and energetic consequences of RBD mutations. Rosetta uses a physics-based energy function to estimate the change in binding free energy upon mutation (ΔΔG), allowing rapid screening of thousands of variants [5]. AlphaFold, a deep learning-based structure prediction tool, can generate models of RBD-ACE2 complexes even for sequences with limited homology to experimentally solved structures, enabling structural analysis of mutations in novel coronaviruses [6]. Because AlphaFold was not designed to predict the energetic effects of single substitutions, its models are most informative when paired with energy calculations or experimental binding data.
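Comparing computed ΔΔG values with DMS binding data requires putting both on the same scale. Assuming a single-site binding equilibrium and a 1 M standard state, ΔG_bind = RT ln KD, so ΔΔG = RT ln(KD_mut / KD_wt). The sketch below applies that relation; the function name is hypothetical, and Rosetta scores are reported in Rosetta Energy Units, which are only approximately comparable to kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wt, kd_mut, temperature_k=298.15):
    """Binding ddG (kcal/mol) implied by wild-type and mutant dissociation constants.

    dG_bind = RT ln(KD), so ddG = dG_mut - dG_wt = RT ln(KD_mut / KD_wt).
    Negative values mean the mutant binds more tightly.
    """
    return R_KCAL * temperature_k * math.log(kd_mut / kd_wt)


# A 10-fold tighter KD (10 nM -> 1 nM) at 25 C:
print(round(ddg_from_kd(1e-8, 1e-9), 2))  # -1.36
```

Each tenfold change in KD therefore corresponds to roughly 1.36 kcal/mol at room temperature.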

These computational approaches are often validated against DMS data. For instance, machine learning-driven simulations that incorporate Rosetta energy terms and DMS-derived fitness measurements have been used to reconstruct the SARS-CoV-2 fitness landscape with high accuracy [5]. Such models can predict which mutations are likely to emerge under selective pressure from host antibodies or changes in receptor availability [5, 8].
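Validation against DMS is often summarized with a rank correlation between predictions and measurements. The sketch below implements Spearman correlation in plain Python (ties receive their average rank). The function names and example values are hypothetical; `scipy.stats.spearmanr` provides the same statistic. Under the sign conventions assumed here (negative ΔΔG means tighter binding, positive DMS score means enrichment for binding), a good predictor should give a strongly negative correlation.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical predicted ddG values versus hypothetical DMS binding scores.
predicted_ddg = [-1.0, 0.2, 1.5, 0.8]
dms_binding = [1.2, 0.1, -2.0, -0.5]
print(spearman(predicted_ddg, dms_binding))
```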

A key application is the identification of mutations that enhance ACE2 binding across species. By modeling the interfaces between bat and pangolin coronavirus RBDs and ACE2 from humans and other mammals, researchers can pinpoint RBD residues that, if mutated, could increase affinity for human ACE2 and thus elevate spillover risk [1]. These predictions can then be cross-referenced with sequence data shared through GISAID to monitor circulating animal strains for the emergence of such mutations.
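Cross-referencing sequences against predicted high-risk mutations starts with calling substitutions relative to a reference. The sketch below assumes the query RBD is already aligned to the reference with no insertions, and that the reference fragment's first residue number is known; the function names and the watch list are hypothetical. The example fragment is SARS-CoV-2 spike residues 496 to 502.

```python
def call_substitutions(reference, query, first_position):
    """List substitutions such as 'Q498R' in a query aligned to a reference.

    Assumes both sequences are aligned, of equal length, and free of
    insertions relative to the reference. '-' (gap) and 'X' (unknown) in the
    query are treated as missing data and skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for offset, (ref_aa, qry_aa) in enumerate(zip(reference, query)):
        if qry_aa in ("-", "X") or qry_aa == ref_aa:
            continue
        calls.append(f"{ref_aa}{first_position + offset}{qry_aa}")
    return calls


def flag_watchlist(calls, watchlist):
    """Return the called substitutions that appear on a watch list."""
    return sorted(set(calls) & set(watchlist))


reference_496_502 = "GFQPTNG"  # SARS-CoV-2 spike residues 496-502
query = "GFRPTYG"              # hypothetical animal-derived sequence
calls = call_substitutions(reference_496_502, query, 496)
print(calls)                                         # ['Q498R', 'N501Y']
print(flag_watchlist(calls, {"N501Y", "E484K"}))     # ['N501Y']
```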

Machine Learning and Protein Language Models

Beyond physics-based methods, machine learning models trained on large sequence datasets have proven highly effective at predicting viral evolution. Protein language models (PLMs) such as those based on transformer architectures can capture evolutionary constraints from thousands of coronavirus spike sequences and forecast which mutations are likely to be tolerated or selected [4, 6]. A DMS-informed PLM has been shown to predict SARS-CoV-2 evolution dynamics with spatiotemporal resolution, correctly anticipating the rise of specific variants before they became dominant [4].
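One widely used way to turn a masked protein language model into a mutation score is the log-likelihood ratio of the mutant versus wild-type residue at a masked position. The sketch below assumes the per-position log-probabilities have already been produced by some model; the table values, function names, and data layout are hypothetical, and this is not claimed to be the scoring used in [4] or [6].

```python
def masked_marginal_score(position_log_probs, wt_aa, mut_aa):
    """log p(mutant) - log p(wild type) at one masked position.

    position_log_probs maps amino acids to a model's log-probabilities at the
    masked site. Positive scores mean the model prefers the mutant residue.
    """
    return position_log_probs[mut_aa] - position_log_probs[wt_aa]


def score_mutations(log_prob_table, mutations):
    """Score mutations given as (wt_aa, position, mut_aa) tuples.

    log_prob_table maps a residue position to that site's log-probabilities.
    """
    return {
        f"{wt}{pos}{mut}": masked_marginal_score(log_prob_table[pos], wt, mut)
        for wt, pos, mut in mutations
    }


# Hypothetical log-probabilities at position 501 (not real model output).
table = {501: {"N": -0.5, "Y": -1.0, "T": -3.0}}
print(score_mutations(table, [("N", 501, "Y"), ("N", 501, "T")]))
# {'N501Y': -0.5, 'N501T': -2.5}
```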

Similarly, deep mutational learning approaches that combine DMS data with neural networks can dissect polyclonal antibody escape patterns, identifying which RBD mutations are most likely to reduce neutralization by sera from infected or vaccinated animals [9]. These models can be extended to veterinary contexts by training on sera from relevant host species (e.g., bats, civets, mink).
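The intuition behind polyclonal escape can be illustrated with a simplified biophysical model, which is a different technique from the neural-network approach of [9]. The sketch assumes that each antibody epitope is bound independently, that binding strength at an epitope is exp(activity minus the summed escape effects of the variant's mutations there), and that a virion escapes only if it is unbound at every epitope. Epitope names, activities, and escape values are hypothetical.

```python
import math


def fraction_escaping(concentration, epitope_activities, escape_effects, mutations):
    """Probability that a variant escapes a polyclonal serum at one concentration.

    epitope_activities: epitope -> activity (log binding strength vs wild type)
    escape_effects: epitope -> {mutation: escape value}; missing entries are 0
    The probability of being unbound at epitope e is 1 / (1 + c * K_e).
    """
    p_escape = 1.0
    for epitope, activity in epitope_activities.items():
        escape = sum(escape_effects.get(epitope, {}).get(m, 0.0) for m in mutations)
        k_e = math.exp(activity - escape)  # escape mutations weaken binding
        p_escape *= 1.0 / (1.0 + concentration * k_e)
    return p_escape


# Hypothetical two-epitope serum and a single escape mutation at one epitope.
activities = {"epitope_1": 2.0, "epitope_2": 1.0}
effects = {"epitope_2": {"E484K": 3.0}}
print(fraction_escaping(0.1, activities, effects, []))
print(fraction_escaping(0.1, activities, effects, ["E484K"]))
```

Because escape at one epitope leaves the others intact, a single mutation raises escape only partially in this model, which mirrors the polyclonal nature of sera.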

Clonal interference, where multiple beneficial mutations compete within a viral population, is another factor that shapes escape from antibody pressure. DMS data combined with population genetics models have revealed that SARS-CoV-2 can escape hundreds of antibodies through a combination of mutations that arise and fix under changing selective pressures [10]. Understanding these dynamics in animal reservoirs is critical for predicting which lineages may acquire zoonotic potential.
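Clonal interference can be explored with a small asexual Wright-Fisher simulation in which beneficial mutations arising on different backgrounds compete because they cannot recombine. This is a generic population-genetics sketch, not the model used in [10]; the mutation names, selection coefficients, population size, and mutation rate are all hypothetical.

```python
import random


def wright_fisher(pop_size, generations, mutation_rate, benefits, seed=0):
    """Asexual Wright-Fisher population with competing beneficial mutations.

    Genotypes are frozensets of mutation names; fitness is multiplicative.
    Without recombination, lineages carrying different beneficial mutations
    compete (clonal interference) unless one lineage acquires both.
    Returns, per generation, the frequency of each mutation.
    """
    rng = random.Random(seed)
    population = {frozenset(): pop_size}
    history = []
    for _ in range(generations):
        genotypes = list(population)
        weights = []
        for g in genotypes:
            fitness = 1.0
            for m in g:
                fitness *= 1.0 + benefits[m]
            weights.append(population[g] * fitness)
        # Selection plus genetic drift: resample the next generation.
        offspring = rng.choices(genotypes, weights=weights, k=pop_size)
        population = {}
        for g in offspring:
            # With probability mutation_rate, add one randomly chosen mutation
            # (no change if the offspring already carries it).
            if rng.random() < mutation_rate:
                g = g | {rng.choice(sorted(benefits))}
            population[g] = population.get(g, 0) + 1
        history.append({
            m: sum(n for g, n in population.items() if m in g) / pop_size
            for m in benefits
        })
    return history


history = wright_fisher(500, 200, 0.002, {"mut_A": 0.10, "mut_B": 0.05}, seed=1)
print(history[-1])
```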

Integration with Surveillance and Structural Visualization

The workflow for predicting zoonotic spillover risk using DMS and computational design is summarized in Figure 1.

flowchart TD
    A[Animal surveillance samples] --> B[Sequencing & GISAID submission]
    B --> C[Phylogenetic analysis & lineage assignment]
    C --> D[Identify novel RBD mutations]
    D --> E[Deep mutational scanning library construction]
    E --> F[Phenotypic screening: ACE2 binding & antibody escape]
    F --> G[Computational modeling: Rosetta, AlphaFold]
    G --> H[Machine learning fitness landscape prediction]
    H --> I[Risk assessment: enhanced ACE2 binding & immune evasion]
    I --> J[Veterinary public health alert & targeted surveillance]

Figure 1. Integrated workflow combining field surveillance, deep mutational scanning, and computational protein design for zoonotic spillover risk assessment.

Three-dimensional structure viewers can be used to visualize key RBD-ACE2 interaction interfaces and mutational hotspots identified through these analyses. For example, residues 484, 498, and 501 of the SARS-CoV-2 RBD have been repeatedly implicated in both ACE2 affinity modulation and antibody escape [1, 3]. Visualizing these residues in the context of the three-dimensional structure aids in interpreting functional data and designing targeted surveillance assays.
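Before visualization, interface residues are often defined by a simple distance cutoff between chains. The brute-force sketch below assumes atoms have already been parsed from an RBD-ACE2 complex structure into (chain, residue number, coordinates) tuples; the chain labels, the 4 Å cutoff, and the toy coordinates are hypothetical. For full structures, Biopython's `Bio.PDB.NeighborSearch` performs the same kind of search more efficiently.

```python
import math


def interface_residues(atoms, chain_a, chain_b, cutoff=4.0):
    """Residues of chain_a with any atom within `cutoff` angstroms of chain_b.

    atoms: iterable of (chain_id, residue_number, (x, y, z)) tuples.
    Brute force O(n * m) comparison; fine for small examples.
    """
    atoms_a = [a for a in atoms if a[0] == chain_a]
    atoms_b = [a for a in atoms if a[0] == chain_b]
    contacts = set()
    for _, res_a, xyz_a in atoms_a:
        for _, _, xyz_b in atoms_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts.add(res_a)
                break  # one contact is enough for this atom
    return sorted(contacts)


# Toy coordinates: RBD chain "E", receptor chain "A" (labels hypothetical).
toy_atoms = [
    ("E", 501, (0.0, 0.0, 0.0)),
    ("E", 420, (20.0, 0.0, 0.0)),
    ("A", 41, (3.0, 0.0, 0.0)),
]
print(interface_residues(toy_atoms, "E", "A"))  # [501]
```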

GISAID data provide the raw material for phylogenetic surveillance, enabling the tracking of RBD mutations across animal hosts and geographic regions. By coupling GISAID sequence monitoring with DMS-informed predictions, veterinary diagnosticians can prioritize variants that pose the highest spillover risk for further experimental characterization.
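Tracking an RBD mutation across hosts and regions amounts to grouping sequences and computing the fraction that carry it. The sketch assumes each record is a dictionary with `host`, `region`, and `mutations` keys, built from sequence metadata plus substitution calls; these field names are hypothetical and do not reflect GISAID's actual metadata schema.

```python
from collections import Counter


def mutation_prevalence(records, mutation):
    """Fraction of sequences carrying `mutation`, grouped by (host, region).

    records: iterable of dicts with 'host', 'region' and 'mutations' keys.
    """
    totals = Counter()
    carriers = Counter()
    for rec in records:
        key = (rec["host"], rec["region"])
        totals[key] += 1
        if mutation in rec["mutations"]:
            carriers[key] += 1
    return {key: carriers[key] / n for key, n in totals.items()}


# Hypothetical records.
records = [
    {"host": "mink", "region": "R1", "mutations": {"Y453F"}},
    {"host": "mink", "region": "R1", "mutations": set()},
    {"host": "bat", "region": "R2", "mutations": set()},
]
print(mutation_prevalence(records, "Y453F"))
# {('mink', 'R1'): 0.5, ('bat', 'R2'): 0.0}
```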

Implications for Veterinary Zoonotic Risk Assessment

The methods described above have direct applications in veterinary medicine. Surveillance programs in bat colonies, pangolin trafficking routes, and mink farms can benefit from computational pre-screening of spike sequences to identify mutations that warrant deeper investigation. For instance, if a novel bat coronavirus sequence contains an RBD mutation predicted by Rosetta to increase human ACE2 binding and by a PLM to be evolutionarily viable, that lineage can be flagged for enhanced monitoring [1, 6].
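The flagging logic in this example can be written as a simple two-criterion filter. The sketch assumes each candidate has a predicted binding ΔΔG for human ACE2 (negative meaning tighter binding) and a language-model score (higher meaning more tolerated). The function name and both thresholds are hypothetical placeholders that a real program would need to calibrate.

```python
def flag_for_monitoring(candidates, ddg_threshold=-0.5, plm_threshold=0.0):
    """Mutations predicted to bind human ACE2 more tightly and to be tolerated.

    candidates maps a mutation label to (predicted_ddg, plm_score):
      predicted_ddg < ddg_threshold -> predicted tighter human ACE2 binding
      plm_score > plm_threshold     -> model considers the change viable
    """
    return sorted(
        label
        for label, (ddg, plm) in candidates.items()
        if ddg < ddg_threshold and plm > plm_threshold
    )


# Hypothetical candidates from a newly sequenced bat coronavirus RBD.
candidates = {
    "m1": (-1.0, 0.3),   # tighter binding, tolerated -> flagged
    "m2": (-1.0, -2.0),  # tighter binding, but predicted poorly tolerated
    "m3": (0.4, 1.0),    # tolerated, but weaker binding
}
print(flag_for_monitoring(candidates))  # ['m1']
```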

Furthermore, DMS can be performed directly on RBDs from animal coronaviruses using yeast display systems, allowing empirical measurement of binding to panels of mammalian ACE2 orthologs [7]. Such data can inform host range predictions and identify species that may serve as bridging hosts.
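Binding to each ACE2 ortholog is commonly summarized as an apparent KD fitted to a titration series. The sketch below assumes a single-site readout of the form signal = A·c/(c + KD) + B with no cooperativity, and fits it by grid search over log10 KD; for each candidate KD the model is linear in A and B, so those are solved by ordinary least squares. The function name, grid range, and synthetic data are hypothetical, and a nonlinear optimizer such as `scipy.optimize.curve_fit` is a typical alternative.

```python
def fit_kd(concentrations, signals, log10_kd_grid=None):
    """Fit signal = A * c / (c + KD) + B by grid search over KD.

    Returns (kd, A, B) for the grid KD with the lowest sum of squared errors.
    """
    if log10_kd_grid is None:
        log10_kd_grid = [-13 + 0.01 * i for i in range(801)]  # 1e-13 to 1e-5 M
    n = len(concentrations)
    mean_y = sum(signals) / n
    best = None
    for log_kd in log10_kd_grid:
        kd = 10 ** log_kd
        x = [c / (c + kd) for c in concentrations]  # fractional occupancy
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # occupancy constant across concentrations: A undetermined
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals))
        a = sxy / sxx
        b = mean_y - a * mean_x
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, a, b)
    if best is None:
        raise ValueError("no KD on the grid could be fitted")
    _, kd, a, b = best
    return kd, a, b


# Synthetic, noise-free titration with true KD = 1 nM, A = 1000, B = 50.
concs = [10.0 ** e for e in range(-13, -5)]
signals = [1000 * c / (c + 1e-9) + 50 for c in concs]
print(fit_kd(concs, signals))
```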

The integration of DMS, computational protein design, and machine learning represents a paradigm shift in zoonotic risk assessment, moving from reactive characterization of emerged variants to proactive prediction of future threats. Veterinary virologists and computational biologists are central to this effort because they provide expertise in animal host biology, sample collection, and species-specific assay development.

Conclusion

Deep mutational scanning, when combined with computational protein design tools such as Rosetta and AlphaFold and with machine learning models including protein language models, offers a robust framework for predicting RBD mutations that enhance ACE2 binding and immune escape in SARS-like coronaviruses. These approaches enable the construction of fitness landscapes that capture epistatic interactions, clonal interference, and changing selective pressures [2, 5, 10]. By linking experimental DMS data to structural modeling and evolutionary sequence analysis, researchers can identify high-risk mutations before they become widespread in animal populations. Continued investment in veterinary surveillance, open data sharing through platforms like GISAID, and cross-disciplinary collaboration will be essential to operationalize these predictive tools for zoonotic spillover prevention.

References

[1] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41901725/

[2] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42330076/

[3] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42324717/

[4] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42204343/

[5] Durumeric AEP, McCarty S, Smith J, et al. Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments. J Chem Inf Model. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42089465/

[6] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/

[7] Harari S, Eguia RT, Dadonaite B, et al. Mutations to the HCoV-229E spike have counterbalancing effects on serum antibody neutralization and receptor binding. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42124731/

[8] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41809055/

[9] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42030951/

[10] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41767406/

[11] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42037411/