What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Deep Mutational Scanning and Computational Modeling of SARS-CoV-2 Spike-ACE2 Binding Dynamics for Predicting Zoonotic Risk

Introduction

The emergence of SARS-CoV-2 and its subsequent diversification into numerous variants has underscored the critical need for robust computational frameworks capable of predicting zoonotic spillover risk from animal reservoirs. Central to this endeavor is the characterization of the molecular interface between the viral spike protein receptor-binding domain (RBD) and the host angiotensin-converting enzyme 2 (ACE2) receptor. The binding affinity and structural compatibility at this interface are primary determinants of host range and transmissibility [1, 2]. Deep mutational scanning (DMS) has emerged as a powerful experimental technique to systematically measure the functional consequences of every possible amino acid substitution in the RBD, generating comprehensive fitness landscapes that inform both immune evasion and receptor binding [1, 3]. When integrated with computational modeling approaches such as molecular dynamics (MD) simulations and binding free energy calculations, these datasets enable the prospective identification of mutations that enhance ACE2 binding in diverse animal species, thereby flagging variants with elevated zoonotic potential [4, 5]. This article provides a detailed technical review of the methodologies, data integration strategies, and analytical frameworks that constitute this emerging field of computational virology.

Deep Mutational Scanning of the Spike Receptor-Binding Domain

Deep mutational scanning is a high-throughput technique that couples saturation mutagenesis of a target protein with a functional selection and deep sequencing to quantify the impact of each single amino acid substitution on a phenotype of interest [1]. For the SARS-CoV-2 spike RBD, DMS libraries typically encompass all 19 alternative amino acids at each position, yielding thousands of unique variants that are assayed in parallel for their ability to bind ACE2 or escape antibody neutralization [3, 6]. The output is a comprehensive mutational effect map, often expressed as an enrichment ratio or a log2 fold change relative to the wild-type sequence [1].
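As a rough illustration of the enrichment calculation described above, the sketch below computes a log2 fold change for each variant relative to wild type from pre- and post-selection read counts. The function name, the pseudocount of 0.5, and the read counts are assumptions made for illustration; published DMS pipelines typically use more elaborate statistical models.

```python
import math

def log2_enrichment(pre_counts, post_counts, wildtype="WT", pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to wild type.

    pre_counts and post_counts map a variant name to its read count
    before and after the functional selection.
    """
    def ratio(variant):
        # The pseudocount avoids division by zero and log(0) for variants
        # that are lost during selection.
        post = post_counts.get(variant, 0) + pseudocount
        pre = pre_counts[variant] + pseudocount
        return post / pre

    wt_ratio = ratio(wildtype)
    return {
        variant: math.log2(ratio(variant) / wt_ratio)
        for variant in pre_counts
        if variant != wildtype
    }

# Hypothetical read counts, invented for this example only.
pre = {"WT": 10000, "N501Y": 1000, "G502P": 1000}
post = {"WT": 10000, "N501Y": 2000, "G502P": 50}
scores = log2_enrichment(pre, post)
```

A positive score means the variant was enriched relative to wild type during selection, and a negative score means it was depleted.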

Recent DMS studies have focused on recently evolved variants such as Omicron JN.1 and XEC, revealing that the fitness landscape of the RBD is highly dynamic and shaped by epistatic interactions among residues [1, 3]. Epistasis, the phenomenon in which the effect of one mutation depends on the genetic background in which it occurs, is a critical feature of viral evolution that complicates simple additive models of mutational effects [1]. Taylor and Starr demonstrated that changing amino acid preferences within epistatic hotspot residues are a hallmark of ongoing SARS-CoV-2 adaptation, with certain positions exhibiting context-dependent tolerances for specific substitutions [1]. These findings have direct implications for zoonotic risk prediction, because a mutation that enhances ACE2 binding in a human-adapted background may have a different effect in a bat or pangolin ACE2 context [2].

The DMS approach also provides a direct readout of antibody escape potential. Shao et al. mapped the infectivity and antibody escape landscape of the Omicron JN.1 and XEC RBDs, identifying mutations that simultaneously maintain ACE2 binding while evading polyclonal serum neutralization [3]. This dual functional constraint is a key driver of viral evolution and must be accounted for in predictive models of zoonotic emergence [7, 8]. The integration of DMS data with structural information allows researchers to distinguish between mutations that directly alter the ACE2 contact interface and those that exert allosteric effects on RBD conformation or dynamics [2].

Computational Modeling of Spike-ACE2 Binding Dynamics

While DMS provides a static snapshot of mutational effects under controlled experimental conditions, computational modeling techniques such as molecular dynamics simulations and binding free energy calculations offer a dynamic and physically grounded view of the spike-ACE2 interaction [4, 5]. MD simulations solve Newton's equations of motion for a system of atoms over time, generating trajectories that reveal conformational fluctuations, hydrogen bond networks, and water-mediated interactions at the binding interface [4]. These simulations can be performed for wild-type and mutant RBD-ACE2 complexes to quantify changes in binding stability and identify structural determinants of affinity.
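To make the idea of integrating Newton's equations of motion concrete, here is a minimal sketch using the velocity Verlet scheme, a standard integrator in MD codes, applied to a single particle on a harmonic spring. The one-dimensional toy system, the unit mass and spring constant, and the time step are assumptions chosen for illustration; a real RBD-ACE2 simulation involves hundreds of thousands of atoms and a full force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * a = F(x) for one particle in one dimension."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

# Toy harmonic "bond" in reduced units (assumed values).
k = 1.0
mass = 1.0

def harmonic_force(x):
    return -k * x

def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x

dt = 0.01
n_steps = 1000
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=mass, dt=dt, n_steps=n_steps)

# Analytic solution x(t) = cos(omega * t) for these initial conditions.
analytic_x = math.cos(math.sqrt(k / mass) * dt * n_steps)
```

Comparing `traj[-1][0]` with `analytic_x`, and the total energy at the start and end of the trajectory, gives a quick check that the integrator is stable at the chosen time step.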

Binding free energy calculations, often performed using methods such as Molecular Mechanics Generalized Born Surface Area (MM-GBSA), Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA), or alchemical free energy perturbation (FEP), provide a thermodynamic estimate of the change in binding affinity (ΔΔG) associated with a given mutation [4]. Durumeric et al. demonstrated that machine learning-driven simulations trained on DMS data can accurately recapitulate the SARS-CoV-2 fitness landscape, achieving high correlation with experimental measurements of ACE2 binding affinity [4]. These computational models can be extended to predict binding to ACE2 orthologs from a wide range of potential reservoir and intermediate host species, including bats, pangolins, civets, and mink [2, 5].
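The arithmetic behind a ΔΔG estimate can be sketched as follows, assuming the common end-point convention ΔG_bind = G_complex − (G_receptor + G_ligand) and ΔΔG = ΔG_mutant − ΔG_wild-type, so that a negative ΔΔG indicates tighter binding. The function names and energy values are hypothetical. The conversion to a fold change in dissociation constant uses the standard relation K_d,mut / K_d,wt = exp(ΔΔG / RT); end-point methods such as MM-GBSA are approximate, so this conversion is best read as a qualitative guide.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """End-point binding free energy: complex minus separated partners."""
    return g_complex - (g_receptor + g_ligand)

def ddg(dg_mutant, dg_wildtype):
    """Change in binding free energy caused by a mutation."""
    return dg_mutant - dg_wildtype

def kd_fold_change(ddg_value, temperature=298.15):
    """Ratio K_d(mutant) / K_d(wild type) implied by a ddG in kcal/mol."""
    return math.exp(ddg_value / (R_KCAL * temperature))

# Hypothetical averaged energies in kcal/mol, invented for illustration.
dg_wt = binding_free_energy(g_complex=-5000.0, g_receptor=-3000.0, g_ligand=-1950.0)
dg_mut = binding_free_energy(g_complex=-5003.5, g_receptor=-3000.0, g_ligand=-1952.0)
ddg_value = ddg(dg_mut, dg_wt)
fold = kd_fold_change(ddg_value)
```

With these invented numbers the mutant complex is 1.5 kcal/mol more stable than wild type, which corresponds to a fold change below 1, that is, a lower dissociation constant and tighter predicted binding.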

The structural basis for cross-species binding is rooted in the specific amino acid contacts between the RBD and the ACE2 receptor. Key residues in the RBD, such as those at positions 417, 452, 484, 498, and 501, have been repeatedly identified as critical determinants of binding specificity and affinity [2]. Soliman et al. provided a comprehensive analysis of RBD evolution, tracing the trajectory from ACE2 binding optimization to immune epitope remodeling [2]. Their work highlights how mutations that enhance binding to human ACE2 often reduce binding to bat ACE2, and vice versa, creating a trade-off that shapes the evolutionary path toward human adaptation [2]. Computational models that incorporate these structural constraints can predict which combinations of mutations are most likely to facilitate a zoonotic jump.

Integration of Large-Scale Sequence Surveillance

The predictive power of DMS and computational modeling is greatly amplified when integrated with large-scale genomic surveillance data, such as that curated by the GISAID initiative. By mapping observed mutations from circulating animal and human strains onto the experimentally determined fitness landscape, researchers can identify variants that are both prevalent and possess elevated zoonotic potential [9, 10]. This integration requires sophisticated bioinformatic pipelines that can handle the scale and diversity of sequence data, as well as statistical frameworks to distinguish between neutral drift and adaptive evolution.
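A minimal sketch of the mapping step might look like the following: mutations observed in surveillance sequences are counted, converted to frequencies, and annotated with a score from a DMS lookup table. The input format (one list of mutation labels per sequence), the score table, and all values are assumptions for illustration. The sketch does not use any GISAID interface; it assumes the mutation calls have already been extracted.

```python
import re
from collections import Counter

# Labels such as "N501Y": wild-type residue, 1-based position, mutant residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(label):
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {label!r}")
    wildtype, position, mutant = match.groups()
    return wildtype, int(position), mutant

def annotate_surveillance(sequences, dms_scores):
    """Return (mutation, frequency, DMS score) rows, highest score first.

    Mutations without a DMS measurement get a score of None and sort last.
    """
    counts = Counter(m for mutations in sequences for m in set(mutations))
    total = len(sequences)
    rows = []
    for mutation, count in counts.items():
        parse_mutation(mutation)  # validates the label format
        rows.append((mutation, count / total, dms_scores.get(mutation)))
    rows.sort(key=lambda row: (row[2] is None, -(row[2] or 0.0), row[0]))
    return rows

# Hypothetical surveillance calls and DMS scores, invented for illustration.
observed = [["N501Y", "E484K"], ["N501Y"], ["L452R"], ["N501Y", "Q498R"]]
dms_scores = {"N501Y": 0.9, "E484K": -0.1, "L452R": 0.2}
ranked = annotate_surveillance(observed, dms_scores)
```

Keeping unmeasured mutations in the output, rather than dropping them, makes gaps in the experimental landscape visible to whoever reviews the ranking.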

Protein language models (pLMs) represent a recent advance in this area. Yang et al. developed a DMS-informed pLM that predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution [9]. These models are trained on large corpora of protein sequences and can capture evolutionary constraints that are not immediately apparent from structural or biophysical data alone. When fine-tuned on DMS datasets, pLMs can predict the fitness effects of unseen mutations and forecast the emergence of new variants [9, 10]. Lamb et al. demonstrated that pLMs trained on single sequences can capture the evolutionary potential of SARS-CoV-2, generating trajectories that align with experimentally observed mutational pathways [10].
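One widely used way to turn a protein language model into a mutation score is a log-likelihood ratio: the log probability the model assigns to the mutant residue at a position minus the log probability of the wild-type residue. The sketch below shows only that arithmetic. The per-position probability table stands in for real model output and is invented, and this is not presented as the method used in the cited studies.

```python
import math

def log_likelihood_ratio(position_probs, wildtype_sequence, mutation):
    """Score a substitution as log p(mutant) - log p(wild type) at its position.

    position_probs is a list with one dict per residue, mapping amino acid
    letters to model probabilities. The mutation label is 1-based, e.g. "N1Y".
    """
    wildtype, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    if wildtype_sequence[index] != wildtype:
        raise ValueError(f"{mutation} does not match the sequence at position {position}")
    probs = position_probs[index]
    return math.log(probs[mutant]) - math.log(probs[wildtype])

# Toy three-residue sequence and made-up probabilities (assumptions).
sequence = "NGY"
toy_probs = [
    {"N": 0.6, "Y": 0.3, "G": 0.1},
    {"G": 0.8, "N": 0.1, "Y": 0.1},
    {"Y": 0.5, "N": 0.25, "G": 0.25},
]
score = log_likelihood_ratio(toy_probs, sequence, "N1Y")
```

A negative score means the model considers the substitution less likely than the wild-type residue in that sequence context.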

The combination of DMS, MD simulations, and pLMs enables a multi-scale approach to zoonotic risk assessment. At the molecular level, DMS and MD simulations provide mechanistic insight into binding affinity and structural compatibility. At the population level, sequence surveillance and pLMs track the real-world emergence and spread of variants. This integrated framework is essential for identifying high-risk mutations before they become fixed in animal or human populations [5].

Predicting Zoonotic Spillover: A Workflow

The following Mermaid diagram illustrates a typical workflow for integrating DMS data with computational modeling and sequence surveillance to predict zoonotic spillover risk.

flowchart TD
    A[Deep Mutational Scanning of Spike RBD] --> B[Quantitative Mutational Effect Map]
    B --> C["Binding Free Energy Calculations (MM-GBSA, FEP)"]
    B --> D[Molecular Dynamics Simulations of RBD-ACE2 Complexes]
    C --> E["Predicted ΔΔG for Human and Animal ACE2 Orthologs"]
    D --> E
    E --> F[Identification of Mutations Enhancing Cross-Species Binding]
    F --> G[Integration with GISAID Sequence Surveillance Data]
    G --> H[Protein Language Model Predictions of Evolutionary Trajectories]
    H --> I[Risk Stratification of Emerging Variants]
    I --> J[Experimental Validation in Pseudovirus or Live Virus Assays]
    J --> K[Zoonotic Spillover Risk Assessment]

The workflow begins with DMS experiments that generate a comprehensive mutational effect map for the spike RBD [1, 3]. This map is then used to parameterize computational models, including binding free energy calculations and MD simulations, which predict the impact of each mutation on ACE2 binding affinity across multiple host species [4, 5]. Mutations that are predicted to enhance binding to animal ACE2 orthologs are flagged as potential zoonotic risk factors. These predictions are then cross-referenced with global sequence surveillance data to identify variants that are actively circulating in animal reservoirs [9, 10]. Protein language models can further refine these predictions by simulating evolutionary trajectories and identifying mutations that are likely to arise in the near future [9, 10]. The highest-risk candidates are then subjected to experimental validation using pseudovirus entry assays or live virus neutralization tests before being incorporated into formal risk assessments.
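The flagging step in this workflow can be sketched as a simple threshold on predicted ΔΔG per host species. The function name, the threshold of −0.5 kcal/mol, the species labels, and the predicted values are all assumptions for illustration; an operational cutoff would need to be calibrated against experimental binding data.

```python
def flag_cross_species_risk(ddg_by_species, threshold=-0.5):
    """Return the species for which each mutation is predicted to tighten binding.

    ddg_by_species maps a mutation label to a dict of species -> predicted
    ddG in kcal/mol, where negative values mean tighter ACE2 binding.
    """
    flagged = {}
    for mutation, per_species in ddg_by_species.items():
        hits = sorted(
            species for species, value in per_species.items() if value <= threshold
        )
        if hits:
            flagged[mutation] = hits
    return flagged

# Hypothetical predictions, invented for illustration only.
predictions = {
    "mutA": {"human": -1.2, "mink": -0.8, "cat": 0.1},
    "mutB": {"human": 0.3, "mink": 0.4, "cat": 0.2},
    "mutC": {"human": -0.1, "mink": -0.2, "cat": -0.9},
}
flagged = flag_cross_species_risk(predictions)
```

Here `mutA` is flagged for human and mink, `mutC` for cat, and `mutB` is not flagged at all. The flagged set is what would then be cross-referenced against circulating sequences.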

Epistasis and the Fitness Landscape

A major challenge in predicting zoonotic risk is the pervasive role of epistasis in shaping the viral fitness landscape. The effect of a given mutation is not constant but depends on the presence or absence of other mutations in the genetic background [1]. This phenomenon is particularly pronounced in the SARS-CoV-2 spike RBD, where epistatic interactions between residues can either enhance or suppress the effect of a mutation on ACE2 binding [1, 6]. Haddox et al. demonstrated that clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies, with epistatic interactions playing a central role in determining which escape pathways are accessible [6].

Computational models that ignore epistasis are likely to produce inaccurate predictions of mutational effects. To address this, recent approaches have incorporated pairwise or higher-order interaction terms into their fitness models [4, 5]. Durumeric et al. used machine learning to infer epistatic interactions directly from DMS data, generating a fitness landscape that captures both additive and non-additive effects [4]. These models can be used to predict the fitness of multi-mutant variants that have not been experimentally characterized, providing a more realistic assessment of zoonotic potential.
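For a concrete picture of what a pairwise interaction term means, the sketch below defines epistasis as the difference between the measured effect of a double mutant and the sum of its single-mutant effects, then uses such terms in a simple additive-plus-pairwise model. Effects are assumed to be on a log scale relative to wild type. The mutation names and values are hypothetical, and the models in the cited work are considerably more sophisticated.

```python
from itertools import combinations

def pairwise_epistasis(effect_a, effect_b, effect_ab):
    """Deviation of the double mutant from the additive expectation."""
    return effect_ab - (effect_a + effect_b)

def predict_effect(mutations, single_effects, pair_terms):
    """Additive model with pairwise corrections for a multi-mutant variant.

    pair_terms maps frozenset({m1, m2}) to an epistasis term; missing pairs
    are treated as purely additive.
    """
    additive = sum(single_effects[m] for m in mutations)
    interaction = sum(
        pair_terms.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(mutations), 2)
    )
    return additive + interaction

# Hypothetical log-scale effects, invented for illustration.
single_effects = {"mutA": -0.2, "mutB": 0.5, "mutC": 0.1}
measured_ab = 0.9

epsilon_ab = pairwise_epistasis(single_effects["mutA"], single_effects["mutB"], measured_ab)
pair_terms = {frozenset({"mutA", "mutB"}): epsilon_ab}

triple_prediction = predict_effect(["mutA", "mutB", "mutC"], single_effects, pair_terms)
```

In this toy case `mutA` is mildly deleterious on its own but the double mutant binds better than the additive expectation, so the epistasis term is positive. A purely additive model would underestimate the triple mutant by exactly that amount.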

Implications for Veterinary Medicine and One Health

The methodologies described in this article have direct applications in veterinary medicine and One Health surveillance. Domestic animals such as cats, dogs, ferrets, and farmed mink are known to be susceptible to SARS-CoV-2 infection and can serve as potential reservoirs for viral evolution and spillback into human populations [2, 5]. Computational models that predict ACE2 binding affinity in these species can inform surveillance priorities and guide the development of species-specific diagnostic assays.

For example, the identification of a mutation that enhances binding to feline ACE2 would warrant increased surveillance of cat populations in regions where the variant is circulating. Similarly, predictions of enhanced binding to mink ACE2 could trigger targeted testing on mink farms, where high-density animal populations can facilitate rapid viral evolution [2]. The integration of DMS data with computational modeling provides a rational, data-driven framework for allocating surveillance resources and mitigating zoonotic risk at the animal-human interface.

Limitations and Future Directions

Despite the power of these integrated approaches, several limitations remain. DMS experiments are typically performed against a single ACE2 ortholog (usually human) and may not capture the full complexity of binding to diverse animal receptors [1, 3]. Extending DMS selections to panels of ACE2 orthologs from multiple species is a logical next step but presents significant technical challenges. Computational models, while increasingly accurate, are still limited by force field accuracy, sampling efficiency, and the difficulty of modeling large conformational changes [4]. Protein language models can suffer from biases in the training data and may not generalize well to novel viral lineages [9, 10].

Future directions include the development of multi-species DMS platforms, the incorporation of glycan shielding effects into MD simulations, and the use of active learning strategies to iteratively refine computational predictions with targeted experimental validation. The ultimate goal is a real-time, computationally driven surveillance system that can flag emerging zoonotic threats before they cause widespread outbreaks.
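One simple form of the active learning strategy mentioned above is uncertainty sampling: send to the bench the candidates on which an ensemble of models disagrees most. The sketch below ranks candidates by the standard deviation of their predicted ΔΔG across ensemble members. The function name, the use of ensemble spread as the uncertainty measure, and the values are assumptions; other acquisition criteria are equally plausible.

```python
import statistics

def select_for_validation(ensemble_predictions, batch_size):
    """Pick the candidates whose predictions vary most across an ensemble.

    ensemble_predictions maps a mutation label to a list of predicted values,
    one per model in the ensemble.
    """
    spread = {
        name: statistics.pstdev(values)
        for name, values in ensemble_predictions.items()
    }
    # Largest spread first; ties are broken alphabetically for reproducibility.
    ranked = sorted(spread, key=lambda name: (-spread[name], name))
    return ranked[:batch_size]

# Hypothetical ensemble predictions (kcal/mol), invented for illustration.
ensemble = {
    "mutA": [-1.0, -1.1, -0.9],
    "mutB": [-2.0, 0.5, 1.0],
    "mutC": [0.2, 0.2, 0.2],
}
next_batch = select_for_validation(ensemble, batch_size=2)
```

After the selected variants are measured, the new data would be added to the training set and the models refit before the next round of selection.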

Conclusion

Deep mutational scanning and computational modeling of SARS-CoV-2 spike-ACE2 binding dynamics represent a powerful paradigm for predicting zoonotic risk. By combining high-throughput experimental mutagenesis with physics-based simulations and machine learning, researchers can map the fitness landscape of the RBD, identify mutations that enhance cross-species binding, and track their emergence in real time through genomic surveillance. This integrated approach is essential for proactive risk assessment and the protection of both animal and human health.

References

[1] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42330076/

[2] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41901725/

[3] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42324717/

[4] Durumeric AEP, McCarty S, Smith J, et al. Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments. J Chem Inf Model. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42089465/

[5] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41809055/

[6] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41767406/

[7] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42037411/

[8] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42030951/

[9] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42204343/

[10] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.