What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens such as infectious bursal disease virus (IBDV) and avian reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Deep Mutational Scanning and Machine Learning for Predicting SARS-CoV-2 Spike Protein Escape from Neutralizing Antibodies

Introduction

The continuous evolution of the SARS-CoV-2 spike protein, particularly its receptor-binding domain (RBD), drives the emergence of variants capable of evading neutralizing antibody responses [1, 2]. Deep mutational scanning (DMS) has emerged as a powerful experimental technique to systematically quantify the functional effects of all single amino acid substitutions in a protein [3]. When combined with machine learning (ML) models, DMS data can be extrapolated to predict the impact of combinatorial mutations on antibody escape, ACE2 receptor binding, and viral fitness [4, 5]. This article provides an overview of the experimental and computational workflows that integrate DMS with ML to forecast escape mutations in the SARS-CoV-2 RBD. Although the primary focus is on SARS-CoV-2, the methodologies can be adapted to animal coronaviruses of veterinary importance, such as feline coronavirus, canine respiratory coronavirus, and bovine coronavirus, whose spike proteins likewise govern host range and are the main targets of neutralizing antibodies. Spike DMS has already been extended beyond SARS-CoV-2, for example to the human seasonal coronavirus HCoV-229E [6].

Experimental Deep Mutational Scanning of the RBD

DMS libraries are constructed by introducing all possible single amino acid substitutions into the RBD coding sequence. This is typically done with degenerate oligonucleotides (for example, NNS or NNK codons at each position) or, less systematically, by error-prone PCR [3]. Each variant is linked to a unique DNA barcode to enable multiplexed phenotypic measurement via high-throughput sequencing [3]. The library is expressed on the surface of yeast cells (Saccharomyces cerevisiae) as a fusion protein, allowing fluorescence-activated cell sorting (FACS) based on binding to ACE2 or to specific monoclonal antibodies [3]. After sorting, the barcodes are sequenced to quantify the enrichment or depletion of each variant under selective pressure [3].
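The enrichment step above can be sketched as a log2 ratio of each variant's barcode frequency after versus before sorting, normalized to wild type. This is a minimal sketch under stated assumptions: the pseudocount of 0.5, the function name, and the variant labels and counts are all hypothetical and are not taken from any specific published pipeline.

```python
import math

def enrichment_scores(pre_counts, post_counts, wildtype, pseudocount=0.5):
    """Log2 enrichment of each variant after sorting, relative to wild type.

    pre_counts / post_counts map a variant label to its summed barcode reads
    before and after selection (hypothetical inputs). The pseudocount avoids
    log(0) for variants that drop out completely.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        pre_freq = (pre_counts[variant] + pseudocount) / pre_total
        post_freq = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post_freq / pre_freq)

    wt_ratio = log_ratio(wildtype)
    # Scores > 0: enriched relative to wild type; scores < 0: depleted.
    return {v: log_ratio(v) - wt_ratio for v in pre_counts}

# Hypothetical read counts for three library members.
pre = {"WT": 1000, "var_A": 1000, "var_B": 1000}
post = {"WT": 1000, "var_A": 4000, "var_B": 250}
scores = enrichment_scores(pre, post, wildtype="WT")
```

With these toy counts, var_A scores close to +2 (about fourfold enriched) and var_B close to -2. Normalizing to wild type makes scores comparable across sorts with different total read depths.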

The resulting mutational landscape reveals positions that tolerate substitution without loss of expression or ACE2 binding, as well as positions that are highly constrained [7, 3]. For antibody escape mapping, the library is incubated with a neutralizing antibody. Cells that still express the RBD but show reduced antibody binding are then sorted into an "escape" gate, which enriches for escape mutants [8, 9]. This approach has been applied to map escape mutations for hundreds of monoclonal antibodies and polyclonal sera [8].
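One way to turn escape-gate sequencing into a per-variant number is an "escape fraction": a variant's frequency in the escape gate relative to its pre-sort frequency, scaled by the fraction of all cells that fell into the gate. The sketch below assumes this definition, which is commonly used in yeast-display escape mapping but is not stated in the text above. The function name and toy counts are hypothetical.

```python
def escape_fractions(pre_counts, escape_counts, frac_cells_in_escape_gate):
    """Per-variant escape fraction, clipped to [0, 1].

    frac_cells_in_escape_gate is the fraction of all sorted cells that fell
    into the antibody-escape gate, as measured on the cytometer.
    """
    pre_total = sum(pre_counts.values())
    esc_total = sum(escape_counts.values())
    fractions = {}
    for variant, n_pre in pre_counts.items():
        if n_pre == 0:
            continue  # absent before sorting: no estimate possible
        freq_pre = n_pre / pre_total
        freq_esc = escape_counts.get(variant, 0) / esc_total
        raw = frac_cells_in_escape_gate * freq_esc / freq_pre
        fractions[variant] = min(max(raw, 0.0), 1.0)  # clip sampling noise
    return fractions

# Toy library: var_A fully escapes, var_B never does, so half the cells gate.
fr = escape_fractions({"var_A": 500, "var_B": 500},
                      {"var_A": 1000, "var_B": 0},
                      frac_cells_in_escape_gate=0.5)
```

Here var_A receives an escape fraction of 1.0 (every cell carrying it escapes) and var_B receives 0.0.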

Recent DMS studies have characterized the RBD of emerging SARS-CoV-2 variants, including Omicron sublineages JN.1 and XEC, highlighting how epistatic interactions shift amino acid preferences at key residues [1, 2]. For example, the N501Y mutation, which enhances ACE2 binding affinity approximately tenfold, also alters the tolerance of neighboring residues to further substitutions [10]. Such epistatic effects complicate simple additive models of escape and necessitate the use of ML approaches that can capture nonlinear interactions [4, 11].
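Epistasis in this sense can be quantified as the deviation of a multi-mutant's measured effect from the sum of its single-mutant effects. The sketch assumes effects are already on an additive scale, such as a change in log10 K_D or a log enrichment score. On that scale, "additive" corresponds to multiplicative effects on raw affinity. The mutation labels and values are hypothetical.

```python
def epistasis(single_effects, observed_double, mut1, mut2):
    """Observed double-mutant effect minus the additive (no-epistasis) expectation.

    Positive values: the pair does better than expected from the singles;
    negative values: worse than expected.
    """
    additive_expectation = single_effects[mut1] + single_effects[mut2]
    return observed_double - additive_expectation

# Hypothetical single-mutant effects and a measured double mutant.
singles = {"mutA": 1.0, "mutB": -0.5}
eps = epistasis(singles, observed_double=1.5, mut1="mutA", mut2="mutB")
```

An additive model would predict 0.5 for this pair. The hypothetical measured 1.5 gives an epistasis term of 1.0, which an additive model cannot capture.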

Machine Learning Models for Escape Prediction

Machine learning models trained on DMS data can predict the antibody escape potential of unseen mutations, including combinations of multiple substitutions [12, 5]. The input features typically include structural and biophysical descriptors of the RBD-antibody interface, such as inter-residue contacts, interaction energies, and predicted binding free energy changes (ΔΔG) [12, 13]. Common algorithms include random forests, gradient boosting machines, and deep neural networks [12, 5].
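The feature-based approach described above can be sketched with a small classifier that maps interface descriptors to an escape probability. For brevity and to avoid third-party dependencies, this sketch swaps in logistic regression trained by batch gradient descent rather than a random forest or neural network. The two features (a ΔΔG value where larger means weaker antibody binding, and a count of lost interface contacts) and all training rows are hypothetical.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that mutation features x indicate escape."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical rows: [ddG (kcal/mol), lost contacts]; label 1 = escape.
X = [[2.0, 4], [1.5, 3], [2.5, 5], [0.1, 0], [0.3, 1], [-0.2, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```

A random forest or gradient-boosting model would replace `train_logistic` in practice. The feature construction and the probability-style output used for ranking mutations stay the same.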

A landmark study by Taft et al. developed deep mutational learning (DML), a machine learning-guided protein engineering platform that predicts ACE2 binding and antibody escape for billions of combinatorial RBD variants [5, 14]. Rather than relying only on single-mutant data, DML screens yeast-displayed libraries that carry combinatorial mutations and deep-sequences the sorted populations. It then trains machine learning models on these labeled sequences to predict the effects of variants with multiple mutations, and the authors validated these predictions against experimental measurements [5]. The models mapped a vast landscape of potential variants that could emerge through diverse evolutionary trajectories, including variants predicted to escape multiple antibody classes [5].
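The scale of the combinatorial space is the reason models rather than exhaustive experiments are needed. The number of sequences carrying exactly k substitutions among N sites, with 19 alternative amino acids per site, is C(N, k) · 19^k. The sketch assumes N = 201 sites, matching an RBD construct spanning residues 331 to 531. It counts the variants carrying up to a given number of substitutions.

```python
from math import comb

def n_variants(n_sites, max_mutations, alternatives_per_site=19):
    """Count sequences carrying 1..max_mutations amino acid substitutions."""
    return sum(
        comb(n_sites, k) * alternatives_per_site ** k
        for k in range(1, max_mutations + 1)
    )

singles = n_variants(201, 1)        # 201 * 19
up_to_doubles = n_variants(201, 2)
up_to_triples = n_variants(201, 3)
```

Under this assumption there are 3,819 single mutants and about 7.3 million variants with up to two substitutions. Allowing up to three substitutions gives roughly 9.2 billion variants, far beyond what can be assayed one by one.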

Protein language models (pLMs) represent another powerful class of ML architectures for escape prediction [4, 11, 9]. These models are trained on large corpora of protein sequences and learn evolutionary constraints without requiring explicit structural input [11]. Yang et al. demonstrated that a DMS-informed pLM could predict SARS-CoV-2 evolution dynamics with spatiotemporal resolution, accurately forecasting the emergence of escape mutations before they were observed in global surveillance data [4]. Similarly, Ehling et al. used a pLM trained on synthetic coevolution data to predict antibody escape trajectories, revealing antagonistic and compensatory mutational patterns [9].
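A common way to score a substitution with a pLM is the log-likelihood ratio between the mutant and wild-type residue at a site, using the model's per-position amino acid probabilities. In the sketch, those probabilities are hypothetical placeholders standing in for model output, and no real pLM is called. The sketch also does not reproduce the DMS-informed models of [4] or [9]. On its own, such a score reflects evolutionary plausibility rather than escape, which is why the cited studies combine pLMs with DMS or escape data.

```python
import math

def llr_score(position_probs, site, wt_aa, mut_aa):
    """Log-likelihood ratio log p(mut) - log p(wt) at one site."""
    probs = position_probs[site]
    return math.log(probs[mut_aa]) - math.log(probs[wt_aa])

def rank_mutations(position_probs, wildtype_seq):
    """Score every substitution present in the probability tables, best first."""
    scores = []
    for site, wt_aa in enumerate(wildtype_seq):
        for aa in position_probs[site]:
            if aa != wt_aa:
                label = f"{wt_aa}{site + 1}{aa}"  # e.g. N1Y (1-based site)
                scores.append((label, llr_score(position_probs, site, wt_aa, aa)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical per-site probabilities for a two-residue toy sequence "NK".
probs = [{"N": 0.6, "Y": 0.3, "D": 0.1}, {"K": 0.9, "E": 0.1}]
ranked = rank_mutations(probs, "NK")
```

With these toy probabilities, N1Y ranks highest (log ratio ln 0.5, about -0.69) and K2E ranks lowest.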

Shlesinger et al. developed a deep mutational learning framework that dissects polyclonal antibody escape by modeling the contributions of individual antibody specificities within a serum [15]. This approach enables the prediction of how changes in the antibody repertoire, for example after vaccination or infection, alter the selective pressure on the virus [15].
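The idea of splitting a serum into epitope-specific components can be illustrated with a simple additive biophysical model. In this model, each epitope has an activity against wild type, each mutation reduces that activity by an additive escape effect, and a virion escapes only if it is unbound at every epitope. This is a swapped-in illustrative model and an assumption here, not the deep mutational learning method of [15]. The epitope names, activities, and effects are hypothetical.

```python
import math

def fraction_unneutralized(concentration, epitope_activities,
                           mutation_effects, variant_mutations):
    """Probability a variant is unbound at all epitopes (biophysical sketch).

    epitope_activities: {epitope: activity against wild type}
    mutation_effects:   {epitope: {mutation: escape effect (beta)}}
    """
    p = 1.0
    for epitope, a_wt in epitope_activities.items():
        betas = mutation_effects.get(epitope, {})
        # Higher phi means weaker binding by antibodies targeting this epitope.
        phi = -a_wt + sum(betas.get(m, 0.0) for m in variant_mutations)
        p *= 1.0 / (1.0 + concentration * math.exp(-phi))
    return p

acts = {"e1": 2.0, "e2": 2.0}                       # hypothetical serum
effects = {"e1": {"m1": 10.0}, "e2": {"m2": 10.0}}  # hypothetical escape effects
p_wt = fraction_unneutralized(1.0, acts, effects, [])
p_one = fraction_unneutralized(1.0, acts, effects, ["m1"])
p_both = fraction_unneutralized(1.0, acts, effects, ["m1", "m2"])
```

In this toy serum, escaping one epitope raises the unneutralized fraction only modestly because the second epitope still neutralizes. Escaping both epitopes lets nearly all virions through. Changing `epitope_activities`, for example after vaccination, reshapes which mutations matter most.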

Structural Modeling and Binding Free Energy Calculations

Structural information is critical for interpreting DMS results and for constructing informative features for ML models [7, 12, 13]. The RBD-ACE2 and RBD-antibody interfaces have been extensively characterized by X-ray crystallography and cryo-electron microscopy [10, 3]. Computational docking and molecular dynamics simulations can predict the binding free energy change (ΔΔG) associated with each mutation, which serves as a key input feature for escape classifiers [12, 13].
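The relationship between affinity and ΔΔG can be made concrete. With ΔG = RT ln K_D for a dissociation constant, ΔΔG = ΔG_mut − ΔG_wt = RT ln(K_D,mut / K_D,wt). The sketch assumes this convention, where positive values mean weaker binding, and a temperature of 298.15 K. Some tools report the opposite sign, so conventions should be checked before mixing ΔΔG values from different sources.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_kd(kd_wt, kd_mut, temperature=298.15):
    """ddG = dG_mut - dG_wt with dG = RT ln(K_D); positive = weaker binding."""
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)

# A hypothetical tenfold affinity gain (K_D from 10 nM to 1 nM).
ddg_tenfold_gain = ddg_from_kd(10e-9, 1e-9)
```

Under this convention, a tenfold affinity increase, similar in size to that reported for N501Y, corresponds to roughly -1.36 kcal/mol at 25 °C.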

Sharma et al. assembled a dataset of 1,813 mutations at the interfaces of 83 RBD-neutralizing antibody complexes and used interaction energy, inter-residue contacts, and predicted ΔΔG to train a random forest classifier [12]. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.91 on a test set of 217 mutations. When it was applied to 29,165 interface mutations, the top 10% of predicted high-escape mutations were dominated by charged-to-nonpolar substitutions [12].
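The two evaluation steps mentioned here can be sketched in a few lines. ROC AUC equals the probability that a randomly chosen escape mutation scores higher than a randomly chosen non-escape mutation, with ties counted as half. "Top 10%" is a simple rank cutoff on predicted scores. The labels, scores, and function names below are hypothetical and are not the evaluation code of [12].

```python
def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5.

    O(n_pos * n_neg), which is fine for a few hundred test mutations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def top_fraction(items, scores, fraction=0.10):
    """Return the highest-scoring items, e.g. the top 10% predicted escapers."""
    k = max(1, int(len(items) * fraction))
    ranked = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]

auc = roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1])  # hypothetical predictions
```

In this toy example one escape mutation (score 0.2) is outranked by a non-escape mutation (0.3), so the AUC is 0.75 rather than 1.0.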

Structural modeling also reveals constrained surfaces on the RBD that are less tolerant to mutation and therefore represent promising targets for broadly neutralizing antibodies [3]. By mapping DMS-derived constraint scores onto the three-dimensional structure, researchers can identify epitopes that are both accessible and evolutionarily conserved, guiding the design of vaccines and therapeutic antibodies that are less susceptible to escape [7, 3].
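The site-selection logic described above can be sketched as a filter over two per-site tables. The first holds a DMS-derived tolerance score, where higher means more tolerant of substitution. The second holds relative solvent accessibility computed from a structure, for example with a tool such as DSSP. Both tables, the thresholds, and the site numbers below are hypothetical assumptions.

```python
def candidate_epitope_sites(site_tolerance, site_accessibility,
                            max_tolerance, min_accessibility):
    """Sites that are surface-exposed yet mutationally constrained.

    site_tolerance:     {site: tolerance score, higher = more tolerant}
    site_accessibility: {site: relative solvent accessibility, 0 to 1}
    """
    return sorted(
        site
        for site, tol in site_tolerance.items()
        if tol <= max_tolerance
        and site_accessibility.get(site, 0.0) >= min_accessibility
    )

tolerance = {1: 0.10, 2: 0.90, 3: 0.20, 4: 0.05}      # hypothetical
accessibility = {1: 0.60, 2: 0.80, 3: 0.10, 4: 0.50}  # hypothetical
sites = candidate_epitope_sites(tolerance, accessibility, 0.3, 0.25)
```

Site 2 is exposed but tolerant, and site 3 is constrained but buried, so only sites 1 and 4 pass both filters. Contiguous passing sites on the structure would then be examined as candidate conserved epitopes.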

Workflow Integration: From DMS to Predictive Surveillance

The integration of DMS, ML, and structural modeling forms a predictive pipeline that can be applied to real-time genomic surveillance data from databases such as GISAID [4, 16]. The following Mermaid diagram illustrates the typical workflow:

```mermaid
flowchart TD
    A[Construct RBD DMS Library] --> B[Yeast Surface Display]
    B --> C[Flow Cytometric Sorting with Antibody]
    C --> D[High-Throughput Barcode Sequencing]
    D --> E[Calculate Enrichment Scores per Variant]
    E --> F[Train Machine Learning Model]
    F --> G[Predict Escape for Combinatorial Mutants]
    G --> H[Validate with Independent Experiments]
    H --> I[Integrate with Global Surveillance Data]
    I --> J[Forecast Emerging Escape Variants]
```

This pipeline has been used to predict the immune escape of Omicron sublineages before their widespread circulation [4, 5]. Nasir et al. developed a predictive model that groups antigenically similar variants based on their escape profiles, enabling proactive vaccine strain selection [17].
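Grouping variants by antigenic similarity can be sketched as comparing their escape profiles, such as escape scores against a fixed panel of antibodies or sera. Variants are joined into one group when profiles are sufficiently similar. This sketch uses cosine similarity with single-linkage grouping via union-find. It is an illustrative stand-in, not the model of [17], and the threshold and profiles are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two escape-profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_variants(profiles, threshold=0.9):
    """Single-linkage groups: link any pair with cosine >= threshold."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical escape scores against a three-antibody panel.
profiles = {"v1": [1.0, 0.0, 0.0], "v2": [0.9, 0.1, 0.0],
            "v3": [0.0, 1.0, 0.0], "v4": [0.0, 0.95, 0.05]}
groups = group_variants(profiles)
```

Here v1 and v2 escape mainly the first antibody and v3 and v4 escape mainly the second, giving two antigenic groups. A vaccine strain chosen from one group would be expected to cover that group better than the other.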

Implications for Veterinary Virology and Vaccine Design

Although SARS-CoV-2 is primarily a human pathogen, the DMS-ML framework is applicable in principle to animal coronaviruses that use spike protein-mediated entry and are targeted by neutralizing antibodies [6]. Receptor usage differs, however. The spike proteins of type II feline coronaviruses, which include some feline infectious peritonitis virus (FIPV) strains, bind feline aminopeptidase N rather than ACE2. Applying DMS to the receptor-binding region of the FIPV spike could identify mutations that alter receptor usage or antibody recognition, informing the design of next-generation vaccines for feline coronavirus.

Similarly, bovine coronavirus (BCoV) and canine respiratory coronavirus (CRCoV) are closely related betacoronaviruses that are not known to use ACE2. The BCoV spike binds 9-O-acetylated sialic acids through the N-terminal domain of S1. With the binding and sorting assays adapted to these receptors, the same experimental and computational approaches could be used to map escape mutations in these spikes, supporting the development of broadly protective vaccines for livestock and companion animals [6].

The integration of DMS data with structural modeling and ML also facilitates the prediction of zoonotic spillover risk. By comparing the mutational tolerance of RBDs from bat coronaviruses to that of SARS-CoV-2, researchers can identify which animal viruses have the potential to evolve human ACE2 binding and evade pre-existing immunity [7, 18].

Limitations and Future Directions

Despite its power, DMS is limited to single amino acid substitutions in most implementations, and the combinatorial space of double and triple mutants is vast [5]. ML models trained on single-mutant data can extrapolate to combinations, but their accuracy decreases for mutations involving multiple epistatic interactions [4, 11]. Experimental validation of predicted combinatorial mutants remains essential [5].

Another limitation is that DMS experiments are typically performed in yeast or other heterologous systems, which may not fully recapitulate the native viral context, including glycosylation patterns and quaternary structure [3]. Advances in mammalian surface display and pseudovirus-based DMS are addressing these issues [9].

Future directions include the incorporation of temporal and spatial metadata from global surveillance into ML models, enabling real-time forecasting of variant emergence [4, 15]. Protein language models trained on large coronavirus sequence databases can capture long-range evolutionary dependencies and predict escape mutations with high accuracy [11, 9].
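One simple temporal feature that can be derived from surveillance counts is a variant's logistic growth rate. This is the slope of the log-odds of its sequence frequency over time, where a positive slope means the variant is gaining share. The sketch fits that slope by ordinary least squares. It is an assumed, simplified feature for illustration, not the forecasting method of [4], and the synthetic counts in the example are generated from a known rate.

```python
import math

def logistic_growth_rate(days, variant_counts, total_counts):
    """OLS slope of logit(variant frequency) versus time (per day).

    exp(slope) is the per-day multiplicative change in the variant's odds
    relative to all other sequenced genomes.
    """
    xs, ys = [], []
    for t, k, n in zip(days, variant_counts, total_counts):
        if 0 < k < n:  # logit is undefined at 0% or 100% frequency
            p = k / n
            xs.append(t)
            ys.append(math.log(p / (1 - p)))
    if len(xs) < 2:
        raise ValueError("need at least two informative time points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic weekly counts generated with logit(p) = -4 + 0.1 * day.
days = [0, 7, 14, 21, 28]
totals = [100000] * len(days)
counts = [round(100000 / (1 + math.exp(4 - 0.1 * t))) for t in days]
rate = logistic_growth_rate(days, counts, totals)
```

The fitted rate recovers the generating slope of about 0.1 per day. In a forecasting model, such growth rates would be combined with predicted escape and ACE2-binding scores, and possibly stratified by region.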

Conclusion

Deep mutational scanning combined with machine learning provides a robust framework for predicting antibody escape mutations in the SARS-CoV-2 spike RBD. The integration of experimental mutagenesis, high-throughput sequencing, structural modeling, and computational prediction enables the proactive identification of variants that may undermine vaccine and therapeutic efficacy. With assays adapted to each virus's receptor, these methodologies can be transferred to veterinary coronaviruses, supporting the development of durable vaccines for animal health. Continued refinement of ML architectures and expansion of DMS datasets will further enhance our ability to forecast viral evolution and guide countermeasure design.

References

[1] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42330076/

[2] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42324717/

[3] Wang M, Lan T, Wang W. Constrained surfaces: promising therapeutic targets for COVID-19 determined by systematically mutational analysis. Signal Transduct Target Ther. 2021. URL: https://www.semanticscholar.org/paper/c1c1af81c5a3adedd3f5be7dfeb4d2014469c010

[4] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42204343/

[5] Taft JM, Weber CR, Gao B, et al. Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain. Cell. 2022. URL: https://www.semanticscholar.org/paper/e95d0d9fe7e759d15ceb484b1e3f472e39d5ca3b

[6] Harari S, Eguia RT, Dadonaite B, et al. Mutations to the HCoV-229E spike have counterbalancing effects on serum antibody neutralization and receptor binding. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42124731/

[7] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41901725/

[8] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41767406/

[9] Ehling R, Minot M, Overath MD, et al. Synthetic coevolution reveals adaptive mutational trajectories of neutralizing antibodies and SARS-CoV-2. bioRxiv. 2024. URL: https://www.semanticscholar.org/paper/208ca9488479f554a61f12e3876965dee4a801a3

[10] Liu H, Zhang Q, Wei P, et al. The basis of a more contagious 501Y.V1 variant of SARS-CoV-2. Cell Res. 2021. URL: https://www.semanticscholar.org/paper/cc52761fcc884bdd4119f5f1d90f51e040dd603f

[11] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/

[12] Sharma D, Rawat P, Greiff V, et al. Predicting the immune escape of SARS-CoV-2 neutralizing antibodies upon mutation. Biochim Biophys Acta Mol Basis Dis. 2023. URL: https://www.semanticscholar.org/paper/2a4c5e02f0ccee9654cc039a1a4ce2ea7e894944

[13] Desautels T, Zemla A, Lau E, et al. Rapid in silico design of antibodies targeting SARS-CoV-2 using machine learning and supercomputing. bioRxiv. 2020. URL: https://www.semanticscholar.org/paper/cf499ddf7d34e4ec0f230dbe84b521cdfe0f46a0

[14] Taft JM, Weber CR, Gao B, et al. Predictive profiling of SARS-CoV-2 variants by deep mutational learning. bioRxiv. 2021. URL: https://www.semanticscholar.org/paper/16f43924faca69c77f3db41a9a8b3ff5b3576bb8

[15] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42030951/

[16] Sokhansanj B, Zhao Z, Rosen G. Interpretable and Predictive Deep Modeling of the SARS-CoV-2 Spike Protein Sequence. medRxiv. 2021. URL: https://www.semanticscholar.org/paper/1da2c2147696184846e4aee556bdb3b8fd0dcfe3

[17] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42037411/

[18] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41809055/

[19] Youssef N, Ghantous F, Gurev S, et al. Deep generative models predict SARS-CoV-2 Spike infectivity and foreshadow neutralizing antibody escape. 2023. URL: https://www.semanticscholar.org/paper/ba0d0e9ebd301bb3542b1f35031aaafa7584dbe5

[20] Weber CR. Deep Mutational Learning Predicts ACE2 Binding and Antibody Escape to Combinatorial Mutations in the SARS-CoV-2 Receptor Binding Domain (journal pre-proof). 2022. URL: https://www.semanticscholar.org/paper/1b1b864cf6a833a39033037aad0b100c8af31c7d