What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Deep Mutational Scanning and Machine Learning Predictions of SARS-CoV-2 Spike Protein Receptor Binding Domain Escape Mutants

Introduction

The continuous evolution of the SARS-CoV-2 spike glycoprotein, particularly its receptor binding domain (RBD), presents a persistent challenge for vaccine design and monoclonal antibody therapy development across both human and veterinary medicine. The RBD mediates viral attachment to the angiotensin-converting enzyme 2 (ACE2) receptor and is the primary target of neutralizing antibodies [1]. Deep mutational scanning (DMS) has emerged as a powerful experimental approach to systematically quantify the functional effects of all possible single amino acid mutations in the RBD on ACE2 binding affinity, protein folding, and antibody escape [1, 2]. When integrated with machine learning algorithms, DMS data enable prospective prediction of viral evolutionary trajectories and the identification of high-risk escape mutations before they emerge in circulating variants [3, 4, 5]. This review provides a comprehensive examination of the computational workflow linking DMS experiments to machine learning predictions of RBD escape mutants, with emphasis on the biophysical mechanisms underlying mutation effects and the implications for veterinary vaccine strain selection.

Deep Mutational Scanning: Experimental Foundations

Library Design and Construction

DMS experiments begin with the construction of comprehensive mutant libraries encompassing all possible single amino acid substitutions in the RBD. For SARS-CoV-2, the RBD is commonly defined as residues 319-541 of the spike protein (about 220 amino acids). Scanning approximately 200 of these sites with all 19 alternative amino acids at each position yields roughly 4,000 possible single mutants [1]. Libraries are typically generated through oligonucleotide-directed mutagenesis followed by cloning into expression vectors for display on the surface of yeast or mammalian cells [1, 6]. Pseudovirus-based DMS platforms have been developed that measure mutation effects in the context of the full spike trimer, capturing conformational dependencies that may be missed in isolated RBD constructs [2, 6]. These pseudovirus libraries contain approximately 7,000 distinct amino acid mutations arranged in up to 135,000 unique mutation combinations, allowing simultaneous assessment of single and combinatorial effects [6].
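The library-size arithmetic above can be checked with a short Python sketch that enumerates every single substitution for a sequence segment. The number of single mutants is simply (number of sites) × 19. For example, 201 sites give 3,819 mutants, and the full 223-residue span 319-541 gives 4,237. The eight-residue fragment below is assumed to correspond to spike residues 482-489 and is included only to make the labels readable, so verify it against a reference sequence before reuse. The function name `single_mutants` is illustrative.

```python
"""Enumerate all single amino acid substitutions for a protein segment."""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def single_mutants(sequence: str, start: int) -> list[str]:
    """Return labels like 'E484K' for every non-wild-type residue at every site.

    `start` is the spike numbering assumed for the first residue of `sequence`.
    """
    labels = []
    for offset, wt in enumerate(sequence):
        site = start + offset
        for mut in AMINO_ACIDS:
            if mut != wt:  # 19 alternatives per site
                labels.append(f"{wt}{site}{mut}")
    return labels


if __name__ == "__main__":
    fragment = "GVEGFNCY"  # assumed to be spike residues 482-489
    mutants = single_mutants(fragment, start=482)
    print(len(mutants))  # 8 sites x 19 = 152
    print("E484K" in mutants)  # True
    # Library size grows linearly with the number of scanned sites.
    for n_sites in (201, 223):
        print(f"{n_sites} sites -> {n_sites * 19} single mutants")
```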

High-Throughput Sequencing and Fitness Scoring

Following library expression and selection, deep sequencing is used to quantify the enrichment or depletion of each variant under a specific selective pressure. For ACE2 binding measurements, cells expressing the RBD library are incubated with soluble ACE2, and bound and unbound populations are separated by fluorescence-activated cell sorting (FACS) [1]. For antibody escape mapping, the library is incubated with neutralizing antibodies, and variants that escape are enriched. In yeast display, cells that express RBD but lose antibody binding are sorted. In pseudovirus systems, variants that remain infectious in the presence of antibody are recovered [7, 8, 9]. Sequencing reads are aligned to the reference RBD sequence, and variant counts are used to compute enrichment ratios. A fitness score for each mutation is calculated as the log2 fold change in frequency between the selected and unselected populations, normalized to the wild-type sequence [1, 6]. These scores reflect the combined effects of mutations on protein expression, stability, and receptor binding or antibody evasion.
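The wild-type-normalized log2 score described above can be sketched in a few lines of Python. This assumes simple per-variant read counts from an unselected (pre-sort) and a selected (post-sort) population. The pseudocount of 0.5 and the function name are illustrative choices, not parameters taken from the cited studies. Published pipelines typically add further steps, such as read filtering, barcode collapsing, and replicate averaging.

```python
import math


def log2_enrichment(pre_counts: dict[str, int], post_counts: dict[str, int],
                    wt_label: str = "WT", pseudocount: float = 0.5) -> dict[str, float]:
    """Return a wild-type-normalized log2 enrichment score for every variant.

    score(v) = log2[ ((post_v + c) / (pre_v + c)) / ((post_wt + c) / (pre_wt + c)) ]
    where c is a pseudocount that avoids division by zero for dropped-out variants.
    """
    wt_ratio = (post_counts[wt_label] + pseudocount) / (pre_counts[wt_label] + pseudocount)
    scores = {}
    for variant, n_pre in pre_counts.items():
        n_post = post_counts.get(variant, 0)  # absent after selection -> 0 reads
        variant_ratio = (n_post + pseudocount) / (n_pre + pseudocount)
        scores[variant] = math.log2(variant_ratio / wt_ratio)
    return scores


if __name__ == "__main__":
    # Hypothetical read counts before and after selection.
    pre = {"WT": 1000, "variant_1": 500, "variant_2": 500}
    post = {"WT": 1000, "variant_1": 2000, "variant_2": 50}
    for name, score in log2_enrichment(pre, post).items():
        print(f"{name}: {score:+.2f}")
```

By construction the wild-type score is exactly zero. Positive scores mark variants enriched by the selection, and negative scores mark depleted ones.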

Epistatic Interactions and Background Dependence

A critical finding from DMS studies is that mutational effects are not independent but are modulated by epistatic interactions with other residues in the RBD [10, 11]. Epistasis occurs when the phenotypic effect of a mutation depends on the genetic background in which it is introduced. For example, the Q493E mutation was found to decrease ACE2 binding affinity in ancestral SARS-CoV-2 backgrounds but to enhance binding when combined with L455S and F456L in the KP.3 variant [11]. This sign epistasis, in which the direction of a mutation's effect flips between backgrounds, highlights the importance of measuring mutational effects in contemporary strain backgrounds rather than relying solely on ancestral measurements [10, 11]. DMS experiments performed in Omicron BA.2.86 and XBB.1.5 backgrounds have revealed that while many mutational effects are conserved across lineages, a subset of residues exhibits background-dependent behavior that can alter evolutionary trajectories [10, 7, 11].
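The definitions above can be made concrete with a small Python sketch, assuming all scores are on a log scale (for example, log10 K_D or log2 enrichment) relative to a common reference. Pairwise epistasis is the deviation of a double mutant from the sum of the single-mutant effects. Sign epistasis is flagged when the same mutation has opposite-signed effects in two backgrounds. The numbers in the example are invented for illustration and are not measured values from [10] or [11].

```python
def mutation_effect(score_mutant: float, score_background: float) -> float:
    """Effect of one mutation = score with it minus score of the background without it."""
    return score_mutant - score_background


def pairwise_epistasis(s_wt: float, s_a: float, s_b: float, s_ab: float) -> float:
    """Deviation of the double mutant from additivity on a log scale (0 = additive)."""
    return (s_ab - s_wt) - ((s_a - s_wt) + (s_b - s_wt))


def is_sign_epistasis(effect_bg1: float, effect_bg2: float) -> bool:
    """True when the same mutation is beneficial in one background and deleterious in the other."""
    return effect_bg1 * effect_bg2 < 0


if __name__ == "__main__":
    # Invented log-scale binding scores for illustration only.
    effect_in_ancestral = mutation_effect(score_mutant=-0.6, score_background=0.0)
    effect_in_new_bg = mutation_effect(score_mutant=0.5, score_background=0.2)
    print(is_sign_epistasis(effect_in_ancestral, effect_in_new_bg))  # True
    print(round(pairwise_epistasis(0.0, -0.4, -0.3, 0.2), 6))  # 0.9
```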

Machine Learning Integration

Feature Engineering from DMS Data

Machine learning models for predicting RBD escape mutants require informative feature representations derived from DMS measurements and structural data. Common features include per-mutation fitness scores from ACE2 binding and expression assays, amino acid physicochemical properties (hydrophobicity, charge, molecular volume), evolutionary conservation scores from multiple sequence alignments, and structural features such as solvent accessibility, residue depth, and distance to the ACE2 interface or antibody epitopes [4, 12]. Protein language model embeddings have been employed to capture higher-order sequence context and evolutionary information, enabling predictions of mutational effects without explicit structural input [3, 13].
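As an illustration of the feature types listed above, the following Python sketch turns a mutation label such as `E484K` into a small feature dictionary. The hydrophobicity values follow the Kyte-Doolittle scale, and the side-chain volumes are approximate tabulated residue volumes (Zamyatnin scale). Histidine is treated as uncharged at neutral pH, which is a simplification. The DMS scores, relative solvent accessibility, and distance to the ACE2 interface are assumed to come from the user's own DMS tables and structural analysis. The function and key names are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Approximate residue volumes in cubic angstroms (Zamyatnin scale).
VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5, "Q": 143.8, "E": 138.4,
    "G": 60.1, "H": 153.2, "I": 166.7, "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9,
    "P": 112.7, "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}
# Formal side-chain charge at neutral pH; histidine treated as neutral (a simplification).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

MUTATION_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def mutation_features(label: str, dms_bind: float, dms_expr: float,
                      rel_sasa: float, dist_to_ace2: float) -> dict[str, float]:
    """Combine physicochemical deltas with user-supplied DMS and structural values."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wt, site, mut = match.group(1), int(match.group(2)), match.group(3)
    return {
        "site": site,
        "d_hydrophobicity": KYTE_DOOLITTLE[mut] - KYTE_DOOLITTLE[wt],
        "d_charge": CHARGE.get(mut, 0) - CHARGE.get(wt, 0),
        "d_volume": VOLUME[mut] - VOLUME[wt],
        "dms_bind": dms_bind,          # e.g. change in ACE2 binding from a DMS table
        "dms_expr": dms_expr,          # e.g. change in RBD expression from a DMS table
        "rel_sasa": rel_sasa,          # relative solvent accessibility of the site
        "dist_to_ace2": dist_to_ace2,  # distance (angstroms) from site to the ACE2 interface
    }


if __name__ == "__main__":
    # All numeric inputs below are invented placeholders.
    print(mutation_features("E484K", dms_bind=0.1, dms_expr=-0.05,
                            rel_sasa=0.6, dist_to_ace2=5.0))
```

For E484K, the charge delta is +2, because a negatively charged glutamate is replaced by a positively charged lysine.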

Model Architectures

Several machine learning architectures have been applied to DMS data for predicting RBD escape. Random forest models have been used to classify mutations as escape or non-escape based on DMS-derived features, achieving high accuracy in retrospective validation against known antibody escape mutations [5, 12]. Neural network models, including multilayer perceptrons and graph neural networks, have been trained to predict continuous fitness scores from sequence and structure features [4, 14]. Protein language models such as ESM-1v and MSA Transformer have been fine-tuned on DMS data to predict mutational effects with spatiotemporal resolution, capturing how escape potential varies across viral lineages and geographic regions [3, 13]. These models learn the statistical regularities of protein sequence space and can generalize to mutations not present in the training data.
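Random forests and neural networks are usually built with dedicated libraries. To keep the example dependency-free, the sketch below substitutes a simpler technique: L2-regularized logistic regression trained by full-batch gradient descent. It is not a random forest. It only illustrates the basic pattern of fitting a binary escape/non-escape classifier to DMS-derived feature vectors. The two features, the eight labeled examples, and the hyperparameters are invented for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.1,
                   epochs: int = 500, l2: float = 0.01) -> tuple[list[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus L2 penalty on the weights (the bias is not penalized).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Predicted probability that a mutation is an escape mutation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Invented features: [absolute charge change, distance to epitope in nm]
    X = [[2, 0.3], [1, 0.5], [2, 0.4], [1, 0.2], [0, 2.0], [0, 1.5], [1, 2.5], [0, 3.0]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = escape, 0 = non-escape
    w, b = train_logistic(X, y)
    for x in ([2, 0.2], [0, 2.8]):
        print(x, round(predict_proba(w, b, x), 3))
```

A tree ensemble could replace the linear model without changing the surrounding data handling, because both take the same feature vectors and labels.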

Training and Validation Strategies

Training datasets for escape prediction models are constructed from DMS measurements of antibody escape for panels of monoclonal antibodies and polyclonal sera [7, 14, 8]. Each mutation is labeled with an escape score reflecting the reduction in antibody binding or neutralization relative to wild-type. Models are trained to predict these scores from input features and are validated using held-out mutations or independent DMS datasets from different antibody specificities [5, 12]. Cross-validation across antibody panels, in which all data for a given antibody are held out together, helps test whether models learn general principles of antibody escape rather than memorizing specific epitope interactions. Prospective validation compares model predictions with mutations that subsequently appear in emerging variants. One example is the prediction that K444T and V445P, found in the BQ.1 and XBB subvariants respectively, escape bebtelovimab [9].
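Grouped cross-validation is one way to implement the cross-validation across antibody panels described above. In this scheme, all mutations measured against one antibody are held out together. The sketch below assumes records tagged with a hypothetical antibody identifier in their first field. It yields leave-one-antibody-out splits and computes a Spearman rank correlation (the Pearson correlation of average ranks) between measured and predicted escape scores. A model would be fitted on each `train` split and scored on the matching `test` split.

```python
import math
from typing import Iterator


def leave_one_group_out(records: list[tuple]) -> Iterator[tuple[str, list[tuple], list[tuple]]]:
    """Yield (held-out group, train records, test records); record[0] is the group id."""
    for group in sorted({record[0] for record in records}):
        train = [r for r in records if r[0] != group]
        test = [r for r in records if r[0] == group]
        yield group, train, test


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either input is constant
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical (antibody id, mutation, measured escape) records.
    records = [("mAb1", "m1", 0.1), ("mAb1", "m2", 0.8), ("mAb2", "m3", 0.4), ("mAb3", "m4", 0.2)]
    for held_out, train, test in leave_one_group_out(records):
        print(held_out, len(train), len(test))
    print(spearman([0.1, 0.8, 0.4], [0.2, 0.9, 0.3]))  # identical ordering -> 1.0
```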

Key Escape Mutations Identified

E484K and the E484 Family

The E484K mutation, located at the ACE2 interface, was one of the first escape mutations identified through DMS. It was subsequently observed in the Beta and Gamma variants, while Omicron carries a different substitution at the same site, E484A [1]. Mutations at this site reduce neutralization by many antibodies that target the receptor binding motif, particularly class 2 antibodies. DMS studies have shown that substitutions at position 484, including E484A, E484Q, and E484K, confer broad escape from antibody neutralization while largely preserving ACE2 binding [7, 1]. E484K replaces a negatively charged glutamate with a positively charged lysine. This reverses the local charge and disrupts electrostatic interactions with the complementarity-determining regions of several neutralizing antibodies.

N501Y

The N501Y mutation enhances ACE2 binding affinity by introducing a tyrosine that stacks against ACE2 Y41 (a pi-stacking interaction) and forms a cation-pi interaction with ACE2 K353 [1]. This mutation was a defining feature of the Alpha, Beta, Gamma, and Omicron variants. DMS measurements confirmed that N501Y increases ACE2 binding affinity approximately 10-fold relative to wild-type, providing a fitness advantage that contributed to the rapid spread of these lineages [1, 2]. Machine learning models trained on DMS data also rank N501Y as a high-affinity mutation [4, 12].
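A fold change in dissociation constant can be converted to a binding free energy change with ΔΔG = -RT ln(K_D,wt / K_D,mut). Under this sign convention, negative values indicate tighter binding. The sketch below applies that relation at an assumed temperature of 298.15 K. For a 10-fold affinity gain, it gives roughly -1.36 kcal/mol, which corresponds to a change of one unit in log10 K_D.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(kd_fold_improvement: float, temperature_k: float = 298.15) -> float:
    """Binding free energy change in kcal/mol for fold = K_D,wt / K_D,mut.

    fold > 1 means the mutant binds more tightly, giving a negative value.
    """
    return -R_KCAL * temperature_k * math.log(kd_fold_improvement)


def delta_log10_kd(kd_fold_improvement: float) -> float:
    """The same change expressed as log10(K_D,wt / K_D,mut)."""
    return math.log10(kd_fold_improvement)


if __name__ == "__main__":
    fold = 10.0  # illustrative 10-fold tighter binding
    print(f"ddG = {ddg_from_fold_change(fold):.2f} kcal/mol")  # ddG = -1.36 kcal/mol
    print(f"delta log10 KD = {delta_log10_kd(fold):.1f}")      # delta log10 KD = 1.0
```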

K417N/T

The K417N and K417T mutations are located at the periphery of the ACE2 interface and reduce binding by class 1 antibodies that target this region [1]. K417N was present in the Beta and Omicron variants, and K417T in Gamma. DMS data show that K417N/T modestly reduce ACE2 binding affinity but confer substantial antibody escape. This illustrates the trade-off between receptor binding and immune evasion that shapes viral evolution [15, 16]. In the Beta variant, K417N occurs together with E484K and N501Y. In this combination, the affinity-enhancing N501Y can offset the binding cost of K417N while the other two mutations contribute escape. Capturing such combined effects is the aim of epistasis-aware models [10, 11].
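One way to make the binding-versus-escape trade-off concrete is to compute a Pareto front. This is the set of mutations for which no other mutation is at least as good on both ACE2 binding and antibody escape while being strictly better on one of them. The sketch below assumes each mutation already has a binding score and an escape score, with higher values favoring the virus on both axes. The labels and values are hypothetical.

```python
def pareto_front(mutations: dict[str, tuple[float, float]]) -> list[str]:
    """Return labels not dominated on (binding, escape); higher is better on both axes."""
    front = []
    for label_a, (bind_a, esc_a) in mutations.items():
        dominated = any(
            bind_b >= bind_a and esc_b >= esc_a and (bind_b > bind_a or esc_b > esc_a)
            for label_b, (bind_b, esc_b) in mutations.items()
            if label_b != label_a
        )
        if not dominated:
            front.append(label_a)
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical (change in ACE2 binding, escape score) pairs.
    candidates = {
        "mut_A": (0.2, 0.1),   # best binder, little escape
        "mut_B": (-0.3, 0.9),  # strong escape at a modest binding cost
        "mut_C": (-0.5, 0.5),  # worse than mut_B on both axes
        "mut_D": (0.0, 0.0),   # worse than mut_A on both axes
    }
    print(pareto_front(candidates))  # ['mut_A', 'mut_B']
```

Mutations on the front represent different balances between the two pressures. Mutations off the front are outperformed on both axes by at least one alternative.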

Emerging Escape Sites in Recent Variants

DMS studies of Omicron subvariants have identified additional mutations of interest, including L455S, F456L, Q493E, and P499S [7, 11, 9]. The L455S and F456L mutations, present in the KP.3 variant, were shown to reverse the sign of the effect of Q493E, converting it from a deleterious to a beneficial mutation for ACE2 binding [11]. This finding underscores the importance of continuous DMS surveillance in contemporary backgrounds to capture shifting mutational effects. Machine learning models that incorporate epistatic interactions have demonstrated superior accuracy in predicting lineage prevalence compared with models that treat mutations independently [17].

Computational Workflow

The following Mermaid diagram illustrates the integrated computational workflow for DMS and machine learning prediction of RBD escape mutants.

flowchart TD
    A[Design RBD Mutant Library] --> B[Express Library in Yeast or Pseudovirus]
    B --> C[Apply Selective Pressure: ACE2 Binding or Antibody Neutralization]
    C --> D[FACS Sort Bound vs Unbound Populations]
    D --> E[High-Throughput Sequencing]
    E --> F[Compute Enrichment Ratios and Fitness Scores]
    F --> G[Feature Engineering: Physicochemical, Structural, Evolutionary Features]
    G --> H[Machine Learning Model Training: Random Forest, Neural Network, Protein Language Model]
    H --> I[Predict Escape Scores for All Single Mutants]
    I --> J[Validate Against Independent DMS Data and Emerging Variants]
    J --> K[Identify High-Risk Escape Mutations]
    K --> L[Inform Vaccine Strain Selection and Antibody Design]
    F --> M[Epistatic Interaction Analysis]
    M --> N[Background-Dependent Mutational Effects]
    N --> H

Structural Context and Visualization

The spatial distribution of key escape mutations on the RBD structure provides insight into their functional mechanisms. The receptor binding motif (RBM), spanning residues 438-506, forms the primary ACE2 contact surface and is the target of most neutralizing antibodies [1]. Positions 484, 501, and 417 lie within or adjacent to the RBM, so mutations there directly affect both ACE2 binding and antibody recognition. Positions 455, 456, and 493 are located in a loop region that undergoes conformational changes upon ACE2 binding, which may help explain their epistatic interactions [11]. A 3D protein viewer can be used to visualize these mutations in the context of RBD-ACE2 complex and antibody-bound structures. This enables detailed analysis of the steric clashes, electrostatic changes, and hydrogen bond networks that underlie escape phenotypes. Readers are directed to the article Computational Visualization of Single-Point Mutations on Protein 3D Structures for technical guidance on structural analysis.

Implications for Veterinary Vaccine Design

The principles derived from DMS and machine learning studies of SARS-CoV-2 RBD escape are relevant to veterinary vaccine development for coronaviruses affecting companion animals, livestock, and wildlife. The spike proteins of feline, canine, and bovine coronaviruses contain receptor-binding regions that engage host receptors other than ACE2 and are targets of neutralizing antibodies [15]. Examples include aminopeptidase N for type II feline and canine coronaviruses, and 9-O-acetylated sialic acids for bovine coronavirus. DMS platforms could be adapted to these veterinary coronaviruses to map mutational landscapes and predict escape from vaccine-induced immunity. Machine learning models trained on SARS-CoV-2 DMS data could potentially be transferred to related coronaviruses through transfer learning, leveraging the extensive SARS-CoV-2 dataset to make predictions in data-limited veterinary contexts [3, 13]. The identification of conserved epitopes that resist escape, such as the S2 fusion peptide region, is particularly relevant for designing broadly protective veterinary vaccines [18, 19].

Limitations and Future Directions

Despite the power of DMS and machine learning approaches, several limitations must be acknowledged. DMS measurements are typically performed in vitro using pseudovirus systems or yeast display, which may not fully recapitulate the in vivo selective pressures of host immune responses and viral replication dynamics [2, 6]. Polyclonal antibody escape is more complex to model than monoclonal antibody escape, as it involves diverse antibody specificities with varying potencies and epitope preferences [14, 8]. Machine learning models trained on DMS data may not generalize to mutations that involve insertions, deletions, or recombination events, which are not captured in single amino acid substitution libraries [3]. Future directions include the integration of DMS data with molecular dynamics simulations to predict the structural consequences of mutations [4], the development of active learning strategies to prioritize mutations for experimental validation, and the extension of DMS platforms to full-length spike proteins in authentic viral contexts [18].
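Active learning can take several forms. One common strategy is uncertainty sampling, in which mutations whose predicted effects disagree most across an ensemble of models are prioritized for experimental measurement. The sketch below assumes a dictionary of ensemble predictions per mutation and returns the k mutations with the largest population variance. The labels and numbers are hypothetical, and other acquisition rules, such as expected improvement, would slot into the same structure.

```python
import statistics


def prioritize_by_uncertainty(ensemble_predictions: dict[str, list[float]], k: int) -> list[str]:
    """Return the k mutations whose ensemble predictions have the largest variance."""
    spread = {label: statistics.pvariance(preds)
              for label, preds in ensemble_predictions.items()}
    # Highest variance first; ties broken alphabetically for reproducibility.
    ranked = sorted(spread, key=lambda label: (-spread[label], label))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical escape scores from three ensemble members per mutation.
    predictions = {
        "mut_1": [0.10, 0.10, 0.10],
        "mut_2": [0.00, 1.00, 0.50],
        "mut_3": [0.20, 0.40, 0.30],
    }
    print(prioritize_by_uncertainty(predictions, k=2))  # ['mut_2', 'mut_3']
```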

Conclusion

Deep mutational scanning combined with machine learning provides a robust framework for predicting escape mutations in the SARS-CoV-2 spike RBD. The systematic measurement of mutational effects on ACE2 binding, protein expression, and antibody escape is coupled with computational models that capture epistatic interactions and structural context. Together, these enable prospective identification of high-risk mutations before they become fixed in circulating variants. Key mutations, including the escape mutations E484K and K417N/T and the affinity-enhancing mutation N501Y, have been identified by these approaches. Emerging mutations such as L455S, F456L, and Q493E continue to be identified through ongoing DMS surveillance. The integration of these methods into veterinary virology holds promise for improving vaccine strain selection and antibody therapy design for coronaviruses affecting animal populations.

References

[1] Starr TN, Greaney AJ, Hilton SK, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. bioRxiv. 2020. URL: https://www.semanticscholar.org/paper/5da0d587a01a7268c756d8714e570a01b4920776

[2] Dadonaite B, Brown JT, McMahon TE, et al. Spike deep mutational scanning helps predict success of SARS-CoV-2 clades. Nature. 2024. URL: https://www.semanticscholar.org/paper/7d284eda0dfe83cd8d12e73d6eebcc8c39190b5d

[3] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42204343/

[4] Durumeric AEP, McCarty S, Smith J, et al. Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments. J Chem Inf Model. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42089465/

[5] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42037411/

[6] Dadonaite B, Crawford KH, Radford CE, et al. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell. 2023. URL: https://www.semanticscholar.org/paper/63781f55b5c1a9578b5f0f60fb37b5a073991b88

[7] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42324717/

[8] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41767406/

[9] Alcantara M, Higuchi Y, Kirita Y, et al. Deep Mutational Scanning to Predict Escape from Bebtelovimab in SARS-CoV-2 Omicron Subvariants. Vaccines. 2023. URL: https://www.semanticscholar.org/paper/ec9fca97643cfef02593156b5666ea02b8182ec6

[10] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42330076/

[11] Taylor AL, Starr TN. Deep mutational scanning of SARS-CoV-2 Omicron BA.2.86 and epistatic emergence of the KP.3 variant. bioRxiv. 2024. URL: https://www.semanticscholar.org/paper/3ae4032acde62fbf5c77417cf38d96332bb806bd

[12] Xia H, Wei D, Guo Z, et al. Machine Learning on the Impacts of Mutations in the SARS-CoV-2 Spike RBD on Binding Affinity to Human ACE2 Based on Deep Mutational Scanning Data. Biochemistry. 2025. URL: https://www.semanticscholar.org/paper/b4d4d22589fef92e9cc8092eac3735e1e0735f98

[13] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/

[14] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42030951/

[15] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41901725/

[16] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41809055/

[17] Lei Z, Zhang X, Han J, et al. Integrating genomic epidemiology and deep mutational scanning data for prevalence forecasting of SARS-CoV-2 Omicron lineages. PLoS ONE. 2025. URL: https://www.semanticscholar.org/paper/2b8dcece39a51ca32e88bd0bf501580804532b95

[18] Lei R, Qing E, Odle AE, et al. Functional and antigenic characterization of SARS-CoV-2 spike fusion peptide by deep mutational scanning. Nat Commun. 2024. URL: https://www.semanticscholar.org/paper/88c17c37a8bcb9f62fead752aefc7fbd2d1424e3

[19] Ball C, Ramage W, Mate R, et al. Susceptibility of broad reactivity nanobodies to resistance mutations in the S2 domain of SARS-CoV-2 predicted by yeast display deep mutational scanning. Front Immunol. 2026. URL: https://www.semanticscholar.org/paper/3e2fe5c37ad31eb16dc7c472e98c2b2cad416159

[20] Dadonaite B, Brown JT, McMahon TE, et al. Full-spike deep mutational scanning helps predict the evolutionary success of SARS-CoV-2 clades. bioRxiv. 2023. URL: https://www.semanticscholar.org/paper/cc087b7a69a426bc3551db4c519c1b95c17ce43f

[21] Frank F, Keen MM, Rao A, et al. Deep mutational scanning identifies SARS-CoV-2 Nucleocapsid escape mutations of currently available rapid antigen tests. bioRxiv. 2022. URL: https://www.semanticscholar.org/paper/fd72846d8f7a79a9b9f511c18143b585237105c3

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.