Deep Mutational Scanning and Machine Learning Prediction of SARS-CoV-2 Receptor Binding Domain Escape Mutations
Introduction
The receptor binding domain (RBD) of the SARS-CoV-2 spike glycoprotein mediates viral attachment to the angiotensin-converting enzyme 2 (ACE2) receptor and constitutes a dominant target of neutralizing antibodies [1]. Mutations within the RBD can alter ACE2 binding affinity, modulate host range, and enable escape from antibody neutralization [2, 3]. The continuous emergence of viral variants with altered antigenic properties necessitates systematic approaches to prospectively identify escape mutations. Deep mutational scanning (DMS) experimentally quantifies the functional effects of all single amino acid mutations on protein folding, receptor binding, and antibody escape [1, 4]. Machine learning (ML) models trained on DMS datasets can generalize beyond measured mutations to predict viral evolutionary trajectories and inform surveillance [5, 6, 7]. This article reviews the integration of DMS and ML for predicting SARS-CoV-2 RBD escape mutations, with a focus on computational virology and veterinary implications.
Deep Mutational Scanning of the RBD
Methodology and Library Generation
DMS employs saturation mutagenesis to create libraries of variants covering all possible single amino acid substitutions in a target protein domain [1]. For the SARS-CoV-2 RBD, libraries have been constructed using yeast surface display [1], lentiviral pseudotype systems [4], and mammalian surface display [8]. In the seminal study by Starr et al. [1], a yeast-displayed RBD library was used to measure the effects of nearly 4000 mutations on protein expression and ACE2 binding affinity. Dadonaite et al. [4] extended this approach to full spike proteins using non-replicative pseudotyped lentiviruses, enabling DMS of the Omicron BA.1 and Delta spike libraries containing approximately 7000 distinct amino acid mutations each. Taylor and Starr [9] later applied DMS to the Omicron BA.2.86 RBD to characterize epistatic shifts in mutational effects.
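As a rough illustration of what "all possible single amino acid substitutions" means in practice, the sketch below enumerates the design space for a short sequence. The function name `single_substitutions` and the five-residue fragment are hypothetical and chosen only for illustration. For a domain of L residues the same logic yields 19 × L substitutions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(wild_type: str, first_position: int = 1) -> list[str]:
    """List every single amino acid substitution, written like 'N501Y'."""
    mutations = []
    for offset, wt_residue in enumerate(wild_type):
        position = first_position + offset
        for new_residue in AMINO_ACIDS:
            if new_residue != wt_residue:  # skip the wild-type residue itself
                mutations.append(f"{wt_residue}{position}{new_residue}")
    return mutations


# Hypothetical 5-residue fragment, NOT the real RBD sequence.
toy_fragment = "NGVEG"
library = single_substitutions(toy_fragment, first_position=501)
print(len(library))  # 5 positions x 19 alternatives = 95
```

A real library design would also have to account for codon choice, barcoding, and incomplete coverage, none of which is modeled here.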
Key Experimental Findings
DMS experiments have revealed that the RBD is highly constrained for folding and ACE2 binding, with most mutations being deleterious [1]. However, a substantial number of mutations, particularly at the ACE2 interface, are tolerated or enhance binding [1, 9]. Epistatic interactions modulate mutational effects across variant backgrounds; for example, the Q493E mutation decreased ACE2 binding in earlier backgrounds but enhanced binding in the context of L455S and F456L in the KP.3 lineage [9]. Taylor and Starr [10] highlighted changing amino acid preferences within epistatic hotspot residues during the evolution of recent variants. Shao et al. [11] mapped antibody escape and infectivity landscapes for Omicron JN.1 and XEC RBDs.
Table 1 summarizes selected DMS studies that are directly relevant to RBD escape prediction.
| Study Reference | Platform | Domain | Key Outcome |
|---|---|---|---|
| Starr et al. 2020 [1] | Yeast display | RBD | Measured effects of all mutations on expression and ACE2 binding |
| Dadonaite et al. 2023 [4] | Pseudovirus | Full spike | High-throughput mapping of antibody escape and infection |
| Taylor & Starr 2024 [9] | Yeast display | BA.2.86 RBD | Identified epistatic sign reversal (Q493E) |
| Taylor & Starr 2026 [10] | DMS (variant backgrounds) | RBD hotspots | Changing amino acid preferences with epistasis |
| Shao et al. 2026 [11] | DMS | RBD (JN.1, XEC) | Antibody escape and infectivity landscape |
| Alcantara et al. 2023 [12] | Inverted infection assay | RBD (BA.2) | Predicted bebtelovimab escape mutations |
Epistasis and Changing Selective Pressures
Haddox et al. [13] demonstrated that clonal interference and shifting selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Epistatic interactions can reverse the sign of mutational effects, complicating long-term predictions [9]. Therefore, DMS measurements must be periodically updated in contemporary strain backgrounds [9, 10]. Pseudovirus-based DMS of full spikes further revealed that mutations outside the RBD, such as in the N-terminal domain and S2 subunit, also contribute to ACE2 binding and antibody escape [14, 15]. Dadonaite et al. [14] identified strong serum escape mutations in the RBD at sites 357, 420, 440, 456, and 473, with individual variation in antigenic effects.
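To make the idea of a sign reversal concrete, the following sketch compares one mutation's measured effect in two variant backgrounds and labels the pattern. The function name, the tolerance, and the numeric effects are all assumptions made for illustration; the values are placeholders, not measurements from any cited study.

```python
def classify_background_dependence(effect_a: float, effect_b: float,
                                   tolerance: float = 0.1) -> str:
    """Compare one mutation's effect (for example, the change in a log-scale
    binding score relative to the parental sequence) in two backgrounds."""
    if abs(effect_a) < tolerance and abs(effect_b) < tolerance:
        return "neutral in both"
    opposite_signs = effect_a * effect_b < 0
    both_clear = min(abs(effect_a), abs(effect_b)) >= tolerance
    if opposite_signs and both_clear:
        return "sign epistasis"
    if abs(effect_a - effect_b) >= tolerance:
        return "magnitude epistasis"
    return "no epistasis"


# Placeholder effects for a single mutation in two hypothetical backgrounds.
effect_in_earlier_background = -0.5  # binding decreased
effect_in_later_background = 0.4     # binding enhanced
print(classify_background_dependence(effect_in_earlier_background,
                                     effect_in_later_background))
```

The tolerance stands in for experimental noise; in a real analysis it would be derived from replicate measurements rather than fixed by hand.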
Machine Learning Prediction of Escape Mutations
DMS Data as Training Labels
DMS datasets provide quantitative functional scores (e.g., enrichment ratios, binding affinities, escape fractions) that serve as ground truth labels for supervised ML models [7]. Xia et al. [7] trained ML models on DMS data to predict the impact of RBD mutations on ACE2 binding affinity, using features derived from sequence, structure, and biophysical properties. Durumeric et al. [5] used machine learning-driven simulations of the SARS-CoV-2 fitness landscape from DMS experiments to model viral evolution.
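One common form of functional score is an enrichment ratio computed from sequencing counts before and after selection. The sketch below shows a generic log2 enrichment relative to wild type. It is an assumption-laden illustration: the function name, the pseudocount, and the read counts are hypothetical, and the cited studies may define their scores differently (for example, affinities fitted from titration curves or escape fractions).

```python
import math


def log2_enrichment(pre: dict[str, int], post: dict[str, int],
                    wild_type: str = "WT",
                    pseudocount: float = 0.5) -> dict[str, float]:
    """Log2 enrichment of each variant relative to wild type:

        score = log2( (post_v / pre_v) / (post_wt / pre_wt) )

    The pseudocount keeps variants with zero reads finite.
    """
    def ratio(name: str) -> float:
        return (post.get(name, 0) + pseudocount) / (pre.get(name, 0) + pseudocount)

    wt_ratio = ratio(wild_type)
    return {name: math.log2(ratio(name) / wt_ratio)
            for name in pre if name != wild_type}


# Hypothetical read counts before and after selection.
pre_counts = {"WT": 1000, "N501Y": 100, "G502P": 100}
post_counts = {"WT": 1000, "N501Y": 200, "G502P": 10}
scores = log2_enrichment(pre_counts, post_counts)
for variant, score in scores.items():
    print(variant, round(score, 2))
```

Positive scores indicate variants enriched by selection relative to wild type and negative scores indicate depletion. These per-variant values are the kind of label a supervised model would be trained to predict.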
Algorithmic Approaches
Various ML architectures have been applied to DMS data for escape prediction:
- Random forest and gradient boosting: These ensemble methods are effective for tabular data with mixed feature types and have been used to classify mutations as escape or non-escape [7, 12].
- Neural networks: Deep neural networks can capture nonlinear interactions and are used in protein language models (PLMs) trained on large sequence corpora [16, 17]. Yang et al. [16] developed a DMS-informed PLM that predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution.
- Protein language models: Lamb et al. [17] demonstrated that PLMs capture the evolutionary potential of SARS-CoV-2 from single sequences.
- Integration with genomic epidemiology: Nasir et al. [6] developed a predictive model for immune escape and antigenic grouping using ML. Shlesinger et al. [18] used deep mutational learning to dissect serum polyclonal antibody escape.
Features for Escape Prediction
ML models typically incorporate the following feature categories:
- Sequence-based: amino acid identity, substitution matrices (e.g., BLOSUM), conservation scores.
- Structure-based: solvent accessibility, secondary structure, distance to antibody epitope, hydrogen bonding networks.
- Biophysical: changes in hydrophobicity, charge, volume, and free energy of folding or binding (predicted by tools like FoldX or Rosetta).
- Evolutionary: phylogenetic conservation, co-variation, frequency in circulating variants.
Soliman et al. [2] emphasized that RBD evolution has shifted from ACE2 binding optimization to immune epitope remodeling, a change in selective pressure that can be encoded as features in predictive models. Ding and Yuan [3] modeled the role of receptor binding and immunity in the SARS-CoV-2 fitness landscape.
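As a minimal sketch of the sequence-based and biophysical feature categories listed above, the code below turns a mutation string into a small feature dictionary. It uses the Kyte-Doolittle hydropathy scale and a simplified charge assignment (histidine treated as neutral). The function name and the choice of features are assumptions for illustration; structure-based and evolutionary features require a structure or an alignment and are omitted.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH; residues not listed are 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mutation_features(mutation: str) -> dict[str, float]:
    """Encode a substitution such as 'E484K' as simple numeric features."""
    wt, mut = mutation[0], mutation[-1]
    site = int(mutation[1:-1])
    return {
        "site": site,
        "delta_hydropathy": HYDROPATHY[mut] - HYDROPATHY[wt],
        "delta_charge": CHARGE.get(mut, 0) - CHARGE.get(wt, 0),
    }


print(mutation_features("E484K"))
```

Feature vectors like this one would be paired with DMS scores to form a training table. Real pipelines typically add many more columns, such as solvent accessibility or predicted folding energies from external tools.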
Predictive Performance and Validation
Predictions derived from DMS data have anticipated escape mutations that later emerged in circulating variants [12]. Alcantara et al. [12] used DMS of the BA.2 RBD to predict bebtelovimab escape mutations at sites K444, V445, and G446, and confirmed that XBB (V445P) and BQ.1 (K444T) were indeed resistant. Lei et al. [19] proposed CoVPF, a model integrating genomic epidemiology and DMS data for prevalence forecasting of Omicron lineages. They found that accounting for epistasis improved forecasting accuracy by 43%.
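The bebtelovimab example can be expressed as a simple retrospective check: given a set of predicted escape sites, test whether mutations observed in later lineages fall on those sites. The sketch below uses only the sites and lineage mutations named in the text. The helper names are hypothetical, and a real validation would draw on full lineage definitions from surveillance data.

```python
import re

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split 'K444T' into (wild-type residue, site, mutant residue)."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"not a single substitution: {mutation!r}")
    wt, site, mut = match.groups()
    return wt, int(site), mut


def hits_predicted_site(mutation: str, predicted_sites: set[int]) -> bool:
    """True if the mutation falls on a site flagged by the prediction."""
    _, site, _ = parse_mutation(mutation)
    return site in predicted_sites


predicted_escape_sites = {444, 445, 446}                # sites named in the text
lineage_mutations = {"XBB": "V445P", "BQ.1": "K444T"}   # lineages named in the text

for lineage, mutation in lineage_mutations.items():
    flagged = hits_predicted_site(mutation, predicted_escape_sites)
    print(f"{lineage}: {mutation} on predicted escape site = {flagged}")
```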
Integrative Frameworks for Surveillance and Forecasting
The combination of DMS experiments, ML predictions, and genomic surveillance enables proactive monitoring of emerging variants. The workflow typically proceeds as follows:
- Generate DMS libraries for the RBD in a relevant variant background.
- Measure functional scores (ACE2 binding, expression, antibody escape).
- Train ML models on these scores using sequence and structural features.
- Apply models to all possible single and combinatorial mutations to predict escape and fitness.
- Validate predictions against real-world genomic data (e.g., GISAID, Nextstrain).
- Forecast lineage prevalence and identify high-risk mutations for targeted surveillance.
The following Mermaid diagram illustrates this integrative pipeline.
flowchart TB
A[DMS Library Construction] --> B[Functional Assays: ACE2 binding, antibody escape]
B --> C[Generate DMS Scores for Single Mutations]
C --> D[Extract Sequence, Structure, Biophysical Features]
D --> E[Train Supervised ML Model: RF, GBM, NN, PLM]
E --> F[Predict Escape and Fitness for All Mutations]
F --> G[Validate Against GISAID and Nextstrain]
G --> H[Identify Emerging Escape Variants]
H --> I[Update DMS in Current Backgrounds]
I --> A
G --> J[Forecast Lineage Prevalence]
J --> H
This workflow has been applied by Lei et al. [19] (CoVPF), who demonstrated 20.7% higher accuracy compared to genomic models alone. Durumeric et al. [5] used ML-driven simulations to recapitulate the fitness landscape and identify constraints on viral evolution. The inclusion of epistatic effects is critical, as ignoring epistasis leads to substantial forecasting errors [9, 19].
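For the step that scores combinatorial mutations, the simplest baseline is an additive model that sums single-mutant scores. The sketch below ranks all double mutants this way. It is a deliberately naive illustration, not the method of any cited study: the function names and the escape scores are hypothetical, and the additive assumption is exactly what epistasis violates, which is why the text stresses modeling epistatic effects.

```python
from itertools import combinations


def site_of(mutation: str) -> int:
    """Extract the site number from a substitution such as 'K444T'."""
    return int(mutation[1:-1])


def rank_double_mutants(
    single_scores: dict[str, float],
) -> list[tuple[tuple[str, str], float]]:
    """Rank double mutants by the sum of their single-mutant scores."""
    ranked = []
    for a, b in combinations(sorted(single_scores), 2):
        if site_of(a) == site_of(b):
            continue  # two substitutions cannot occupy the same site
        ranked.append(((a, b), single_scores[a] + single_scores[b]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical single-mutant escape scores (placeholders, not measured data).
toy_scores = {"K444T": 0.9, "K444N": 0.7, "V445P": 0.8, "G446S": 0.3}
ranking = rank_double_mutants(toy_scores)
for pair, score in ranking[:3]:
    print(pair, round(score, 2))
```

Comparing such additive predictions with measured combinatorial scores gives a direct estimate of how much epistasis a model would need to capture.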
Links to External Resources
For interactive visualization of RBD mutations and antibody escape, readers should consult the 3D Protein Viewer (link available on the site). Genomic surveillance data can be accessed through GISAID (the Global Initiative on Sharing All Influenza Data) and Nextstrain, which provide real-time phylogenetic tracking of emerging mutations. These tools are essential for contextualizing DMS and ML predictions within actual viral evolution.
Implications for Veterinary Medicine and Zoonotic Risk
SARS-CoV-2 has a broad host range, infecting domestic and wild animals including cats, dogs, mink, ferrets, white-tailed deer, and non-human primates. ACE2 receptor conservation across species determines susceptibility, and RBD mutations can alter host tropism [2, 3]. Soliman et al. [2] discussed the evolution of the RBD from ACE2 binding optimization to immune epitope remodeling in various animal hosts. DMS experiments in different ACE2 backgrounds (e.g., mink, cat, deer) can inform zoonotic risk assessment. Machine learning models trained on cross-species DMS data could predict mutations that expand host range or enable spillback into human populations. These approaches are directly applicable to veterinary surveillance programs aiming to detect emerging variants in animal reservoirs.
Cross-reference articles on this site provide additional context:
- Deep Mutational Scanning and Machine Learning Predictions of SARS-CoV-2 Spike Protein Receptor Binding Domain Escape Mutants
- Deep Learning-Driven Prediction of Viral Receptor-Binding Domain Evolution and Escape Mutations
- Spike Protein Mutational Landscapes and ACE2 Binding Affinity Prediction Using Machine Learning
Conclusion
Deep mutational scanning provides high-resolution maps of mutational effects on RBD function and antibody escape. Machine learning models trained on these data can predict escape mutations with appreciable accuracy, especially when epistasis is considered. Integration with genomic surveillance platforms enables proactive forecasting of viral evolution. Future efforts should extend DMS to additional host species, incorporate combinatorial mutations, and refine ML architectures using protein language models and structural embeddings. These computational virology approaches are indispensable for anticipating the emergence of variants with altered zoonotic potential and vaccine evasion.
References
[1] Starr TN, Greaney AJ, Hilton SK, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. bioRxiv. 2020. https://www.semanticscholar.org/paper/5da0d587a01a7268c756d8714e570a01b4920776
[2] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. https://pubmed.ncbi.nlm.nih.gov/41901725/
[3] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. https://pubmed.ncbi.nlm.nih.gov/41809055/
[4] Dadonaite B, Crawford KH, Radford CE, et al. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell. 2023. https://www.semanticscholar.org/paper/63781f55b5c1a9578b5f0f60fb37b5a073991b88
[5] Durumeric AEP, McCarty S, Smith J, et al. Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments. J Chem Inf Model. 2026. https://pubmed.ncbi.nlm.nih.gov/42089465/
[6] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. https://pubmed.ncbi.nlm.nih.gov/42037411/
[7] Xia H, Wei D, Guo Z, et al. Machine Learning on the Impacts of Mutations in the SARS-CoV-2 Spike RBD on Binding Affinity to Human ACE2 Based on Deep Mutational Scanning Data. Biochemistry. 2025. https://www.semanticscholar.org/paper/b4d4d22589fef92e9cc8092eac3735e1e0735f98
[8] Frank F, Keen MM, Rao A, et al. Deep mutational scanning identifies SARS-CoV-2 Nucleocapsid escape mutations of currently available rapid antigen tests. bioRxiv. 2022. https://www.semanticscholar.org/paper/fd72846d8f7a79a9b9f511c18143b585237105c3
[9] Taylor AL, Starr TN. Deep mutational scanning of SARS-CoV-2 Omicron BA.2.86 and epistatic emergence of the KP.3 variant. bioRxiv. 2024. https://www.semanticscholar.org/paper/3ae4032acde62fbf5c77417cf38d96332bb806bd
[10] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. https://pubmed.ncbi.nlm.nih.gov/42330076/
[11] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. https://pubmed.ncbi.nlm.nih.gov/42324717/
[12] Alcantara M, Higuchi Y, Kirita Y, et al. Deep Mutational Scanning to Predict Escape from Bebtelovimab in SARS-CoV-2 Omicron Subvariants. Vaccines. 2023. https://www.semanticscholar.org/paper/ec9fca97643cfef02593156b5666ea02b8182ec6
[13] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. https://pubmed.ncbi.nlm.nih.gov/41767406/
[14] Dadonaite B, Brown JT, McMahon TE, et al. Spike deep mutational scanning helps predict success of SARS-CoV-2 clades. Nature. 2024. https://www.semanticscholar.org/paper/7d284eda0dfe83cd8d12e73d6eebcc8c39190b5d
[15] Dadonaite B, Brown JT, McMahon TE, et al. Full-spike deep mutational scanning helps predict the evolutionary success of SARS-CoV-2 clades. bioRxiv. 2023. https://www.semanticscholar.org/paper/cc087b7a69a426bc3551db4c519c1b95c17ce43f
[16] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. https://pubmed.ncbi.nlm.nih.gov/42204343/
[17] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. https://pubmed.ncbi.nlm.nih.gov/41714330/
[18] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. https://pubmed.ncbi.nlm.nih.gov/42030951/
[19] Lei Z, Zhang X, Han J, et al. Integrating genomic epidemiology and deep mutational scanning data for prevalence forecasting of SARS-CoV-2 Omicron lineages. PLoS ONE. 2025. https://www.semanticscholar.org/paper/2b8dcece39a51ca32e88bd0bf501580804532b97
[20] Lei R, Qing E, Odle AE, et al. Functional and antigenic characterization of SARS-CoV-2 spike fusion peptide by deep mutational scanning. Nat Commun. 2024. https://www.semanticscholar.org/paper/88c17c37a8bcb9f62fead752aefc7fbd2d1424e3
[21] Ball C, Ramage W, Mate R, et al. Susceptibility of broad reactivity nanobodies to resistance mutations in the S2 domain of SARS-CoV-2 predicted by yeast display deep mutational scanning. Front Immunol. 2026. https://www.semanticscholar.org/paper/3e2fe5c37ad31eb16dc7c472e98c2b2cad416159