Deep Mutational Scanning and Machine Learning for Predicting SARS-CoV-2 Spike Protein Escape Mutations from Antibody Neutralization

Introduction

The continuous evolution of viral surface glycoproteins under antibody selection pressure necessitates robust computational frameworks for predicting escape mutations. Deep mutational scanning (DMS) combined with machine learning (ML) has emerged as a powerful approach to systematically map the functional consequences of amino acid substitutions in the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein [1, 2]. These methods enable the prospective identification of mutations that reduce antibody neutralization, thereby informing vaccine strain selection and surveillance efforts. Although the primary application of these techniques has focused on human SARS-CoV-2, the underlying principles and computational pipelines are directly transferable to veterinary coronaviruses such as feline coronavirus (FCoV), canine respiratory coronavirus (CRCoV), and betacoronaviruses of mink and other animal hosts. The present article provides a detailed, technical review of the DMS and ML workflow used to characterize spike antibody escape, the biophysical basis of fitness landscapes, and the translational relevance for veterinary vaccine design.

Deep Mutational Scanning: Experimental Generation of Escape Data

DMS libraries comprehensively represent all single amino acid substitutions in the spike RBD. These libraries are generated via site-directed mutagenesis or error-prone PCR and expressed on the surface of yeast or mammalian cells [2, 3]. The library is exposed to neutralizing antibodies or polyclonal sera, and cells bearing RBD mutants that escape binding are enriched by fluorescence-activated cell sorting. Deep sequencing of the sorted and unsorted populations quantifies the enrichment ratio for each variant, producing a functional score that reflects both antibody escape and intrinsic fitness (e.g., expression level and ACE2 receptor affinity). A key challenge is disentangling escape from fitness effects; recent studies have addressed this by incorporating orthogonal fitness measurements [4, 5].

The high-throughput nature of DMS permits simultaneous evaluation of thousands of mutations. For example, Shao et al. performed DMS on the RBDs of the Omicron JN.1 and XEC sublineages, mapping their antibody escape and infectivity landscapes [2]. Taylor and Starr showed that amino acid preferences within epistatic hotspot residues change across recent variants, so that the effect of a given mutation depends on the background in which it arises [1]. These findings emphasize the need to repeat DMS as the virus evolves.

Construction of Fitness Landscapes from Sequencing Data

The deep sequencing reads from DMS experiments are processed to calculate mutational fitness effects. A typical pipeline involves aligning reads to a reference spike sequence, counting variant frequencies, and computing log2 enrichment ratios. These ratios are normalized to account for library biases and batch effects. The resulting matrix of scores constitutes a fitness landscape that maps each point mutation to a combined measure of escape and replicative capacity [4, 5].
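The enrichment calculation described above can be sketched in a few lines. This is a minimal illustration, not the pipeline of any cited study: the variant labels, read counts, and the pseudocount of 0.5 are all assumptions made for the example.

```python
import math

def log2_enrichment(unsorted_counts, sorted_counts, wildtype="WT", pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to wild type.

    The pseudocount (an assumed value) avoids log(0) when a variant is
    absent from one of the two populations.
    """
    def ratio(variant):
        post = sorted_counts.get(variant, 0) + pseudocount
        pre = unsorted_counts.get(variant, 0) + pseudocount
        return post / pre

    wt_ratio = ratio(wildtype)
    return {
        variant: math.log2(ratio(variant) / wt_ratio)
        for variant in unsorted_counts
        if variant != wildtype
    }

# Made-up read counts for illustration only.
unsorted_counts = {"WT": 1000, "mutA": 200, "mutB": 400, "mutC": 100}
sorted_counts = {"WT": 100, "mutA": 320, "mutB": 40, "mutC": 0}

scores = log2_enrichment(unsorted_counts, sorted_counts)
for variant, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{variant}: {score:+.2f}")
```

Dividing by the wild-type ratio largely cancels differences in sequencing depth between the two populations. Corrections for library bias and batch effects are not shown and would be applied on top of these scores.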

Huot et al. formalized this landscape using the concept of constrained evolutionary funnels, wherein immune escape is limited by structural and functional constraints on the RBD [5]. Their model predicts that only mutations that preserve ACE2 binding while disrupting antibody recognition are evolutionarily viable. Durumeric et al. extended these ideas by using ML-driven simulations to interpolate fitness values for mutations not directly assayed, leveraging the DMS data as training labels [4]. This approach yields a continuous landscape over the entire sequence space of the RBD.
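The viability criterion (escape from antibody while preserving receptor binding) can be expressed as a simple filter over per-mutation scores. The sketch below is an assumption-laden simplification rather than the model of [5]: the record fields, the thresholds, and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MutationScores:
    name: str
    escape: float        # higher means stronger antibody escape
    ace2_binding: float  # change relative to wild type; negative means weaker binding
    expression: float    # change relative to wild type; negative means lower expression

def viable_escape_mutations(records, min_escape=0.5, min_binding=-1.0, min_expression=-1.0):
    """Keep mutations that escape antibody without a large functional cost.

    All three thresholds are illustrative assumptions, not published cutoffs.
    """
    return [
        r.name
        for r in records
        if r.escape >= min_escape
        and r.ace2_binding >= min_binding
        and r.expression >= min_expression
    ]

# Made-up scores for illustration only.
records = [
    MutationScores("mutA", escape=0.9, ace2_binding=-0.2, expression=-0.1),
    MutationScores("mutB", escape=0.9, ace2_binding=-2.5, expression=-0.3),
    MutationScores("mutC", escape=0.1, ace2_binding=0.3, expression=0.0),
]
print(viable_escape_mutations(records))
```

Here mutB escapes strongly but is filtered out because it loses receptor binding, which is the kind of constraint the funnel picture describes.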

Machine Learning Models for Escape Prediction

Machine learning classifiers are trained on DMS-derived escape scores to predict whether a given mutation will reduce neutralization by a specific antibody or polyclonal serum. Common algorithms include random forests, gradient-boosted trees, and feedforward neural networks; each takes as input features such as the wild-type amino acid, substituted residue, structural environment (e.g., solvent accessibility, distance to the antibody paratope), and evolutionary conservation [6, 7]. Feature engineering often incorporates biophysical properties (hydrophobicity, charge, side-chain volume) and structural descriptors from the spike trimer.
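As a hedged illustration of the feature engineering step, the following sketch encodes a single substitution as a small feature dictionary. It uses the Kyte-Doolittle hydropathy scale and a simplified integer charge assignment. The function names are hypothetical, and the structural inputs (solvent accessibility, paratope distance) are placeholders that a real pipeline would compute from a spike structure.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def parse_mutation(mutation):
    """Split a label such as 'E484K' into (wild type, position, mutant)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]

def mutation_features(mutation, solvent_accessibility, paratope_distance):
    """Encode one substitution as a feature dictionary for a classifier.

    The two structural arguments are supplied by the caller; the values used
    below are placeholders, not measurements.
    """
    wt, position, mut = parse_mutation(mutation)
    return {
        "position": position,
        "delta_hydropathy": HYDROPATHY[mut] - HYDROPATHY[wt],
        "delta_charge": CHARGE.get(mut, 0) - CHARGE.get(wt, 0),
        "solvent_accessibility": solvent_accessibility,
        "paratope_distance": paratope_distance,
    }

features = mutation_features("E484K", solvent_accessibility=0.6, paratope_distance=4.0)
print(features)
```

Rows of such features, paired with DMS escape labels, would form the training table for a random forest or neural network. The choice of descriptors here is only one of many reasonable options.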

Nasir et al. applied random forest models to DMS data from multiple RBD variants and successfully grouped antigenically related strains, demonstrating that ML can recapitulate serological clustering without direct binding assays [6]. Shlesinger et al. developed a deep mutational learning framework that predicts polyclonal antibody escape directly from DMS measurements of monoclonal antibody escapes, using a neural network to combine scores across epitope regions [7]. Their model accurately forecasted the escape of emerging Omicron sublineages several months before they dominated global surveillance.
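The idea of combining monoclonal escape scores across epitope regions can be illustrated with a deliberately simple linear stand-in. This is not the neural network of [7]: the weighted average below, the epitope labels, and the weights are assumptions chosen only to show how per-epitope scores might be aggregated into a serum-level estimate.

```python
def polyclonal_escape(epitope_escape, epitope_weights):
    """Weighted average of per-epitope escape scores for one mutation.

    epitope_escape:  escape score of the mutation for each epitope region
    epitope_weights: assumed relative contribution of each epitope region
                     to serum neutralization (must sum to a positive value)
    """
    total_weight = sum(epitope_weights.values())
    if total_weight <= 0:
        raise ValueError("epitope weights must sum to a positive value")
    weighted = sum(
        weight * epitope_escape.get(epitope, 0.0)
        for epitope, weight in epitope_weights.items()
    )
    return weighted / total_weight

# Made-up values for illustration only.
escape_by_epitope = {"epitope_A": 0.8, "epitope_B": 0.1}
serum_weights = {"epitope_A": 3.0, "epitope_B": 1.0}
print(round(polyclonal_escape(escape_by_epitope, serum_weights), 3))
```

A learned model replaces the fixed weights with parameters fitted to serum neutralization data and can capture non-additive effects that this average ignores.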

A more recent advance involves protein language models (PLMs) that capture evolutionary patterns from millions of sequences. Yang et al. introduced a DMS-informed PLM that integrates experimental escape data into the model training, producing a spatiotemporal prediction of spike evolution [8]. Lamb et al. similarly demonstrated that PLMs trained on large sequence databases can predict the evolutionary potential of the RBD, identifying mutations that are likely to arise under antibody pressure [9]. These models do not require explicit structural features because they learn the grammar of protein sequences from large corpora.
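One common way to turn a language model into a mutation scorer is a log-likelihood ratio between the mutant and wild-type residue at a masked position. The sketch below shows only that scoring rule. Whether [8] or [9] use this exact rule is not stated here, and `toy_model` is a hypothetical stand-in that returns fixed probabilities where a real PLM would be called.

```python
import math

def llr_score(masked_log_probs, sequence, position, mutant_aa):
    """Log-likelihood ratio of mutant versus wild-type residue.

    masked_log_probs: callable(sequence, position) -> {amino acid: log probability}
    position:         1-based index into the sequence
    """
    wt_aa = sequence[position - 1]
    log_probs = masked_log_probs(sequence, position)
    return log_probs[mutant_aa] - log_probs[wt_aa]

def toy_model(sequence, position):
    """Hypothetical stand-in for a PLM: fixed probabilities at every position."""
    probs = {"A": 0.5, "G": 0.3, "S": 0.2}
    return {aa: math.log(p) for aa, p in probs.items()}

# Toy sequence restricted to the residues the stand-in model knows about.
score = llr_score(toy_model, "GAGS", position=2, mutant_aa="G")
print(f"{score:.3f}")
```

A negative score means the model considers the mutant residue less likely than the wild type in that context; a positive score means the opposite.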

Table 1 summarizes the main ML approaches, their input features, and the predictions they produce.

ML Model Type	Input Features	Output Predictions	Key Reference
Random Forest	Amino acid identity, structural environment, conservation	Binary escape classification	[6]
Deep Neural Network	DMS enrichment ratios, epitope weights	Continuous escape score	[7]
Protein Language Model	Unaligned spike sequences (hundreds of thousands)	Evolutionary fitness, mutation probability	[8, 9]
Simulation-driven landscape	DMS fitness values, biophysical constraints	Interpolated mutational effects	[4]

Links to Real-World Genomic Surveillance

The predictive power of DMS and ML is assessed by comparing model outputs to sequences deposited in global repositories such as GISAID. The spread of Omicron sublineages carrying shared RBD mutations (e.g., K417N, E484A, N501Y) is consistent with DMS experiments that flagged such substitutions as escaping neutralization while maintaining receptor binding [1, 2]. Ding and Yuan described a quantitative link between modeled RBD fitness and observed population dynamics: mutations predicted to have high escape capacity and high receptor affinity tended to rise in frequency [10]. Soliman et al. reviewed how the evolutionary optimization of ACE2 binding versus immune remodeling is captured by fitness landscapes derived from DMS [11].
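A comparison between predicted scores and surveillance data can be summarized with a rank correlation. The sketch below implements Spearman's coefficient with the standard library only. The predicted scores and frequency changes are made-up numbers, and retrieving and processing GISAID sequences is outside its scope.

```python
import math

def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Made-up values for illustration only.
predicted_escape = [0.1, 0.5, 0.9, 0.3]
frequency_change = [0.010, 0.020, 0.200, 0.015]
print(round(spearman(predicted_escape, frequency_change), 3))
```

A rank-based measure is used here because predicted scores and observed frequencies are on different scales and need not be linearly related.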

Clonal interference among co-circulating variants is a major determinant of escape dynamics. Haddox et al. showed that multiple antibody-escape mutations compete within viral populations, and the selective advantage of each mutation depends on the prevailing antibody repertoire [3]. Their work underscores that ML models must incorporate this interference to accurately predict fixed escape mutations.

Veterinary Applications and Comparative Host-Range Insights

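The computational pipeline described above is, in principle, applicable to coronaviruses that affect companion animals and livestock. For feline infectious peritonitis virus (FIPV), feline aminopeptidase N (fAPN) is the established entry receptor for serotype II strains, whereas the receptor for the more common serotype I strains has not been defined. DMS libraries of the FIPV spike receptor-binding region could be screened against neutralizing antibodies from vaccinated or recovered cats to identify escape mutations that might compromise vaccine efficacy. Similarly, the spike of CRCoV, a respiratory pathogen of dogs, could be subjected to the same DMS and ML workflow. Unlike SARS-CoV-2, however, CRCoV is an embecovirus closely related to bovine coronavirus and is not known to use ACE2, so the receptor-binding readout would need to be adapted accordingly.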

The structural homology of the spike RBD across betacoronaviruses may allow ML models trained on SARS-CoV-2 to be transferred to related animal coronaviruses after fine-tuning with a small number of species-specific DMS measurements. Transfer to more distant viruses, such as the alphacoronavirus FIPV, would likely require more virus-specific data. This transfer learning approach could reduce the experimental burden, although predictive accuracy has to be confirmed for each virus. Moreover, zoonotic risk models for SARS-like viruses can incorporate DMS data from bat coronaviruses to predict mutations that enable human ACE2 binding and antibody escape [see the related articles "Deep Mutational Scanning and Computational Modeling of SARS-CoV-2 Spike Protein Receptor-Binding Domain Escape from Neutralizing Antibodies" and "Deep Mutational Scanning and Computational Protein Design for Predicting Zoonotic Spillover Risk in SARS-like Coronaviruses"].

Veterinary vaccine manufacturers can use DMS-ML derived escape maps to proactively update vaccine antigens. For example, if a circulating animal coronavirus strain harbors a mutation that falls within a predicted high-escape region, the vaccine strain can be modified to include the mutated epitope, preserving neutralizing breadth in the face of antigenic drift.
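The check described above, whether a circulating strain carries a mutation at a predicted high-escape site, can be sketched as a lookup against a per-site escape map. The site numbers, scores, threshold, and function name below are hypothetical and serve only to show the shape of such a screen.

```python
def flag_escape_mutations(strain_mutations, site_escape, threshold):
    """Return (mutation, site score) pairs at or above the escape threshold.

    strain_mutations: labels such as 'A10V' (wild type, position, mutant)
    site_escape:      predicted escape score per site from a DMS-ML escape map
    threshold:        assumed cutoff above which a site counts as high-escape
    """
    flagged = []
    for mutation in strain_mutations:
        site = int(mutation[1:-1])
        score = site_escape.get(site)
        if score is not None and score >= threshold:
            flagged.append((mutation, score))
    # Report the highest-scoring sites first.
    return sorted(flagged, key=lambda item: -item[1])

# Made-up escape map and strain for illustration only.
site_escape = {10: 0.8, 20: 0.2, 40: 0.6}
strain = ["A10V", "G20D", "S30T", "K40E"]
print(flag_escape_mutations(strain, site_escape, threshold=0.5))
```

Sites absent from the escape map (position 30 in this example) are skipped rather than treated as low-risk, since no prediction exists for them.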

Workflow Overview

The following Mermaid diagram illustrates the integrated computational and experimental workflow for predicting spike escape mutations.

flowchart TD
    A["Generate DMS library<br/> of spike RBD mutants"] --> B["Express on cell surface<br/> and sort with antibodies"]
    B --> C["Deep sequence sorted<br/> and unsorted populations"]
    C --> D["Compute enrichment ratios<br/> -> fitness landscape"]
    D --> E["Extract features:<br/> sequence, structure, conservation"]
    E --> F["Train ML classifier<br/> (Random Forest, DNN, PLM)"]
    F --> G["Predict escape score<br/> for each point mutation"]
    G --> H["Validate against GISAID<br/> circulating variant frequencies"]
    H --> I["Update vaccine antigen<br/> design for animal coronaviruses"]

Conclusion

The integration of deep mutational scanning with machine learning provides a systematic, high-resolution method for predicting antibody escape mutations in the SARS-CoV-2 spike RBD. This approach has anticipated aspects of viral evolution, identifies critical residues for immune evasion, and can inform vaccine strain updates. The experimental and computational frameworks are transferable in principle to veterinary coronaviruses, enabling proactive antigen design for viruses with epizootic or zoonotic potential. Continued refinement of protein language models and incorporation of epistatic constraints should further improve the accuracy and generalizability of these predictive tools.

References

[1] Taylor AL, Starr TN. Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42330076/

[2] Shao C, Yang L, Xiao C, et al. Deep mutational scanning reveals the antibody escape and infectivity landscape of SARS-CoV-2 Omicron JN.1 and XEC receptor-binding domains. Emerg Microbes Infect. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42324717/

[3] Haddox HK, Abdel Aziz O, Galloway JG, et al. Clonal interference and changing selective pressures shape the escape of SARS-CoV-2 from hundreds of antibodies. Virus Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41767406/

[4] Durumeric AEP, McCarty S, Smith J, et al. Machine Learning-Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments. J Chem Inf Model. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42089465/

[5] Huot M, Wang D, Shakhnovich E, et al. Constrained evolutionary funnels shape viral immune escape. Proc Natl Acad Sci U S A. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41984829/

[6] Nasir A, Lee D, Avena LE, et al. Predictive modeling of immune escape and antigenic grouping of SARS-CoV-2 variants. J Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42037411/

[7] Shlesinger D, Sadilek V, Minot M, et al. Dissecting serum polyclonal antibody escape to SARS-CoV-2 variants by deep mutational learning. Cell Rep Methods. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42030951/

[8] Yang S, Luo X, Luo J, et al. A deep mutational scanning-informed protein language model predicts SARS-CoV-2 evolution dynamics with spatiotemporal resolution. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42204343/

[9] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/

[10] Ding Z, Yuan HY. The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study. iScience. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41809055/

[11] Soliman OA, Shahine Y, Baecker D, et al. Beyond the Mutation Abyss: Revisiting SARS-CoV-2 Receptor-Binding Domain Evolution from ACE2 Binding Optimization to Immune Epitope Remodeling. Pathogens. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41901725/