Predicting Spike Protein Evolution in Emerging Coronaviruses Using Structural Modeling and Machine Learning
Introduction
Coronavirus spike (S) glycoproteins mediate host cell entry by binding to species-specific receptors and catalyzing membrane fusion [1]. The receptor-binding domain (RBD) within the S1 subunit is the primary target of neutralizing antibodies and a major determinant of host range [2]. Rapid mutation of the S gene, driven by error-prone RNA-dependent RNA polymerase and selective pressure from host immunity, generates extensive genetic diversity [3]. Predicting which mutations will enhance receptor affinity or enable immune escape is critical for veterinary surveillance, vaccine design, and zoonotic risk assessment [4]. Computational approaches that combine structural modeling with machine learning (ML) now offer a systematic framework for forecasting spike protein evolution [5, 6]. This article reviews the biophysical principles, algorithmic tools, and integrative workflows used to predict mutational trajectories in emerging coronaviruses, with emphasis on applications in veterinary medicine and comparative virology.
Structural Modeling Approaches
Homology Modeling and Template-Based Prediction
When experimental structures are unavailable, homology modeling constructs three-dimensional (3D) models of spike proteins using known templates [7]. The accuracy of such models depends on sequence identity between target and template, typically requiring at least 30% identity for reliable backbone prediction [1]. For coronaviruses, the conserved core of the RBD allows robust modeling across species, including bat, pangolin, and feline isolates [2]. Tools such as MODELLER and SWISS-MODEL generate full-length S protein models that can be refined through energy minimization [7]. These models serve as starting points for downstream binding energy calculations and mutational scanning [8].
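As a rough illustration of the identity criterion above, the following Python sketch computes percent identity from a pairwise alignment and compares it with the 30% guideline. The function names, the choice to skip gap columns, and the toy sequences are assumptions made for illustration; they are not real spike fragments and not the scoring used by MODELLER or SWISS-MODEL.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded (one of several conventions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def template_is_usable(identity: float, threshold: float = 30.0) -> bool:
    """Apply the rule-of-thumb identity threshold mentioned in the text."""
    return identity >= threshold


# Hypothetical aligned fragments, invented for this example.
target = "NLCPFGEVFNATR-FASVYAW"
template = "NLCPFDEVFNATKGFPSVYAW"

identity = percent_identity(target, template)
print(f"identity = {identity:.1f}%, usable = {template_is_usable(identity)}")
```

Different tools normalize identity differently (for example, by the shorter sequence or by alignment length), so a threshold should always be read together with the convention used.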
AlphaFold2 and Deep Learning-Based Structure Prediction
AlphaFold2 predicts protein structures from amino acid sequence alone, often with accuracy approaching that of experimental methods [9, 10]. The method takes multiple sequence alignments (MSAs) as input and uses an attention-based, transformer-style architecture to infer inter-residue geometry and build atomic coordinates [10]. A single prediction is a static model, so studies of spike dynamics typically adapt the protocol, for example by varying the MSA input, to sample the open and closed RBD conformations relevant to receptor engagement [9]. AlphaFold2-derived models have been used to study the Omicron BA.2.86 variant, revealing compensatory epistatic interactions that maintain ACE2 binding while evading antibodies [10]. Similarly, AlphaFold2 enabled atomistic modeling of the JN.1, KP.2, and KP.3 variants, identifying key mutations that alter binding energetics [9]. Such predictions help explain how emerging variants balance receptor affinity and immune evasion [11, 12].
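Before a predicted model is used downstream, it is common to check per-residue confidence. AlphaFold2 writes its pLDDT confidence score into the B-factor column of the PDB files it produces, so a minimal Python sketch can read it with fixed-width parsing. The helper names, the 70-point cutoff, and the toy records below are assumptions for illustration; the coordinates are zeroed and the scores are invented.

```python
def atom_line(serial: int, name: str, res_name: str, res_seq: int, b_factor: float) -> str:
    """Build a fixed-width PDB ATOM record (toy data, coordinates set to zero)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor field of CA atoms."""
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 61-66 the temperature factor (pLDDT in AlphaFold2 output).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence(scores: dict[int, float], cutoff: float = 70.0) -> list[int]:
    """Residues whose pLDDT falls below the cutoff (assumed here to be 70)."""
    return sorted(res for res, value in scores.items() if value < cutoff)


# Invented records standing in for a predicted model.
toy_pdb = "\n".join([
    atom_line(1, "N", "ASN", 501, 92.5),
    atom_line(2, "CA", "ASN", 501, 92.5),
    atom_line(3, "CA", "GLY", 502, 63.1),
])

scores = plddt_per_residue(toy_pdb)
print(scores, low_confidence(scores))
```

Flagged residues are candidates for exclusion or for extra sampling before binding energy calculations; the cutoff is a judgment call rather than a fixed rule.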
Molecular Dynamics Simulations
Molecular dynamics (MD) simulations provide atomistic resolution of spike protein flexibility and binding kinetics [9]. By applying force fields such as CHARMM or AMBER, MD trajectories reveal conformational changes in the RBD upon receptor or antibody binding [6]. Free energy perturbation (FEP) and molecular mechanics generalized Born surface area (MM/GBSA) methods quantify the binding free energy contributions of individual mutations [13, 11]. For example, MD simulations of the SARS-CoV-2 spike complex with class I antibodies identified energetic drivers of convergent evolution and immune escape hotspots [13, 11]. Multiscale modeling approaches integrate coarse-grained and all-atom simulations to capture both global domain motions and local side-chain rearrangements [13]. These simulations are computationally intensive but provide mechanistic insights that sequence-based methods alone cannot offer [8].
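The end-point arithmetic behind an MM/GBSA-style estimate can be sketched in a few lines of Python: the binding free energy is the trajectory average of G(complex) minus G(receptor) minus G(ligand), and the mutational effect ΔΔG is the mutant value minus the wild-type value. The function names, the sign convention (negative ΔΔG means tighter binding), and all energies below are hypothetical; real workflows obtain per-frame energies from an MD package and usually add entropy and error estimates.

```python
from statistics import mean

# Each frame is (G_complex, G_receptor, G_ligand) in kcal/mol.
Frame = tuple[float, float, float]


def binding_free_energy(frames: list[Frame]) -> float:
    """Average of G_complex - G_receptor - G_ligand over trajectory frames."""
    return mean(g_complex - g_receptor - g_ligand
                for g_complex, g_receptor, g_ligand in frames)


def ddg_binding(wt_frames: list[Frame], mut_frames: list[Frame]) -> float:
    """DDG = DG_bind(mutant) - DG_bind(wild type); negative means tighter binding."""
    return binding_free_energy(mut_frames) - binding_free_energy(wt_frames)


# Invented per-frame energies for a wild-type and a mutant complex.
wt_frames = [(-1250.0, -800.0, -400.0), (-1254.0, -802.0, -402.0)]
mut_frames = [(-1253.0, -800.0, -401.0), (-1255.0, -801.0, -402.0)]

print(f"DDG = {ddg_binding(wt_frames, mut_frames):+.2f} kcal/mol")
```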
Machine Learning Methods
Deep Mutational Scanning Datasets
Deep mutational scanning (DMS) experimentally measures the fitness effects of thousands of single amino acid substitutions in the spike RBD [14]. DMS libraries are expressed on yeast or mammalian cell surfaces, and enrichment of variants after selection for receptor binding or antibody escape is quantified by high-throughput sequencing [14]. These datasets serve as training labels for ML models that predict mutational effects on binding affinity and immune escape [15]. For instance, DMS data from the SARS-CoV-2 RBD have been used to train classifiers that distinguish escape mutations from neutral ones [15]. The combination of DMS with structural features (e.g., solvent accessibility, residue depth, hydrogen bonding) improves predictive accuracy [6, 15].
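One common way to turn sequencing counts into a per-variant label is a log2 enrichment ratio normalized to wild type. The Python sketch below shows that generic scheme; it is not necessarily the scoring used in the cited studies, and the function name, pseudocount, variant names, and counts are all assumptions for illustration.

```python
import math


def enrichment_score(pre: float, post: float, pre_wt: float, post_wt: float,
                     pseudocount: float = 0.5) -> float:
    """log2 enrichment of a variant after selection, relative to wild type.

    Positive values mean the variant was enriched more than wild type.
    The pseudocount guards against zero counts (an assumed, tunable choice).
    """
    variant_ratio = (post + pseudocount) / (pre + pseudocount)
    wt_ratio = (post_wt + pseudocount) / (pre_wt + pseudocount)
    return math.log2(variant_ratio) - math.log2(wt_ratio)


# Invented read counts: (pre-selection, post-selection).
wild_type = (1000, 1000)
counts = {
    "variant_A": (100, 400),  # enriched after selection
    "variant_B": (100, 25),   # depleted after selection
}

for name, (pre, post) in counts.items():
    score = enrichment_score(pre, post, *wild_type)
    print(f"{name}: {score:+.2f}")
```

Scores computed this way can be used directly as regression targets or thresholded into classes before model training.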
Protein Language Models
Protein language models (PLMs) such as ESM-1b and ProtBERT learn evolutionary patterns from large sequence databases without explicit structural input [5, 16]. These models capture coevolutionary constraints and can predict the likelihood of specific mutations arising in natural evolution [5]. Lamb et al. demonstrated that PLMs trained on SARS-CoV-2 sequences can forecast evolutionary trajectories by scoring the probability of mutations under selective pressure [5]. The SARITA model, a large language model specifically designed for the S1 subunit, generates plausible mutant sequences that maintain structural integrity [16]. PLMs are particularly useful for predicting mutations that have not yet been observed but are evolutionarily accessible [5, 17].
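A widely used way to score substitutions with a masked protein language model is the log-likelihood ratio between the mutant and wild-type residue at a position. The Python sketch below shows only that scoring step: it assumes the per-position probabilities have already been obtained from a model, and the probability table is invented and truncated to four residues. It does not call ESM-1b, ProtBERT, or SARITA.

```python
import math


def llr_score(position_probs: dict[str, float], wt: str, mut: str) -> float:
    """log p(mutant) - log p(wild type) at one position; higher is more plausible."""
    return math.log(position_probs[mut]) - math.log(position_probs[wt])


def rank_substitutions(position_probs: dict[str, float], wt: str) -> list[str]:
    """Order candidate residues from most to least plausible under the model."""
    scored = {aa: llr_score(position_probs, wt, aa)
              for aa in position_probs if aa != wt}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical model output at one masked position (remaining mass omitted).
probs_at_site = {"N": 0.5, "Y": 0.25, "T": 0.125, "P": 0.0625}

print(rank_substitutions(probs_at_site, wt="N"))
```

Summing such scores across positions gives a simple additive estimate for multi-mutant sequences, although that ignores epistasis.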
Supervised Learning for Binding Affinity and Immune Escape
Supervised ML algorithms, including random forests, gradient boosting, and deep neural networks, are trained on structural and biophysical features to predict binding free energy changes (ΔΔG) upon mutation [6, 15]. Features commonly used include van der Waals contacts, electrostatic potentials, hydrogen bond networks, and changes in solvent-accessible surface area [6, 11]. Wang and Xi used molecular fields to model mutational effects on biochemical phenotypes, achieving high correlation with experimental binding data [6]. Alshahrani et al. applied atomistic modeling to compute binding free energies for antibody-spike complexes and identified residues that confer resistance to neutralization [13, 11, 12]. These predictions are validated against neutralization assays and structural data [18, 19].
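Supervised models need each substitution encoded as a numeric feature vector. The Python sketch below uses two simple sequence-level descriptors, the change in Kyte-Doolittle hydropathy and the change in formal side-chain charge, as a stand-in for the structure-derived features named above (contacts, electrostatics, solvent accessibility), which would require a 3D model to compute. The function name and the choice of descriptors are assumptions for illustration and are not the feature set of the cited studies.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate formal side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mutation_features(wt: str, mut: str) -> tuple[float, int]:
    """Return (change in hydropathy, change in charge) for a substitution."""
    d_hydropathy = HYDROPATHY[mut] - HYDROPATHY[wt]
    d_charge = CHARGE.get(mut, 0) - CHARGE.get(wt, 0)
    return d_hydropathy, d_charge


# Two substitutions discussed in the text, encoded as feature rows.
for label, (wt, mut) in {"N501Y": ("N", "Y"), "E484A": ("E", "A")}.items():
    print(label, mutation_features(wt, mut))
```

Rows like these can be stacked into a matrix and passed to any regressor or classifier; in practice they would be concatenated with structural features from the modeled complex.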
Bayesian and Generative Models
Bayesian frameworks incorporate prior knowledge of evolutionary constraints and experimental measurements to predict the probability of future mutations [17]. Ben Geoffrey and Gracia developed a Bayesian walker coupled with a computational workflow that simulates micro-evolution of SARS-CoV-2 and generates lists of likely new mutations [17]. Generative adversarial networks (GANs) and variational autoencoders (VAEs) have also been applied to generate novel spike sequences with desired properties, such as enhanced ACE2 binding or reduced antibody recognition [16, 20]. These models can propose mutations that are structurally plausible and evolutionarily favorable [20].
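The core Bayesian step, combining a prior with experimental evidence, can be shown with a discrete update over a small set of candidate mutations. This Python sketch is a generic illustration of Bayes' rule and is not the Bayesian walker of [17]; the function name, candidate labels, priors, and likelihoods are all hypothetical.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Posterior over candidates: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {m: prior[m] * likelihood[m] for m in prior}
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("posterior is undefined when all products are zero")
    return {m: value / total for m, value in unnormalized.items()}


# Hypothetical prior, e.g. from evolutionary accessibility of each substitution.
prior = {"mut_A": 0.5, "mut_B": 0.3, "mut_C": 0.2}

# Hypothetical likelihood of the observed assay data under each candidate.
likelihood = {"mut_A": 0.2, "mut_B": 0.5, "mut_C": 0.5}

post = posterior(prior, likelihood)
for name, p in sorted(post.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {p:.3f}")
```

In this toy case the candidate with the highest prior is overtaken once the evidence is included, which is the behavior that motivates combining both sources.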
Case Studies
SARS-CoV-2 Variants: From Alpha to Omicron
The emergence of SARS-CoV-2 variants of concern (VOCs) provided a real-world test for predictive models. The Alpha variant (B.1.1.7) carried the N501Y mutation in the RBD, which increases ACE2 binding affinity [21]. Structural modeling predicted that N501Y forms new π-π stacking interactions with Y41 of ACE2, a finding confirmed by MD simulations [21]. The Delta variant (B.1.617.2) acquired L452R and T478K, which enhance immune escape while maintaining receptor binding [21]. Machine learning classifiers trained on DMS data successfully identified these mutations as high-risk before their widespread circulation [15]. The Omicron variant (B.1.1.529) harbored over 30 mutations in the spike protein, many in the RBD [9]. AlphaFold2 modeling and MD simulations revealed that Omicron mutations, such as Q493R and N501Y, increase ACE2 affinity, while others like E484A and K417N reduce antibody binding [9, 10]. Epistatic interactions between mutations were critical for maintaining spike function [22]. Rochman et al. showed that epistasis at the RBD interface constrains vaccine escape, limiting the number of viable mutation combinations [22].
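Variant analyses depend on handling substitution labels such as N501Y consistently. The Python sketch below parses that notation and applies it to a sequence fragment, checking that the stated wild-type residue matches. The function names and the `offset` parameter are assumptions for illustration, and the eight-residue fragment is assumed to correspond to positions 498 to 505 of the reference spike.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label like 'N501Y' into (wild type, position, mutant)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single-substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(sequence: str, labels: list[str], offset: int = 1) -> str:
    """Apply substitutions to a sequence whose first residue is numbered `offset`."""
    residues = list(sequence)
    for label in labels:
        wt, position, mut = parse_mutation(label)
        index = position - offset
        if not 0 <= index < len(residues):
            raise IndexError(f"{label} lies outside the supplied fragment")
        if residues[index] != wt:
            raise ValueError(f"{label}: found {residues[index]} at {position}")
        residues[index] = mut
    return "".join(residues)


# Fragment assumed to span reference spike residues 498-505.
fragment = "QPTNGVGY"
print(apply_mutations(fragment, ["N501Y"], offset=498))
```

The wild-type check catches numbering mismatches between a fragment and the full-length reference, a frequent source of silent errors.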
Omicron Sublineages and Antibody Escape
Subsequent Omicron sublineages (BA.2, BA.4/5, XBB, JN.1, KP.2, KP.3) continued to evolve under antibody pressure [9]. Deep mutational scanning combined with structural modeling identified key escape mutations in the receptor-binding motif (RBM) [18, 19]. For example, the F486V mutation in BA.4/5 reduces ACE2 binding but is compensated by R493Q reversion [9]. Machine learning models that incorporate epistatic effects accurately predicted the rise of these compensatory mutations [9, 22]. Computational design of nanobodies and antibodies that target conserved epitopes has been guided by these predictions [18, 20, 19]. The Nanosota-9 nanobody, discovered through structure-based design, shows potent neutralization against Omicron variants [19].
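Pairwise epistasis is usually quantified as the deviation of a double mutant from the sum of its single-mutant effects. The Python sketch below writes out that arithmetic for binding free energy changes; the function name, the sign convention (positive ΔΔG weakens binding), and the numbers are hypothetical and are not measurements for any real variant.

```python
def pairwise_epistasis(ddg_a: float, ddg_b: float, ddg_ab: float) -> float:
    """Epistasis = DDG(double mutant) - [DDG(A) + DDG(B)], in kcal/mol.

    Zero means the two substitutions act additively. With positive DDG taken
    as weaker binding, a negative value means the pair binds better than the
    additive expectation, i.e. one substitution compensates for the other.
    """
    return ddg_ab - (ddg_a + ddg_b)


# Invented values: A alone weakens binding, B alone slightly strengthens it,
# and together they bind almost as well as wild type.
ddg_a, ddg_b, ddg_ab = 1.2, -0.5, 0.1

epsilon = pairwise_epistasis(ddg_a, ddg_b, ddg_ab)
print(f"epistasis = {epsilon:+.2f} kcal/mol")
```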
Animal Coronaviruses and Zoonotic Potential
Coronaviruses circulating in animals, such as feline coronavirus (FCoV), canine coronavirus (CCoV), and bat SARS-like coronaviruses, pose zoonotic risks [23, 3]. The furin cleavage site (FCS) at the S1/S2 boundary is a key determinant of host range and pathogenicity [23]. Nagy et al. analyzed FCS evolution in SARS-CoV-2 variants from humans and animals, identifying mutations that enhance cleavage efficiency [23]. Structural modeling of bat coronavirus spike proteins in complex with human ACE2 has identified residues that enable cross-species transmission [2, 1]. Machine learning models trained on host receptor binding data can help predict which animal coronaviruses have the greatest potential to infect humans or livestock [24, 8]. Sarkar et al. used sequence and structural features to predict the prospective mutational landscape of SARS-CoV-2 spike ssRNA and its evolutionary basis for host interaction [24].
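A first-pass screen for candidate furin cleavage sites can be done with a motif search. The Python sketch below looks for the minimal R-X-X-R consensus, allowing overlapping matches; the function name and the `offset` parameter are assumptions for illustration, and the fragment is assumed to correspond to residues 675 to 690 of the SARS-CoV-2 reference spike. A motif hit is only a candidate, since dedicated predictors weigh a wider sequence context and accessibility.

```python
import re

# Minimal furin consensus R-X-X-R; the lookahead lets overlapping motifs be found.
_FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_motifs(sequence: str, offset: int = 1) -> list[tuple[int, str]]:
    """Return (residue number of the first R, motif) for each R-X-X-R match."""
    return [(match.start() + offset, match.group(1))
            for match in _FURIN_MINIMAL.finditer(sequence.upper())]


# Fragment assumed to span reference spike residues 675-690 (S1/S2 region).
s1_s2_fragment = "QTQTNSPRRARSVASQ"

for start, motif in find_furin_motifs(s1_s2_fragment, offset=675):
    # Cleavage is expected after the final arginine of the motif.
    print(f"{motif} at {start}-{start + 3}, cleavage after residue {start + 3}")
```

Running the same search over animal coronavirus spike sequences gives a quick way to compare the presence and position of candidate sites across hosts.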
HKU1 and Other Human Coronaviruses
The human coronavirus HKU1, which causes mild respiratory disease, has evolved spike mutations over time [25]. Hikmat et al. documented the emergence of the H512R substitution in the spike protein of HKU1 in southern France between 2017 and 2022 [25]. Structural modeling suggested that this mutation alters the RBD conformation and may affect receptor binding [25]. Although HKU1 is not a veterinary pathogen, its evolutionary dynamics provide a comparative model for understanding spike protein evolution in animal coronaviruses [25].
Integration and Workflow
The following Mermaid diagram illustrates a typical computational workflow for predicting spike protein evolution:
flowchart TD
A[Viral Sequence Data] --> B[Multiple Sequence Alignment]
B --> C[Phylogenetic Analysis]
C --> D[Identify Conserved and Variable Sites]
D --> E[Structural Modeling: AlphaFold2 / Homology]
E --> F[Molecular Dynamics Simulations]
F --> G[Binding Free Energy Calculations]
G --> H[Deep Mutational Scanning Data]
H --> I[Feature Engineering]
I --> J[Machine Learning Training]
J --> K[Predict Mutational Effects on Binding & Escape]
K --> L[Rank Potential Emerging Mutations]
L --> M[Experimental Validation]
M --> N[Update Surveillance & Vaccine Design]
This workflow integrates sequence evolution, structural biophysics, and ML to generate actionable predictions. Each step relies on validated computational tools and experimental data [5, 6, 9, 15, 8, 17, 14].
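The ranking step near the end of the workflow can be sketched as a filter followed by a sort: discard candidates predicted to damage receptor binding too severely, then order the rest by predicted escape. In this Python sketch the class name, field names, the 1.0 kcal/mol tolerance, and all scores are hypothetical; a real pipeline would take these values from the upstream binding and escape predictors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    ddg_binding: float  # kcal/mol; positive = weaker receptor binding (assumed convention)
    escape: float       # predicted antibody escape, scaled 0 to 1 (assumed)


def rank_candidates(candidates: list[Candidate], max_ddg: float = 1.0) -> list[Candidate]:
    """Keep candidates that retain binding, then sort by escape (high to low).

    Ties in escape are broken in favor of the lower (more favorable) DDG.
    """
    viable = [c for c in candidates if c.ddg_binding <= max_ddg]
    return sorted(viable, key=lambda c: (-c.escape, c.ddg_binding))


# Invented predictions for four placeholder mutations.
candidates = [
    Candidate("m1", ddg_binding=-0.5, escape=0.20),
    Candidate("m2", ddg_binding=0.4, escape=0.90),
    Candidate("m3", ddg_binding=2.5, escape=0.95),  # high escape but loses binding
    Candidate("m4", ddg_binding=0.1, escape=0.90),
]

for c in rank_candidates(candidates):
    print(c.label, c.ddg_binding, c.escape)
```

A hard filter is the simplest design; a weighted score that trades binding against escape is an alternative when the tolerance is uncertain.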
Implications for Veterinary Virology and Zoonotic Risk
Predicting spike protein evolution is directly relevant to veterinary medicine. Coronaviruses such as porcine epidemic diarrhea virus (PEDV), transmissible gastroenteritis virus (TGEV), and feline infectious peritonitis virus (FIPV) cause significant morbidity and mortality in livestock and companion animals [3]. Structural modeling of these spike proteins can identify mutations that alter host receptor usage or immune escape [8]. Machine learning models trained on SARS-CoV-2 data can be transferred to animal coronaviruses, provided sufficient sequence and structural data are available [24, 8]. For example, the same computational pipeline used to predict Omicron mutations can be applied to predict mutations in PEDV spike that might enhance binding to porcine aminopeptidase N (pAPN) [8]. Such predictions inform vaccine strain selection and diagnostic assay design [26, 27]. The SpikePro webserver, which predicts fitness of SARS-CoV-2 variants, could be adapted for veterinary coronaviruses [27]. Additionally, understanding the structural basis of cross-species transmission helps assess zoonotic risk from bat and pangolin coronaviruses [2, 1]. The article "Zoonotic Spillover Pathways and Receptor Binding Evolution in Bat Reservoirs" provides further context on this topic.
Conclusion
The integration of structural modeling and machine learning has substantially improved our ability to predict spike protein evolution in emerging coronaviruses. AlphaFold2 and molecular dynamics simulations provide atomic-level insights into how mutations affect receptor binding and antibody escape. Deep mutational scanning datasets supply training labels for ML classifiers, and protein language models contribute evolutionary constraints learned from large sequence databases. Case studies on SARS-CoV-2 variants show that these methods can flag high-risk mutations before they become widespread. Extending these approaches to veterinary coronaviruses should strengthen surveillance, vaccine design, and pandemic preparedness. Continued development of multiscale models and generative AI is expected to further improve predictive accuracy and support near-real-time monitoring of viral evolution.
References
[1] Wan Y, Shang J, Graham R, et al. Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. J Virol. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/31996437/
[2] Armijos-Jaramillo V, Yeager J, Muslin C, et al. SARS-CoV-2, an evolutionary perspective of interaction with human ACE2 reveals undiscovered amino acids necessary for complex stability. Evol Appl. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/32837536/
[3] Luo R, Delaunay-Moisan A, Timmis K, et al. SARS-CoV-2 biology and variants: anticipation of viral evolution and what needs to be done. Environ Microbiol. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33769683/
[4] Van Egeren D, Novokhodko A, Stoddard M, et al. Risk of rapid evolutionary escape from biomedical interventions targeting SARS-CoV-2 spike protein. PLoS One. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33909660/
[5] Lamb KD, Hughes J, Lytras S, et al. From single-sequences to evolutionary trajectories: protein language models capture the evolutionary potential of SARS-CoV-2. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714330/
[6] Wang B, Xi Z. Modeling the Mutational Effects on Biochemical Phenotypes of SARS-CoV-2 Using Molecular Fields. Biomolecules. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41301456/
[7] Wierbowski SD, Liang S, Liu Y, et al. A 3D structural SARS-CoV-2-human interactome to explore genetic and drug perturbations. Nat Methods. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34845387/
[8] Thakur S, Planeta Kepp K, Mehra R. Predicting virus Fitness: Towards a structure-based computational model. J Struct Biol. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37931730/
[9] Raisinghani N, Alshahrani M, Gupta G, et al. AlphaFold2 Modeling and Molecular Dynamics Simulations of the Conformational Ensembles for the SARS-CoV-2 Spike Omicron JN.1, KP.2 and KP.3 Variants: Mutational Profiling of Binding Energetics Reveals Epistatic Drivers of the ACE2 Affinity and Escape Hotspots of Antibody Resistance. Viruses. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39339934/
[10] Raisinghani N, Alshahrani M, Gupta G, et al. AlphaFold2-Enabled Atomistic Modeling of Structure, Conformational Ensembles, and Binding Energetics of the SARS-CoV-2 Omicron BA.2.86 Spike Protein with ACE2 Host Receptor and Antibodies: Compensatory Functional Effects of Binding Hotspots in Modulating Mechanisms of Receptor Binding and Immune Escape. J Chem Inf Model. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38373700/
[11] Alshahrani M, Parikh V, Foley B, et al. Mutational Scanning and Binding Free Energy Computations of the SARS-CoV-2 Spike Complexes with Distinct Groups of Neutralizing Antibodies: Energetic Drivers of Convergent Evolution of Binding Affinity and Immune Escape Hotspots. Int J Mol Sci. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40003970/
[12] Alshahrani M, Parikh V, Foley B, et al. Quantitative Characterization and Prediction of the Binding Determinants and Immune Escape Hotspots for Groups of Broadly Neutralizing Antibodies Against Omicron Variants: Atomistic Modeling of the SARS-CoV-2 Spike Complexes with Antibodies. Biomolecules. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40001552/
[13] Alshahrani M, Parikh V, Foley B, et al. Multiscale Modeling and Dynamic Mutational Profiling of Binding Energetics and Immune Escape for Class I Antibodies with SARS-CoV-2 Spike Protein: Dissecting Mechanisms of High Resistance to Viral Escape Against Emerging Variants. Viruses. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40872744/
[14] Zahradník J, Marciano S, Shemesh M, et al. SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution. Nat Microbiol. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34400835/
[15] Liu Y, He Z, Jia L, et al. Predicting Natural Evolution in the RBD Region of the Spike Glycoprotein of SARS-CoV-2 by Machine Learning. Viruses. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38543841/
[16] Rancati S, Nicora G, Bergomi L, et al. SARITA: a large language model for generating the S1 subunit of the SARS-CoV-2 spike protein. Brief Bioinform. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40755284/
[17] Ben Geoffrey AS, Gracia J. A Bayesian walker coupled with a computational workflow that generates the micro-evolution of SARS-CoV-2 and makes predictions of new mutations that can emerge. J Biomol Struct Dyn. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/37771150/
[18] Cerdán L, Silva K, Rodríguez-Martín D, et al. Integrating immune library probing with structure-based computational design to develop potent neutralizing nanobodies against emerging SARS-CoV-2 variants. MAbs. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40329514/
[19] Ye G, Bu F, Saxena D, et al. Discovery of Nanosota-9 as anti-Omicron nanobody therapeutic candidate. PLoS Pathog. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39591462/
[20] Kang Y, Jin K, Pan L. AI designed, mutation resistant broad neutralizing antibodies against multiple SARS-CoV-2 strains. Sci Rep. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40319133/
[21] Peisahovics F, Rohaim MA, Munir M. Structural topological analysis of spike proteins of SARS-CoV-2 variants of concern highlight distinctive amino acid substitution patterns. Eur J Cell Biol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36156414/
[22] Rochman ND, Faure G, Wolf YI, et al. Epistasis at the SARS-CoV-2 Receptor-Binding Domain Interface and the Propitiously Boring Implications for Vaccine Escape. mBio. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35289643/
[23] Nagy A, Basiouni S, Parvin R, et al. Evolutionary insights into the furin cleavage sites of SARS-CoV-2 variants from humans and animals. Arch Virol. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34258664/
[24] Sarkar A, Ghosh TA, Bandyopadhyay B, et al. Prediction of Prospective Mutational Landscape of SARS-CoV-2 Spike ssRNA and Evolutionary Basis of Its Host Interaction. Mol Biotechnol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/38619800/
[25] Hikmat H, Le Targa L, Boschi C, et al. Five-Year (2017-2022) Evolutionary Dynamics of Human Coronavirus HKU1 in Southern France With Emergence of Viruses Harboring Spike H512R Substitution. J Med Virol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39949218/
[26] Wang E, Chakraborty AK. Design of immunogens for eliciting antibody responses that may protect against SARS-CoV-2 variants. PLoS Comput Biol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36156540/
[27] Cia G, Kwasigroch JM, Rooman M, et al. SpikePro: a webserver to predict the fitness of SARS-CoV-2 variants. Bioinformatics. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35861514/
[28] Feng S, O'Brien A, Chen DY, et al. SARS-CoV-2 nonstructural protein 6 from Alpha to Omicron: evolution of a transmembrane protein. mBio. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37477426/
[29] Aleem A, Akbar Samad AB, Vaqar S. Emerging Variants of SARS-CoV-2 and Novel Therapeutics Against Coronavirus (COVID-19)(Archived). PubMed. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/34033342/