Computational Analysis of Coronavirus Spike Protein Evolution: From Bat Reservoirs to Human Adaptation
Coronaviruses circulate extensively within bat reservoirs, and their spike glycoproteins mediate host cell entry through receptor recognition and membrane fusion [1]. The evolutionary trajectory of the spike protein from bat hosts to permissive intermediate or incidental hosts is governed by selective pressures at the receptor-binding interface and by immune evasion constraints [2]. Computational virology integrates phylogenetic inference, structural modeling, molecular dynamics (MD) simulations, and machine learning to dissect these evolutionary processes and to identify zoonotic spillover signatures [3, 4]. This article reviews the principal computational approaches used to analyze coronavirus spike protein evolution, with emphasis on the receptor-binding domain (RBD), angiotensin-converting enzyme 2 (ACE2) binding affinity predictions, and the detection of adaptive mutations that facilitate cross-species transmission.
Phylogenetic Surveillance and Evolutionary Tracing
Phylogenetic reconstruction of coronavirus spike gene sequences enables the mapping of transmission pathways from bat reservoirs to intermediate hosts and ultimately to incidental hosts [5, 6]. Maximum likelihood and Bayesian methods applied to full-length spike sequences or to the RBD region reveal lineage diversification and identify nodes associated with host shifts [1, 6]. Hidden Markov models (HMMs) have been employed to uncover epistatic interactions among spike mutations, thereby identifying co-evolving residue networks that constrain or facilitate adaptation [7]. Multifractal analysis of spike glycoprotein sequences further quantifies structural complexity and has been correlated with transmission potential [8]. Sequence surveillance programs that collect samples from bats, livestock, and companion animals generate the primary data for these phylogenetic analyses [1, 9]. Long-read sequencing approaches have been instrumental in resolving complex genomic architectures and detecting minority variants that may precede host jumps [9]. The integration of phylogenetic trees with metadata on host species and geographic origin allows the inference of spillover events and the characterization of viral movement across ecological niches [5].
Structural Modeling of the Spike Glycoprotein
The three-dimensional structure of the prefusion spike trimer is essential for understanding receptor engagement and conformational dynamics [10, 11]. Cryo-electron microscopy (cryo-EM) maps provide atomic-resolution models of spike variants, including those from bat coronaviruses, which can be compared with human-adapted strains [10, 12]. Computational structure prediction tools, such as AlphaFold2, have been used to generate models of novel bat coronavirus spikes when experimental structures are unavailable [1]. These models serve as templates for docking simulations and for identifying structural constraints that limit viral adaptation [2]. Normal mode analysis and elastic network models characterize the intrinsic flexibility of the spike, particularly the RBD hinge motions that govern open versus closed conformations [4]. Conformational sampling algorithms, including replica exchange MD, have been applied to explore the free energy landscapes of spike variants and to quantify the thermodynamic penalties associated with specific mutations [11, 13].
The spike glycoprotein is heavily glycosylated, and the glycan shield influences both receptor binding and antibody recognition [14]. Computational glycoproteomics integrates mass spectrometry data with MD simulations to map glycan occupancy and to predict how mutations alter glycan shielding [14]. Loss or gain of glycosylation sites in the RBD can reorganize epitopes, leading to immune escape [14, 15]. Structural modeling combined with mutational scanning reveals that certain residues in the RBD are constrained by the need to maintain ACE2 binding while simultaneously avoiding antibody neutralization [16, 17].
ACE2 Binding Affinity Predictions
The affinity between the spike RBD and the host ACE2 receptor is a key determinant of host tropism and zoonotic potential [3]. Computational free energy perturbation (FEP) and molecular mechanics generalized Born surface area (MM-GBSA) methods provide quantitative binding free energy estimates for RBD-ACE2 complexes [3, 17]. Integrative MD simulations that incorporate multiple force fields and enhanced sampling techniques have been used to compare the binding of bat coronavirus RBDs with human ACE2 orthologs from various species [3, 13]. These calculations identify residue-level energetic drivers of binding and highlight mutations that increase affinity toward human ACE2.
Machine learning models trained on deep mutational scanning data predict the effect of RBD mutations on ACE2 binding and on antibody escape [18, 19, 20]. Contrastive learning frameworks efficiently classify mutations that confer immune evasion without compromising receptor engagement [18]. Bayesian active learning combined with biophysical fitness functions has been applied to navigate the sequence space of the RBD and to predict high-fitness variants that are likely to emerge under selective pressure [19]. Matrix factorization approaches detect frequently co-occurring mutations in global sequence databases, enabling the early identification of adaptive haplotypes [20]. These computational predictions can be validated by surface plasmon resonance (SPR) or biolayer interferometry (BLI) assays using recombinant proteins, but the in silico methods themselves are now sufficiently mature to guide surveillance priorities.
Identification of Zoonotic Spillover Signatures
Zoonotic spillover requires a cascade of molecular events: the spike must bind the novel host receptor with sufficient affinity, the host protease must cleave the spike at the S1/S2 furin site (in the case of SARS-CoV-2-like viruses), and the virus must evade the innate immune response of the new host [2, 1]. Computational analyses of spike sequences from bat coronaviruses have identified pre-existing adaptations that increase compatibility with human ACE2 [1]. For example, residue substitutions at positions 493, 498, and 501 in the RBD have been linked to enhanced human ACE2 binding in multiple coronavirus lineages [3]. The presence of a polybasic furin cleavage site can be detected by sequence motif scanning, and its functional impact can be modeled by MD simulations of the S1/S2 loop [4].
Integrative computational workflows that combine phylogenetic, structural, and biophysical data are used to prioritize animal coronaviruses for zoonotic risk assessment [1]. These workflows often incorporate a decision tree that evaluates receptor binding affinity, protease cleavage efficiency, and antigenic distance from existing vaccine strains. A representative workflow is depicted in Figure 1.
flowchart TD
A[Field sampling: bats, livestock, companion animals], > B[High-throughput sequencing of coronaviruses]
B, > C[Phylogenetic reconstruction of spike gene]
C, > D[Identification of novel RBD variants]
D, > E[Structural modeling: homology or AlphaFold2]
E, > F[MD simulations of RBD-ACE2 complex]
F, > G[Binding free energy calculation]
G, > H{Threshold binding affinity?}
H, Yes, > I[Cleavage site prediction]
H, No, > J[Low risk, continue surveillance]
I, > K[Antibody escape prediction via mutational scanning]
K, > L[Risk score integration]
L, > M[Prioritize for experimental validation]
Figure 1. Integrated computational workflow for assessing zoonotic spillover risk of bat coronaviruses based on spike protein evolution.
The workflow begins with field sampling and sequencing, proceeds through phylogenetic and structural analyses, and culminates in a risk score that informs experimental validation priorities. This approach has been applied to coronaviruses identified in Colombian Caribbean bats, revealing several RBD mutations that potentially increase interspecies jumping risk [1].
Mutational Profiling and Immune Escape Dynamics
As coronaviruses adapt to a new host, they acquire mutations in the spike that confer immune evasion while preserving receptor binding [14, 16, 21]. Deep mutational scanning experiments generate comprehensive maps of single-amino-acid substitutions that affect antibody neutralization [22, 23]. These data are integrated with computational energy landscape analysis to identify residues where mutation disrupts antibody binding without destabilizing the RBD core [16, 21, 17]. The concept of "neutral frustration" has been applied to spike-antibody interfaces, revealing that immune escape hotspots often occur at positions where the binding interface is conformationally frustrated, allowing mutations to evade recognition with minimal fitness cost [21, 24].
Glycan shielding plays a protective role, and mutations that introduce new glycosylation sites on the RBD can block antibody access [14]. Conversely, loss of glycan sites can expose conserved epitopes that are targeted by broadly neutralizing antibodies [14, 25]. Bispecific antibodies designed to bind both the N-terminal domain (NTD) and the RBD have shown resilience to escape mutations, and computational modeling of their binding interfaces helps rationalize their breadth [23]. The antigenic imprinting of the host immune system shapes the evolutionary trajectory of the spike, as seen in comparisons between immunologically naive and previously infected or vaccinated individuals [26, 22]. In non-human primate models, whole-body immunoPET imaging has been used to track the biodistribution of spike-specific antibodies, providing in vivo validation of computational epitope predictions [27].
Integrative Computational Frameworks and Future Directions
The integration of multiple computational modalities is essential for a holistic understanding of spike evolution. Large language models trained on spike S1 subunit sequences can generate novel variants with desired biophysical properties, offering a generative approach to explore sequence space [28]. Directed evolution experiments in vitro, combined with molecular docking and binding mode analysis, can be used to reprogram antibodies from related coronaviruses to neutralize emerging variants [29]. Computational design of nanobodies targeting the RBD has been optimized using iterative docking and binding free energy calculations [30].
Structural constraints act as a brake on viral adaptation; the spike protein has limited space for mutation without losing functionality [2]. Epistatic interactions between residues in the RBD and the NTD create a rugged fitness landscape that computational methods are only beginning to characterize [7, 20]. Table 1 summarizes the principal computational methods discussed, their applications, and key references.
| Computational Method | Application | Key References |
|---|---|---|
| Phylogenetic inference | Host range mapping, lineage tracing | [5, 1, 6] |
| Hidden Markov models | Epistatic interaction detection | [7] |
| Multifractal analysis | Sequence complexity and transmission | [8] |
| Homology modeling / AlphaFold2 | Spike structure prediction | [10, 1, 11] |
| Molecular dynamics simulations | Conformational dynamics, binding free energy | [3, 4, 11, 13] |
| Free energy perturbation / MM-GBSA | RBD-ACE2 binding affinity | [3, 17] |
| Deep mutational scanning | Mutation effect mapping | [16, 22, 23] |
| Machine learning (contrastive, Bayesian) | Immune escape prediction, fitness prediction | [18, 19] |
| Matrix factorization | Co-occurring mutation discovery | [20] |
| Large language models | Spike S1 sequence generation | [28] |
| Structural glycoproteomics | Glycan shield analysis | [14] |
| Energy landscape analysis | Immune escape hotspot identification | [21, 24] |
Table 1. Summary of computational methods described in this review, their primary applications, and representative references.
The veterinary perspective is particularly relevant because many intermediate hosts (e.g., camelids, mink, deer, and companion animals) are themselves susceptible to coronavirus infection and can serve as reservoirs for further adaptation [1]. Computational surveillance of spike evolution in these species, using the same phylogenetic and structural tools, is critical for early detection of variants with increased zoonotic potential. The workflows described here also apply to other viral glycoproteins; for further reading, see the related article on Zoonotic Spillover Pathways and Receptor Binding Evolution in Bat Reservoirs and the review on Structural and Evolutionary Dynamics of Coronavirus Spike Protein.
Conclusion
Computational analysis of coronavirus spike protein evolution provides a powerful framework for tracing viral movements from bat reservoirs to human and animal populations. Phylogenetic surveillance, structural modeling, MD simulations, and machine learning collectively enable the prediction of RBD mutations that enhance ACE2 binding or promote immune escape. The integration of these methods into unified risk assessment workflows allows the prioritization of animal coronaviruses for experimental characterization. Continued development of generative models and deep learning approaches will further refine our ability to anticipate future spillover events and to design countermeasures against emerging coronaviruses.
References
[1] Martínez C, Echeverri-De la Hoz D, Calderón A, et al. New Coronavirus in Colombian Caribbean Bats: In Silico Analysis Reveals Possible Risk of Interspecific Jumping. Viruses. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41157592/
[2] Herzig JC, Magwira ML, Lovell SC. Structural Constraints Acting on the SARS-CoV-2 Spike Protein Reveal Limited Space for Viral Adaptation. Genome Biol Evol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41876430/
[3] Truong Hoai L, Ilham B, Sompornpisut T, et al. Mutation-driven adaptation of ACE2-RBD binding revealed by integrative molecular dynamics analysis. J Mol Graph Model. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41839468/
[4] Xu W, Guo T, Su H. Evolutionary aspect of spike glycoprotein's conformational dynamics. Phys Chem Chem Phys. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41589433/
[5] Yang H, Wang X, Zhang L, et al. Comprehensive whole-genome characterization of SARS-CoV-2 strains in Jining China 2024-2025. Front Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42182013/
[6] Mohamadzadeh M, Keyvani H, Latifi AM, et al. The Study of Mutations and Phylogenetics of the SARS-CoV-2 Spike Gene in Population from Tehran Province. Arch Razi Inst. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40951573/
[7] Adeniyi AE, Juyal A, Skums P, et al. Uncovering Epistatic Interactions in SARS-CoV-2 Evolution Through Hidden Markov Models. J Comput Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41719088/
[8] Correia JP, Dantas MEPL, Araújo JMG, et al. Exploring the spike glycoprotein-encoding gene of SARS-CoV-2 using multifractal analysis: Unveiling structural complexity and implications for viral transmission. Comput Biol Med. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40819500/
[9] Hassan ZU, Park M, Park D, et al. Nanopore sequencing reveals the genomic diversity of the variants of concern of SARS-CoV-2 during 2021 disease outbreak in Pakistan. Sci Rep. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40750817/
[10] Mancini T, Luchetti N, Macis S, et al. Multimodal Structural Characterization of SARS-CoV-2 Spike Variants: Spectroscopic and Computational Insights. Int J Mol Sci. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41226381/
[11] da Costa CHS, de Freitas CAB, Dos Santos AM, et al. Conformational Dynamics and Binding Interactions of SARS-CoV-2 Spike Protein Variants: Omicron, XBB.1.9.2, and EG.5. J Chem Inf Model. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40643983/
[12] Feng Z, Huang J, Baboo S, et al. Structural and functional insights into the evolution of SARS-CoV-2 KP.3.1.1 spike protein. Cell Rep. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40618371/
[13] Alshahrani M, Parikh V, Foley B, et al. Integrative Computational Modeling of Distinct Binding Mechanisms for Broadly Neutralizing Antibodies Targeting SARS-CoV-2 Spike Omicron Variants: Balance of Evolutionary and Dynamic Adaptability in Shaping Molecular Determinants of Immune Escape. Viruses. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40573332/
[14] Kumar A, Yadav AJ, Tripathi T, et al. Glycan shielding and epitope reorganization drive sotrovimab resistance in SARS-CoV-2 Omicron variants. Arch Biochem Biophys. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42128042/
[15] Feng Z, Sang Z, Xiang Y, et al. One thousand SARS-CoV-2 antibody structures reveal convergent binding and near-universal immune escape. Cell Syst. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41274284/
[16] Alshahrani M, Gatlin W, Ludwick M, et al. Mechanisms of Binding and Immune Escape Resistance for Broadly Neutralizing Antibodies Targeting Distinct Conserved SARS-CoV-2 Spike Epitopes: A Hierarchical Approach Integrating Mutational Profiling and Energy Landscape Analysis. Int J Mol Sci. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42123600/
[17] Alshahrani M, Parikh V, Foley B, et al. Mutational Scanning and Binding Free Energy Computations of the SARS-CoV-2 Spike Complexes with Distinct Groups of Neutralizing Antibodies: Energetic Drivers of Convergent Evolution of Binding Affinity and Immune Escape Hotspots. Int J Mol Sci. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40003970/ *** Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.
[18] Tenekeci S, Sezgin E, Tekir S. A Contrastive Learning Framework for Efficient Viral Escape Prediction. IEEE Trans Comput Biol Bioinform. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41843532/
[19] Huot M, Wang D, Liu J, et al. Predicting high-fitness viral protein variants with Bayesian active learning and biophysics. Proc Natl Acad Sci U S A. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40489612/
[20] Kolar MR, Kobzarenko V, Mitra D. Efficient discovery of frequently co-occurring mutations in a sequence database with matrix factorization. PLoS Comput Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40273414/
[21] Alshahrani M, Parikh V, Foley B, et al. Dissecting binding and immune evasion mechanisms for ultrapotent Class I and Class 4/1 neutralizing antibodies of SARS-CoV-2 spike protein using a multi-pronged computational approach: neutral frustration architecture of binding interfaces and immune escape hotspots drives adaptive evolution. Phys Chem Chem Phys. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41623222/
[22] Dadonaite B, Burrell AR, Logue J, et al. SARS-CoV-2 neutralizing antibody specificities differ dramatically between recently infected infants and immune-imprinted individuals. J Virol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40130874/
[23] Rubio AA, Baharani VA, Dadonaite B, et al. Bispecific antibodies targeting the N-terminal and receptor binding domains potently neutralize SARS-CoV-2 variants of concern. Sci Transl Med. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40043139/
[24] Alshahrani M, Parikh V, Foley B, et al. Dynamic mutational profiling of binding interactions and allosteric networks in conformational ensembles of the SARS-CoV-2 spike protein complexes with classes of antibodies targeting cryptic binding sites: confluence of binding and allostery determines molecular mechanisms and hotspots of immune escape. Phys Chem Chem Phys. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40842437/
[25] Alshahrani M, Parikh V, Foley B, et al. Deciphering the Mechanistic Continuum of Broadly Neutralizing Class 4 Antibodies Targeting Conserved Cryptic Epitopes of the SARS-CoV-2 Spike Protein : Operating at the Intersection of Binding, Allostery and Immune Escape. bioRxiv. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40672205/
[26] Li J, Wang X, Li J, et al. Antigenic Imprinting and Immune Response Dynamics: Neutralization Against Emerging SARS-CoV-2 Omicron Variants Following Breakthrough Infections. J Med Virol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41217130/
[27] Detrille A, Huvelle S, van Gils MJ, et al. Whole-body visualization of SARS-CoV-2 biodistribution in vivo by immunoPET imaging in non-human primates. Nat Commun. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40118860/
[28] Rancati S, Nicora G, Bergomi L, et al. SARITA: a large language model for generating the S1 subunit of the SARS-CoV-2 spike protein. Brief Bioinform. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40755284/
[29] Jung I, Sriramulu DK, Lee SG, et al. Reprogramming the SARS-CoV-1 Neutralizing Antibody S230 to SARS-CoV-2 via Directed Evolution and Molecular Docking-Based Binding Mode Analysis. ACS Synth Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41236374/
[30] Cao S, Sun B, Gao F. From immune evasion to broad in silico binding: computational optimization of SARS-CoV-2 RBD-targeting nanobody. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40895570/
[31] Gupta S, Chaudhary A, Bhatnagar S. SARS-CoV-2 Evolution and Its Implications for RT-PCR Diagnostic Performance. J Med Virol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42312714/
[32] Yeo J, Yun MR, Kim SY, et al. Structure-guided design of a bivalent SARS-CoV-2 mRNA vaccine with NTD stabilizing mutations enhances broad immunity. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41660617/
[33] Dadonaite B, Harari S, Larsen BB, et al. Spike mutations that affect the function and antigenicity of recent KP.3.1.1-like SARS-CoV-2 variants. J Virol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41081510/
[34] Nabieva E, Klink GV, Komissarov AB, et al. A highly divergent sample from a nearly extinct SARS-CoV-2 lineage in a patient with long-term COVID-19. Front Cell Infect Microbiol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41050757/
[35] Tokhanbigli S, Salami Ghaleh S, Rahimian K, et al. Intersecting SARS-CoV-2 spike mutations and global vaccine efficacy against COVID-19. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40124365/