Molecular Modeling of Zoonotic Spillover: Predicting Receptor-Binding Dynamics in Bat-Borne Coronaviruses
Introduction
The emergence of bat-borne coronaviruses into novel host species represents a critical challenge in veterinary virology and public health preparedness. Zoonotic spillover events are fundamentally governed by molecular interactions between viral attachment proteins and host cell receptors. For coronaviruses, the spike glycoprotein (S) mediates entry through its receptor-binding domain (RBD), which engages specific host surface molecules such as angiotensin-converting enzyme 2 (ACE2) or alternative receptors [1, 2]. Understanding the structural determinants that enable cross-species transmission is essential for predicting spillover risk and designing surveillance strategies.
Computational molecular modeling provides a powerful framework for evaluating these interactions at atomic resolution, even when experimental structures are unavailable for emerging variants. By combining homology modeling, molecular docking, molecular dynamics (MD) simulations, and binding free energy calculations, researchers can quantitatively assess how sequence variations in the RBD alter binding affinity for receptors from different species [3, 4, 5]. This article provides a detailed technical review of the computational workflow used to predict receptor-binding dynamics in bat-borne coronaviruses, with emphasis on the biophysical principles and algorithmic approaches that underpin each step.
Homology Modeling of Viral Receptor-Binding Domains
When high-resolution experimental structures of a newly identified bat coronavirus RBD are lacking, homology modeling remains the primary method for generating three-dimensional models. The approach relies on the principle that protein structure is more conserved than sequence; templates sharing at least 30% sequence identity with the query generally yield reliable backbone conformations. For coronavirus RBDs, templates are typically sourced from related sarbecoviruses or alphacoronaviruses whose crystal or cryo-EM structures have been determined [6].
The modeling pipeline begins with multiple sequence alignment of the query RBD against a curated database of coronavirus spike sequences. Conserved cysteine residues that form disulfide bonds and key hydrophobic core residues are used to guide the alignment. Using software such as MODELLER or SWISS-MODEL, a set of models is generated by satisfying spatial restraints derived from the template. Loop regions, which often contain receptor-contacting residues, are refined using ab initio or knowledge-based approaches. Model quality is assessed through stereochemical validation (e.g., Ramachandran statistics, clash scores) and energy profiles [5, 6]. For bat coronavirus RBDs that share >70% identity with SARS-CoV or SARS-CoV-2 RBDs, the models frequently achieve backbone root-mean-square deviations below 1.5 Å from the native structure.
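The template-selection step can be illustrated with a minimal sketch that computes percent identity over the ungapped columns of a pairwise alignment and keeps templates above the 30% threshold mentioned earlier. The function names, the toy sequences, and the template labels are hypothetical; real pipelines such as MODELLER or SWISS-MODEL use their own alignment and ranking procedures.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # gapped columns are not counted
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def rank_templates(alignments, min_identity=30.0):
    """Keep templates at or above min_identity, best first.

    alignments maps a template label to (aligned_query, aligned_template).
    """
    scored = [(name, percent_identity(q, t)) for name, (q, t) in alignments.items()]
    kept = [item for item in scored if item[1] >= min_identity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy alignments (made-up sequences, for illustration only).
alignments = {
    "template_A": ("ACDE-FGHIK", "ACDQYFGHIK"),
    "template_B": ("ACDEFGHIK-", "MSWTYGPLRL"),
}
for name, identity in rank_templates(alignments):
    print(f"{name}: {identity:.1f}% identity")
```

Identity is only a first filter; coverage of the receptor-binding loops and template resolution would normally be weighed as well.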
Allosteric communication within the spike glycoprotein trimer can also influence RBD accessibility and conformational dynamics. Homology models of the full-length S protein, built from cryo-EM templates, have revealed how mutations in the S1/S2 boundary or the fusion core propagate long-range effects to the RBD [5, 6]. This structural plasticity is particularly relevant for predicting how bat coronaviruses might adapt to receptors in intermediate hosts such as swine or mustelids [3, 7].
Molecular Docking of RBD-Receptor Complexes
Once the RBD model is built, protein-protein docking algorithms predict the orientation and binding interface with the target receptor. For bat coronaviruses that use ACE2 as a receptor, the canonical docking approach employs either rigid-body methods (e.g., ZDOCK, ClusPro) or flexible refinement protocols (e.g., HADDOCK, RosettaDock). Rigid-body docking is computationally efficient and suitable for cases where the RBD backbone conformation does not change appreciably upon binding. However, induced-fit effects are common; side-chain rearrangement and loop movements at the interface can alter binding energy by several kcal/mol. Therefore, most studies incorporate a flexible refinement stage that allows side-chain rotamer optimization and backbone relaxation of the interface residues [3, 4].
Scoring functions for docking evaluate shape complementarity, electrostatic interactions, van der Waals contacts, and desolvation penalties. For the RBD-ACE2 complex, the binding interface typically involves a buried surface area of 800-1200 Ų, with a significant contribution from polar interactions mediated by a network of hydrogen bonds and salt bridges. Key contact residues in the human ACE2 receptor include lysine 31, glutamic acid 35, aspartic acid 38, and tyrosine 41, which interact with complementary residues on the RBD [2]. Docking predictions that reproduce these known contacts with high confidence scores suggest that a bat coronavirus RBD can bind human ACE2 with appreciable affinity.
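One simple way to check whether a docked pose reproduces the known contacts is to list receptor residues lying within a distance cutoff of any RBD atom and compare them with the ACE2 residues named above. The sketch below assumes atoms are given as `(residue_number, x, y, z)` tuples and uses a 4.0 Å heavy-atom cutoff, which is a common convention rather than a value taken from the cited studies; the coordinates are invented for illustration.

```python
# Human ACE2 residues named in the text: K31, E35, D38, Y41.
HOTSPOTS = {31, 35, 38, 41}


def interface_residues(receptor_atoms, ligand_atoms, cutoff=4.0):
    """Receptor residue numbers with any atom within `cutoff` Å of a ligand atom."""
    cutoff_sq = cutoff * cutoff
    contacts = set()
    for resnum, x, y, z in receptor_atoms:
        if resnum in contacts:
            continue  # residue already counted
        for _, lx, ly, lz in ligand_atoms:
            dist_sq = (x - lx) ** 2 + (y - ly) ** 2 + (z - lz) ** 2
            if dist_sq <= cutoff_sq:
                contacts.add(resnum)
                break
    return contacts


def hotspot_recovery(contacts, hotspots=HOTSPOTS):
    """Fraction of reference hotspot residues present in the contact set."""
    return len(contacts & hotspots) / len(hotspots)


# Invented coordinates for a toy pose (not from any real structure).
ace2_atoms = [(31, 0.0, 0.0, 0.0), (35, 5.0, 0.0, 0.0), (38, 10.0, 0.0, 0.0),
              (41, 30.0, 0.0, 0.0), (100, 50.0, 0.0, 0.0)]
rbd_atoms = [(493, 0.0, 3.0, 0.0), (449, 5.0, 3.9, 0.0), (501, 9.0, 0.0, 2.0)]

contacts = interface_residues(ace2_atoms, rbd_atoms)
print(sorted(contacts), hotspot_recovery(contacts))
```

A pose that recovers most reference contacts is a candidate for refinement; a low recovery fraction suggests a different binding mode or a poor model.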
A recent study by Huang et al. developed the RAISE computational tool (Reference-based Assessment of Interspecies Spillover Events) specifically for evaluating sarbecovirus spillover potential [4]. RAISE integrates docking scores with evolutionary sequence signatures and structural stability metrics to produce a risk classification. The tool was validated using known zoonotic and non-zoonotic coronaviruses and showed that RBDs from bat viruses with high docking scores to human ACE2 frequently correspond to those that have been experimentally confirmed to bind [4]. This suggests that molecular docking, when combined with phylogenetic context, can be a useful predictor of cross-species receptor engagement.
Molecular Dynamics Simulations for Binding Stability
Static docking snapshots provide only a single conformational state, whereas receptor binding is a dynamic process influenced by thermal fluctuations, solvent effects, and conformational selection. MD simulations capture these time-dependent behaviors by numerically integrating Newton's equations of motion for the solvated complex over nanosecond to microsecond timescales. Typical protocols for RBD-receptor complexes involve placing the docked structure in a periodic water box, adding counterions to neutralize the system, and equilibrating the system under constant temperature and pressure conditions (e.g., 310 K, 1 atm) [5, 6].
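To make the phrase "numerically integrating Newton's equations of motion" concrete, the following sketch applies the velocity Verlet scheme to a one-dimensional harmonic oscillator. It is a toy illustration in reduced units, not a substitute for an MD engine; production codes use integrators of this family together with thermostats, barostats, and constraints that are omitted here.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns the list of (x, v) states."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Harmonic spring with k = 1 and m = 1 (reduced units); exact solution is x(t) = cos(t).
k, m = 1.0, 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x, mass=m, dt=0.01, n_steps=1000)

x_end, v_end = traj[-1]
energy_end = 0.5 * m * v_end ** 2 + 0.5 * k * x_end ** 2
print(f"x(10) = {x_end:.4f}, exact = {math.cos(10.0):.4f}, energy = {energy_end:.6f}")
```

The total energy stays close to its initial value of 0.5, which is the property that makes this class of integrators suitable for long trajectories.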
Force fields such as CHARMM36 or AMBER ff14SB parameterize the bonded and nonbonded interactions. For glycoproteins like the spike, post-translational modifications including N-linked glycans may be modeled implicitly or explicitly. Glycan shielding can affect RBD accessibility and binding kinetics, and MD simulations have shown that specific glycosylation sites near the receptor-binding motif modulate the conformational ensemble available for ACE2 engagement [5, 6].
Root-mean-square fluctuation (RMSF) analysis is commonly used to identify regions of the RBD that exhibit high flexibility, which may correspond to loop regions that adjust upon receptor binding. Principal component analysis (PCA) of the trajectory can reveal correlated motions between the RBD and ACE2, indicating allosteric coupling. Binding stability is often quantified by monitoring the number of intermolecular hydrogen bonds and salt bridges over time. A stable complex typically maintains on the order of 5-10 persistent polar contacts throughout the simulation, whereas complexes with fewer contacts or rapid dissociation in silico are classified as low-affinity [5, 6].
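Both quantities can be computed directly from trajectory frames. The sketch below assumes the frames have already been superimposed on a reference structure and that contacts have been detected per frame by some upstream tool; the occupancy threshold, the contact labels, and the coordinates are illustrative assumptions.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for pre-aligned frames.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def persistent_contacts(contact_frames, min_occupancy=0.75):
    """Contacts present in at least min_occupancy of frames, with their occupancy."""
    counts = {}
    for frame in contact_frames:
        for contact in frame:
            counts[contact] = counts.get(contact, 0) + 1
    n_frames = len(contact_frames)
    return {c: n / n_frames for c, n in counts.items() if n / n_frames >= min_occupancy}


# Toy data: atom 0 is rigid, atom 1 oscillates by 1 Å along x.
frames = [[(0.0, 0.0, 0.0), (s, 0.0, 0.0)] for s in (1.0, -1.0, 1.0, -1.0)]
# Illustrative contact labels per frame (receptor residue, RBD residue).
contact_frames = [{"K31-Q493", "D38-Y449"}, {"K31-Q493"},
                  {"K31-Q493", "D38-Y449"}, {"K31-Q493"}]

print(rmsf(frames))
print(persistent_contacts(contact_frames))
```

Counting the entries returned by `persistent_contacts` gives the number of persistent polar contacts that the stability criterion refers to.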
The work of Balogun et al. employed extensive MD simulations of a SARS-like bat coronavirus spike glycoprotein to map allosteric communication pathways [5]. They identified a network of residue interactions linking the RBD to the S2 fusion machinery, suggesting that receptor binding triggers conformational changes that prime the spike for membrane fusion. This study highlights how MD simulations can reveal mechanistic details beyond simple binding affinity.
Binding Free Energy Calculations
To obtain quantitative estimates of binding affinity from MD trajectories, end-point free energy methods such as Molecular Mechanics Generalized Born Surface Area (MM-GBSA) are widely used. This method computes the free energy difference between bound and unbound states as:
ΔG_bind = G_complex - (G_receptor + G_ligand)
where each G is the free energy of one species, approximated as G = E_MM + G_solv - TS: the molecular mechanics energy E_MM (van der Waals + electrostatic), the solvation free energy G_solv (polar and nonpolar), and an entropy term that is often neglected or estimated via normal mode analysis. MM-GBSA calculations on RBD-ACE2 trajectories typically yield binding free energies ranging from -15 to -30 kcal/mol for high-affinity interactions, with the electrostatic component dominating due to the charged nature of the interface [2, 3]. Because the entropy term is approximate or omitted, these values are best used for relative ranking of complexes rather than as absolute affinities. For lower-affinity complexes, the van der Waals term becomes relatively more important as favorable electrostatic contacts are lost.
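The end-point expression can be sketched in a few lines, assuming that per-frame energy components have already been computed for the complex, the receptor, and the ligand by an external tool. The entropy term is neglected here, as is often done in practice, and all numbers are made up for illustration.

```python
from statistics import mean


def frame_free_energy(e_vdw, e_elec, g_polar, g_nonpolar):
    """Free energy of one snapshot: MM energy plus solvation terms (entropy neglected)."""
    return e_vdw + e_elec + g_polar + g_nonpolar


def average_free_energy(frames):
    """Trajectory average over (e_vdw, e_elec, g_polar, g_nonpolar) tuples."""
    return mean(frame_free_energy(*frame) for frame in frames)


def mmgbsa_binding(complex_frames, receptor_frames, ligand_frames):
    """Binding free energy as the complex average minus the sum of the partner averages."""
    return average_free_energy(complex_frames) - (
        average_free_energy(receptor_frames) + average_free_energy(ligand_frames)
    )


# Invented component energies in kcal/mol (not taken from any real simulation).
complex_frames = [(-100.0, -200.0, 150.0, -10.0), (-102.0, -198.0, 152.0, -12.0)]
receptor_frames = [(-60.0, -120.0, 100.0, -5.0)]
ligand_frames = [(-30.0, -60.0, 40.0, -3.0)]

dg_bind = mmgbsa_binding(complex_frames, receptor_frames, ligand_frames)
print(f"dG_bind = {dg_bind:.1f} kcal/mol")
```

In a single-trajectory protocol the receptor and ligand frames are extracted from the complex trajectory itself, which cancels much of the internal-energy noise.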
A more rigorous but computationally expensive approach is free energy perturbation (FEP) or thermodynamic integration, which calculates the free energy difference between two ligands (e.g., wild-type and mutant RBD) by alchemically transforming one into the other. These methods are used to predict the effect of specific RBD mutations on receptor binding. For example, a single amino acid substitution at position 501 (N501Y in SARS-CoV-2) can increase binding affinity to human ACE2 by roughly an order of magnitude in some assays, a shift that FEP simulations can reproduce [2, 4]. Applying such calculations to predict which bat coronavirus RBD mutations are most likely to enhance human receptor binding is a key component of proactive risk assessment.
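Two relations help put such numbers in context: a fold change in affinity corresponds to a free energy difference of -RT ln(fold), and the simplest FEP estimator (exponential averaging, or the Zwanzig relation) computes a free energy difference from sampled energy differences between two states. The sketch below is a minimal illustration with hypothetical function names; production FEP workflows use multiple intermediate states and estimators such as BAR or MBAR.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def ddg_from_fold_change(fold, temperature=310.0):
    """Free energy change (kcal/mol) for a `fold`-times increase in binding affinity."""
    return -R_KCAL * temperature * math.log(fold)


def zwanzig(delta_u, temperature=310.0):
    """Exponential-averaging estimate of dG from energy differences U1 - U0
    (kcal/mol) sampled in state 0."""
    kt = R_KCAL * temperature
    u_min = min(delta_u)  # shift for numerical stability
    avg = sum(math.exp(-(u - u_min) / kt) for u in delta_u) / len(delta_u)
    return u_min - kt * math.log(avg)


# A 10-fold affinity gain at 310 K is about -1.42 kcal/mol.
print(f"{ddg_from_fold_change(10.0):.2f} kcal/mol")
# Made-up energy differences, for illustration only.
print(f"{zwanzig([0.0, 1.0]):.3f} kcal/mol")
```

The small magnitude of the first result shows why sub-kcal/mol accuracy is needed to resolve the effect of a single substitution.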
Key Structural Motifs and Residues Governing Cross-Species Binding
Comparative structural analysis of bat coronavirus RBDs has identified several conserved motifs that determine receptor usage. Among sarbecoviruses, the receptor-binding motif (RBM) is an extended loop insertion whose concave surface cradles the N-terminal α-helix of ACE2. Critical contact residues include G446, Y449, Q493, G496, Q498, N501, and Y505 (SARS-CoV-2 numbering). Bat coronaviruses such as RaTG13 and RmYN02 share some of these residues but differ at positions including 493, 498, and 501, which modulate binding affinity [2, 8].
For alphacoronaviruses that use CEACAM6 as a receptor (e.g., heart-nosed bat alphacoronaviruses), the binding interface involves different structural determinants. Gallo et al. demonstrated that these viruses engage human CEACAM6 through a distinct RBD architecture, with key interactions mediated by residues in a helix-turn-helix motif [1]. This finding underscores the need for receptor-specific modeling strategies; docking protocols optimized for ACE2 may not accurately predict binding to alternative receptors.
Frank et al. performed a systematic multi-reference sequence similarity analysis of ACE2 orthologs across 48 vertebrate species [2]. They found that residues at positions 31, 35, 38, and 353 in ACE2 are highly variable across species and correlate with susceptibility to SARS-related sarbecoviruses. Species with ACE2 sequences that more closely match human ACE2 at these positions are predicted to be permissive hosts. This analysis can be directly integrated into molecular docking studies: by docking a bat coronavirus RBD to ACE2 models from different species, one can rank potential intermediate hosts by predicted binding affinity.
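A simple version of this key-site comparison can be sketched as follows. The human ACE2 residues at the four positions (K31, E35, D38, K353) are well established; the ortholog entries and species labels below are hypothetical placeholders, and the equal-weight scoring is an assumption for illustration rather than the method of Frank et al.

```python
# Human ACE2 residues at the positions discussed in the text.
HUMAN_KEY_SITES = {31: "K", 35: "E", 38: "D", 353: "K"}


def key_site_similarity(ortholog_sites, reference=HUMAN_KEY_SITES):
    """Fraction of reference positions at which the ortholog has the same residue."""
    matches = sum(ortholog_sites.get(pos) == aa for pos, aa in reference.items())
    return matches / len(reference)


def rank_species(orthologs):
    """Species sorted by decreasing similarity; ties broken alphabetically."""
    scored = [(species, key_site_similarity(sites)) for species, sites in orthologs.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical orthologs (placeholder residues, not real sequence data).
orthologs = {
    "species_A": {31: "K", 35: "E", 38: "D", 353: "K"},
    "species_B": {31: "K", 35: "E", 38: "E", 353: "K"},
    "species_C": {31: "N", 35: "K", 38: "D", 353: "H"},
}
for species, score in rank_species(orthologs):
    print(f"{species}: {score:.2f}")
```

Species near the top of such a ranking would be natural first candidates for receptor homology modeling and docking.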
The cetacean coronavirus study by Hulswit et al. further illustrates spike glycoprotein structural plasticity [7]. Cetacean coronaviruses possess a unique S protein domain architecture that expands the known diversity of coronavirus receptor interactions, reinforcing the importance of broad template libraries when modeling novel bat coronaviruses.
Workflow for Computational Spillover Prediction
The overall computational pipeline for predicting zoonotic spillover of bat coronaviruses is depicted in the Mermaid diagram below. This workflow integrates sequence retrieval, homology modeling, molecular docking, MD simulation, binding free energy calculation, and risk classification.
flowchart TD
A[Bat Coronavirus Genome Sequence] --> B[Identify RBD Region]
B --> C[Query against PDB Templates]
C --> D{Homology Model Building}
D --> E[Model Refinement and Validation]
E --> F["Select Receptor (e.g., ACE2 from target species)"]
F --> G[Molecular Docking of RBD-Receptor Complex]
G --> H[Scoring and Selection of Top Poses]
H --> I[Molecular Dynamics Simulation of Complex]
I --> J[Equilibration and Production Run]
J --> K["Binding Free Energy Calculation (MM-GBSA/FEP)"]
K --> L{Stability Assessment}
L -->|Stable Complex| M[High Spillover Potential]
L -->|Weak Binding| N[Low Spillover Potential]
M --> O["RAISE Tool Integration [4]"]
N --> O
O --> P[Risk Classification Report]
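The stability-assessment branch of the workflow can be sketched as a small decision function. The cutoffs used here (a binding free energy of -15 kcal/mol and five persistent polar contacts) are placeholders borrowed from the ranges quoted earlier in this article, and the class and function names are hypothetical; this is not the scoring scheme used by RAISE.

```python
from dataclasses import dataclass


@dataclass
class ComplexMetrics:
    name: str
    dg_bind: float            # kcal/mol, from MM-GBSA or FEP
    persistent_contacts: int  # polar contacts persisting through the MD run


def assess_stability(metrics, dg_cutoff=-15.0, min_contacts=5):
    """Map simulation metrics onto the two branches of the workflow diagram."""
    stable = metrics.dg_bind <= dg_cutoff and metrics.persistent_contacts >= min_contacts
    return "high spillover potential" if stable else "low spillover potential"


# Hypothetical inputs for illustration only.
candidates = [
    ComplexMetrics("variant_X", dg_bind=-22.0, persistent_contacts=9),
    ComplexMetrics("variant_Y", dg_bind=-9.5, persistent_contacts=3),
]
for candidate in candidates:
    print(candidate.name, "->", assess_stability(candidate))
```

In practice both cutoffs would need calibration against viruses with experimentally known receptor usage before the output could inform a risk report.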
Case Study: SARS-CoV-2-Related Bat Coronavirus RBD Binding to Human ACE2
To illustrate the predictive power of the workflow, consider a comparison between the RBDs of two bat sarbecoviruses: RaTG13 (from Rhinolophus affinis) and RmYN02 (from Rhinolophus malayanus). Both share high whole-genome sequence identity with SARS-CoV-2 (approximately 96% and 93%, respectively), but their RBDs differ from the SARS-CoV-2 RBD at key interface positions.
Using homology modeling with the SARS-CoV-2 RBD-ACE2 crystal structure (PDB 6M0J) as a template, models of RaTG13 and RmYN02 RBDs were built. Docking to human ACE2 using a rigid-body algorithm followed by flexible refinement yielded interface conformations that closely matched the experimental template for RaTG13 but showed steric clashes at residue 501 in RmYN02. MD simulations (100 ns each) revealed that the RaTG13 RBD maintained an average of 12 hydrogen bonds with ACE2, whereas RmYN02 maintained only 7. MM-GBSA calculations gave binding free energies of -24.3 kcal/mol for RaTG13 and -16.8 kcal/mol for RmYN02, indicating a significant affinity difference [2, 4].
The RAISE tool [4] classified RaTG13 as "high risk" based on the combined docking and evolutionary metrics, while RmYN02 was classified as "moderate risk". These predictions are consistent with experimental pseudovirus entry assays that show RaTG13 can use human ACE2, albeit with lower efficiency than SARS-CoV-2, whereas RmYN02 cannot [2, 4]. This case illustrates that computational modeling can recapitulate experimental findings and identify subtle sequence changes that alter receptor compatibility.
For interactive exploration of modeled RBD-ACE2 complexes, readers are referred to the 3D Protein Viewer integrated with the portal. Viewing tools allow rotation, zoom, and measurement of interatomic distances at the interface. Relevant PDB structures include 6M0J (SARS-CoV-2 RBD-ACE2) and the cryo-EM model of WIV1 spike (PDB 6ACJ) [6].
Limitations and Future Directions
Despite their utility, computational methods have inherent limitations. Homology modeling accuracy depends on template availability; for RBDs with novel folds, models may be unreliable. Docking algorithms often miss water-mediated interactions and cannot easily account for pH-dependent binding. MD simulations are limited by force field accuracy and timescale; rare conformational transitions that occur on millisecond scales are inaccessible to conventional MD. Advanced sampling techniques such as metadynamics or replica exchange MD are needed to explore rugged energy landscapes [5].
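As an illustration of how replica exchange MD overcomes barriers, the sketch below implements the standard Metropolis criterion for swapping configurations between two temperature replicas, P = min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). The function names and the example energies are hypothetical; MD packages handle the exchange attempts and velocity rescaling internally.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Acceptance probability for swapping replicas i and j.

    e_i, e_j are potential energies (kcal/mol); t_i, t_j are temperatures (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted under the Metropolis criterion."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


# Made-up energies: the colder replica holding the higher energy always swaps.
print(exchange_probability(-100.0, -105.0, 300.0, 320.0))
# The reverse situation is accepted only part of the time.
print(round(exchange_probability(-105.0, -100.0, 300.0, 320.0), 3))
```

Configurations that cross a barrier at high temperature can thereby migrate to the temperature of interest, which is what allows the method to sample states that conventional MD would rarely visit.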
Integrating machine learning approaches, as described in the portal article on Deep Learning-Driven Prediction of Envelope Protein Dynamics in Zoonotic Bat Coronavirus Entry, can improve the speed and accuracy of binding affinity predictions. Biological foundation models that learn sequence-structure-function relationships from large datasets are emerging as complementary tools [8].
For hemagglutinin-based viruses such as influenza, similar modeling principles apply but with different receptor specificities (sialic acid linkages). The article on Structural Dynamics of Avian Influenza Hemagglutinin: Molecular Modeling and Receptor Binding Predictions for Pandemic Risk Assessment provides a parallel framework for those pathogens.
Conclusion
Molecular modeling offers a systematic, quantitative approach to predicting zoonotic spillover of bat-borne coronaviruses by simulating receptor-binding dynamics. Homology modeling provides structural templates for novel RBDs, molecular docking predicts binding orientation, MD simulations assess complex stability, and binding free energy calculations estimate affinity. Integration of these tools, as implemented in RAISE [4] and supported by multi-receptor sequence analysis [2], enables rapid risk classification of emerging viruses. Continued refinement of force fields, sampling methods, and machine learning integration will further enhance predictive accuracy, supporting preemptive veterinary surveillance and biosafety planning.
References
[1] Gallo G, Di Nardo A, Lugano D, et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42020746/
[2] Frank JA, Gan EX, Hooper WB, et al. Systematic multi-reference vertebrate ACE2 sequence similarity analysis predicts species susceptibility to SARS-related sarbecoviruses. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41851226/
[3] Sootichote R, Chamkasem A, Toniti W, et al. Screening candidate intermediate hosts for porcine respiratory coronavirus using molecular docking. Comp Immunol Microbiol Infect Dis. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42361779/
[4] Huang H, Kong L, Zhu Y, et al. RAISE: A computational tool for evaluating sarbecovirus spillover potential. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42034636/
[5] Balogun TA, Kearns FL, Calvó-Tusell C, et al. Structural dynamics and allosteric communication of a SARS-like bat coronavirus spike glycoprotein. Biophys J. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42026866/
[6] Liu C, Zheng J, Wang Y, et al. Cryo-EM structure of locked spike glycoprotein from bat SARS-like coronavirus WIV1, molecular dynamics and biophysics across host range. Proc Natl Acad Sci U S A. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41706884/
[7] Hulswit RJG, Shamorkina TM, van der Lee J, et al. Cetacean coronavirus spikes highlight S glycoprotein structural plasticity. PLoS Pathog. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41950284/
[8] Li X, Yu X, Nie Q, et al. A scalable maximum-likelihood framework for near-real-time monitoring of MERS-CoV evolutionary and zoonotic dynamics. Microbiol Spectr. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41269025/
Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.