Computational Prediction of Zoonotic Spillover: Receptor-Binding Dynamics and Structural Modeling of Bat Coronavirus Spike Proteins
Introduction
The emergence of zoonotic coronaviruses from bat reservoirs represents a persistent threat to animal and public health. Bats harbor a remarkable diversity of coronaviruses, including sarbecoviruses and alphacoronaviruses, many of which possess spike proteins capable of engaging mammalian cell entry receptors [1, 2]. The molecular determinants of cross-species transmission are encoded primarily in the receptor-binding domain (RBD) of the viral spike glycoprotein, which mediates attachment to host cell surface receptors such as angiotensin-converting enzyme 2 (ACE2) or carcinoembryonic antigen-related cell adhesion molecule 6 (CEACAM6) [3, 4]. Computational prediction of zoonotic spillover risk has therefore become a central objective in veterinary virology and bioinformatics, enabling proactive surveillance and risk assessment before a virus establishes sustained transmission in a new host species [5, 6].
This article reviews the computational toolkit for predicting zoonotic spillover from bat coronaviruses, focusing on structural modeling of spike protein RBDs, molecular docking simulations with ACE2 orthologs from diverse vertebrate species, binding free energy calculations, and machine learning classifiers trained on structural and sequence features. The integration of these methods with sequence surveillance data from public repositories provides a framework for evaluating the spillover potential of newly discovered bat coronaviruses [5, 4, 7].
Homology Modeling of Spike Protein Receptor-Binding Domains
High-resolution experimental structures of bat coronavirus spike proteins are often unavailable, necessitating computational approaches for three-dimensional structure prediction. Homology modeling, also known as comparative modeling, constructs an atomic model of a target protein based on its sequence similarity to one or more template proteins of known structure [6]. For bat coronavirus RBDs, templates are typically selected from structurally characterized sarbecovirus spike proteins available in the Protein Data Bank. The accuracy of the resulting model depends critically on the sequence identity between the target and template, with identities above 30% generally yielding reliable backbone conformations [6].
The modeling workflow involves several steps: template identification through sequence alignment using algorithms such as BLAST or HMMER, alignment refinement to correct for insertions and deletions, model building using satisfaction of spatial restraints or fragment assembly, and model evaluation using statistical potentials (e.g., DOPE score) and stereochemical validation (e.g., Ramachandran plot analysis) [6]. Loop regions within the RBD, particularly the receptor-binding motif (RBM) that directly contacts ACE2, often require specialized refinement due to their conformational variability. Ab initio loop modeling or molecular dynamics (MD) simulations can be employed to sample alternative loop conformations and select the most energetically favorable state [6].
The resulting homology models serve as the starting point for downstream docking simulations and binding affinity predictions. For example, Rodrigues et al. [6] demonstrated that structural models of SARS-CoV-2 spike RBDs from diverse animal species could be used to predict cross-species transmission potential by evaluating the compatibility of the RBD with ACE2 orthologs from different hosts. This approach has been extended to bat coronaviruses, where homology models of spike RBDs from newly sequenced viruses are compared against known receptor-binding interfaces to assess the likelihood of ACE2 engagement [5, 4].
Molecular Docking Simulations with ACE2 Orthologs
Molecular docking is a computational technique that predicts the preferred orientation of one molecule (the ligand) when bound to another (the receptor) to form a stable complex. In the context of bat coronavirus spillover prediction, the RBD of the spike protein is treated as the ligand, and the extracellular domain of ACE2 from a target species is treated as the receptor [4, 6]. Docking algorithms sample translational and rotational degrees of freedom and evaluate each pose using a scoring function that approximates the binding free energy.
Rigid-body docking, as implemented in programs such as ZDOCK or ClusPro, assumes that both the RBD and ACE2 maintain their unbound conformations upon binding. This approximation is computationally efficient and suitable for initial screening across multiple ACE2 orthologs [6]. However, the RBD-ACE2 interface is known to undergo conformational changes upon binding, particularly in the RBM loop. Flexible docking protocols, such as those using RosettaDock or HADDOCK, allow for side-chain and backbone flexibility in the interface region, improving the accuracy of the predicted complex [6].
A critical parameter in docking simulations is the definition of the binding interface. For sarbecovirus RBD-ACE2 complexes, the interface typically involves a set of conserved contact residues on both proteins. Frank et al. [4] performed a systematic multi-reference analysis of ACE2 sequences from over 100 vertebrate species and identified key residues that determine binding compatibility with SARS-related sarbecovirus RBDs. Their analysis revealed that species with ACE2 orthologs possessing amino acid substitutions at critical contact positions (e.g., residues 31, 35, 38, 41, 82, and 353 in human ACE2 numbering) are predicted to have reduced or abrogated binding to specific RBD variants [4].
Docking simulations can be validated by comparing predicted binding poses with known co-crystal structures of RBD-ACE2 complexes. The root-mean-square deviation (RMSD) of the predicted complex relative to the experimental structure is a standard metric for assessing docking accuracy. For bat coronavirus RBDs that lack experimental complex structures, cross-validation using sequence conservation and mutational scanning data provides an alternative means of evaluating docking reliability [6].
Binding Free Energy Calculations
While docking scoring functions provide a rapid estimate of binding affinity, more rigorous calculations are required for quantitative predictions. The molecular mechanics generalized Born surface area (MM-GBSA) approach combines molecular mechanics energies with continuum solvation models to estimate the binding free energy of a protein-protein complex [6]. The MM-GBSA calculation is typically performed on snapshots extracted from MD simulations of the docked complex, allowing for the inclusion of conformational flexibility and solvent effects.
The binding free energy (Delta G bind) is decomposed into contributions from van der Waals interactions, electrostatic interactions, polar solvation energy (calculated using the generalized Born model), and nonpolar solvation energy (proportional to the solvent-accessible surface area). For RBD-ACE2 complexes, the van der Waals and electrostatic terms are generally favorable (negative), while the polar solvation term is unfavorable (positive) due to the desolvation penalty upon binding [6]. The net Delta G bind correlates with experimental binding affinities measured by surface plasmon resonance or biolayer interferometry.
Rodrigues et al. [6] applied MM-GBSA calculations to predict the binding affinity of SARS-CoV-2 RBD variants with ACE2 orthologs from multiple species, including bats, pangolins, and domestic animals. Their results demonstrated that MM-GBSA predictions could recapitulate known host range restrictions and identify species with ACE2 orthologs that are predicted to bind bat coronavirus RBDs with high affinity. This approach has been extended to bat coronaviruses discovered through metagenomic surveillance, where MM-GBSA calculations on homology-modeled RBD-ACE2 complexes provide a quantitative estimate of spillover risk [5, 4].
Machine Learning Classifiers for Spillover Potential
The integration of structural and sequence features into machine learning (ML) classifiers represents a powerful approach for predicting zoonotic spillover potential at scale. Huang et al. [5] developed RAISE, a computational tool that evaluates sarbecovirus spillover potential by combining features derived from RBD sequence, predicted RBD structure, and ACE2 ortholog compatibility. The RAISE classifier was trained on a dataset of sarbecoviruses with known host range and validated using cross-species transmission experiments.
Feature engineering for ML classifiers in this domain typically includes: (1) sequence-based features such as amino acid composition at RBD contact positions, evolutionary conservation scores (e.g., from multiple sequence alignments), and the presence of specific motifs associated with ACE2 binding; (2) structure-based features such as the electrostatic potential of the RBD surface, the shape complementarity of the RBD-ACE2 interface, and the predicted binding free energy from MM-GBSA calculations; and (3) host-based features such as the phylogenetic distance between the bat reservoir and potential target species, and the expression level of ACE2 in the respiratory tract of the target species [5, 4, 7].
Mayasich et al. [7] performed an in silico analysis of cross-species sequence variability in host interferon antiviral pathway proteins and demonstrated that sequence divergence in innate immune factors can influence susceptibility to coronavirus infection. While their study focused on host antiviral pathways rather than receptor binding directly, the principle that host genetic variation modulates spillover risk is directly applicable to ML classifier design. Incorporating features that capture host immune compatibility alongside receptor binding compatibility can improve the accuracy of spillover predictions [7].
The output of ML classifiers is typically a probability score indicating the likelihood that a given bat coronavirus can productively infect a given target species. These scores can be used to prioritize viruses for experimental validation, such as pseudovirus entry assays or live virus challenge studies in animal models [5]. The RAISE tool, for example, assigns a spillover risk score to each virus-host pair, enabling rapid triage of emerging bat coronaviruses for further investigation [5].
Sequence Surveillance and Variant Tracking
Public sequence repositories such as GISAID and GenBank serve as the foundation for computational spillover prediction. Continuous surveillance of bat coronavirus sequences allows for the detection of novel RBD variants that may exhibit altered receptor binding properties [1, 5]. Mazumder et al. [1] employed a bioinformatics-driven approach to identify viral microRNAs encoded by bat coronaviruses with high spillover potential and predicted their target genes in human cells. While their focus was on miRNA-mediated regulation, the underlying sequence analysis pipeline is directly applicable to spike protein surveillance.
Phylogenetic analysis of spike gene sequences can identify clades of bat coronaviruses that are closely related to known zoonotic viruses. For example, sarbecoviruses from Rhinolophus bats that cluster phylogenetically with SARS-CoV and SARS-CoV-2 are prioritized for structural modeling and docking simulations [5, 4]. The identification of specific amino acid substitutions in the RBD that are associated with enhanced ACE2 binding, such as the N501Y mutation in SARS-CoV-2 variants, can be tracked across bat coronavirus sequences to assess the emergence of potentially zoonotic strains [6].
The integration of sequence surveillance with structural modeling creates a dynamic risk assessment framework. As new bat coronavirus sequences are deposited, automated pipelines can generate homology models, perform docking simulations with a panel of ACE2 orthologs, and compute binding free energies without manual intervention [5]. This high-throughput approach enables real-time monitoring of spillover risk across geographic regions and host species.
Workflow for Computational Spillover Prediction
The following Mermaid diagram illustrates a typical workflow for computational prediction of zoonotic spillover from bat coronaviruses, integrating sequence surveillance, structural modeling, and machine learning classification.
flowchart TD
A[Bat Coronavirus Sequence Surveillance], > B[Spike Gene Extraction and RBD Identification]
B, > C[Sequence Alignment and Phylogenetic Analysis]
C, > D[Selection of High-Risk Clades]
D, > E[Homology Modeling of RBD]
E, > F[Molecular Docking with ACE2 Orthologs]
F, > G[MM-GBSA Binding Free Energy Calculation]
G, > H[Feature Engineering for ML Classifier]
H, > I[RAISE or Similar ML Classifier]
I, > J[Spillover Risk Score for Each Host Species]
J, > K[Experimental Validation Priority List]
K, > L[Pseudovirus Entry Assays / Animal Models]
The workflow begins with sequence surveillance and phylogenetic analysis to identify bat coronaviruses with genetic similarity to known zoonotic viruses [1, 5]. High-risk candidates proceed to homology modeling of the RBD, followed by molecular docking with ACE2 orthologs from a panel of target species [4, 6]. MM-GBSA calculations provide quantitative binding affinity estimates, which are combined with sequence and structural features as input to a machine learning classifier such as RAISE [5]. The output is a prioritized list of virus-host pairs for experimental validation.
Cross-Species Receptor Usage Beyond ACE2
While ACE2 is the primary receptor for sarbecoviruses, other bat coronaviruses utilize alternative entry receptors. Gallo et al. [3] demonstrated that heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells, revealing a previously unrecognized receptor usage pathway. This finding underscores the importance of expanding computational prediction frameworks to include multiple receptor families.
For alphacoronaviruses, structural modeling of spike protein RBDs must consider compatibility with CEACAM family members, aminopeptidase N (APN), or other entry factors [3]. Homology modeling and docking protocols can be adapted by using templates from structurally characterized alphacoronavirus spike-receptor complexes. The same principles of binding free energy calculation and ML classification apply, but the feature space must be expanded to account for receptor-specific contact residues and binding interfaces [3].
Limitations and Future Directions
Computational predictions of zoonotic spillover are subject to several limitations. Homology models may lack accuracy for highly divergent RBD sequences, particularly in loop regions that are critical for receptor binding [6]. Docking simulations may fail to capture the full conformational dynamics of the RBD-ACE2 interaction, especially when induced fit or conformational selection plays a major role [6]. MM-GBSA calculations are sensitive to the force field parameters and the solvation model employed, and they may not accurately predict absolute binding affinities for all RBD-ACE2 pairs [6].
Machine learning classifiers are limited by the quality and representativeness of the training data. Spillover events are rare, and the available dataset of bat coronaviruses with experimentally confirmed host range is small [5]. Classifiers trained on this data may suffer from overfitting or may not generalize to novel viral lineages. The inclusion of negative data (viruses that do not infect a given host) is essential for training balanced classifiers but is often underreported in the literature [5].
Future directions include the integration of molecular dynamics simulations with enhanced sampling techniques (e.g., metadynamics or umbrella sampling) to compute free energy landscapes for RBD-ACE2 binding [6]. Deep learning approaches, such as graph neural networks applied to protein structures, may improve the accuracy of binding affinity predictions by learning complex features directly from three-dimensional coordinates. The incorporation of host immune evasion data, including the ability of bat coronavirus spike proteins to evade antibody responses in target species, will further refine spillover risk assessments [7].
Conclusion
Computational prediction of zoonotic spillover from bat coronaviruses relies on an integrated pipeline of homology modeling, molecular docking, binding free energy calculations, and machine learning classification. These methods enable the rapid assessment of newly discovered bat coronaviruses for their potential to infect domestic animals and wildlife species. The RAISE tool [5], systematic ACE2 sequence analysis [4], and structural modeling approaches [6] provide a quantitative framework for prioritizing viruses for experimental validation. Continued surveillance of bat coronavirus diversity, combined with advances in computational structural biology, will enhance our ability to predict and prevent future zoonotic spillover events.
References
[1] Mazumder S, Kapoor S, Kaur H, et al. Bioinformatics-driven genome-wide identification of viral miRNAs in high spillover bat coronaviruses and their target genes in human. Arch Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42060194/
[2] Brook CE, Boots M, Chandran K, et al. Accelerated viral dynamics in bat cell lines, with implications for zoonotic emergence. Elife. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/32011232/ *** Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.
[3] Gallo G, Di Nardo A, Lugano D, et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42020746/
[4] Frank JA, Gan EX, Hooper WB, et al. Systematic multi-reference vertebrate ACE2 sequence similarity analysis predicts species susceptibility to SARS-related sarbecoviruses. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41851226/
[5] Huang H, Kong L, Zhu Y, et al. RAISE: A computational tool for evaluating sarbecovirus spillover potential. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42034636/
[6] Rodrigues JPGLM, Barrera-Vilarmau S, M C Teixeira J, et al. Insights on cross-species transmission of SARS-CoV-2 from structural modeling. PLoS Comput Biol. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/33270653/
[7] Mayasich SA, Schumann PG, Botz M, et al. In Silico Analysis of Cross-Species Sequence Variability in Host Interferon Antiviral Pathway Proteins and SARS-CoV-2 Susceptibility. Zoonoses. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/41816638/