Zubair Khalid

Virologist/Molecular Biologist | Veterinarian | Bioinformatician

Conventional & Molecular Virology • Vaccine Development • Computational Biology

Dr. Zubair Khalid is a veterinarian and virologist specializing in conventional and molecular virology, vaccine development, and computational biology. He is dedicated to advancing animal health through innovative research and multi-omics approaches.

Section: Computational Biology

Predicting Zoonotic Spillover: Computational Modeling of Bat Coronavirus Spike Protein–ACE2 Receptor Binding Dynamics

Introduction

The emergence of zoonotic coronaviruses from bat reservoirs represents a persistent threat to animal and public health. Bats harbor a diverse array of alphacoronaviruses and betacoronaviruses, many of which possess spike proteins capable of binding to host cell receptors [1, 2]. The angiotensin-converting enzyme 2 (ACE2) receptor serves as the primary entry portal for sarbecoviruses, while alternative receptors such as CEACAM6 have been identified for certain alphacoronaviruses [2]. Understanding the molecular determinants of spike–receptor interactions is critical for predicting which bat coronaviruses possess the capacity for cross-species transmission [3, 4].

Computational modeling provides a framework for evaluating binding dynamics without requiring live virus experimentation. These methods integrate structural biology, biophysics, and machine learning to assess the compatibility between viral receptor-binding domains (RBDs) and host ACE2 orthologs [3, 5]. This article reviews the principal computational approaches used to model bat coronavirus spike protein–ACE2 interactions: molecular docking, molecular dynamics (MD) simulations, free energy calculations, and sequence-based classifiers. Emphasis is placed on how these techniques inform surveillance strategies and vaccine design for veterinary and zoonotic contexts.

Molecular Docking and Binding Affinity Prediction

Molecular docking algorithms predict the preferred orientation of a ligand (the spike RBD) within a receptor binding site (ACE2) and estimate binding affinity. Rigid-body docking, semi-flexible docking, and fully flexible docking protocols are employed depending on the computational resources and the degree of conformational change expected upon binding [5]. For bat coronavirus spike proteins, docking studies have been used to screen candidate intermediate hosts by comparing binding scores across species [5].

The scoring functions used in docking typically combine van der Waals interactions, electrostatic potentials, desolvation penalties, and hydrogen bonding terms. Empirical scoring functions are trained on known protein–protein complexes, while knowledge-based potentials derive statistical preferences from structural databases [5]. For example, docking of porcine respiratory coronavirus spike protein to host receptor orthologs from various livestock species revealed species-specific binding patterns that correlated with susceptibility [5].
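
As a minimal sketch of how an empirical scoring function combines these terms, the following Python snippet computes a weighted sum of per-pose energy components and ranks poses by the result. The class name, term names, and weights are hypothetical placeholders chosen for illustration; they are not taken from any particular docking program or from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseTerms:
    """Energy-like components for one docked pose (illustrative, kcal/mol)."""
    vdw: float
    electrostatic: float
    desolvation: float
    hbond: float


# Hypothetical weights; real empirical functions fit these to known complexes.
WEIGHTS = {"vdw": 1.0, "electrostatic": 0.5, "desolvation": 0.8, "hbond": 1.2}


def docking_score(pose):
    """Weighted sum of the four terms; lower means more favorable."""
    return (
        WEIGHTS["vdw"] * pose.vdw
        + WEIGHTS["electrostatic"] * pose.electrostatic
        + WEIGHTS["desolvation"] * pose.desolvation
        + WEIGHTS["hbond"] * pose.hbond
    )


def rank_poses(poses):
    """Return (name, score) pairs sorted from most to least favorable."""
    scored = [(name, docking_score(terms)) for name, terms in poses.items()]
    return sorted(scored, key=lambda item: item[1])
```

The same ranking step applies whether the entries are alternative poses of one complex or the best pose for each host species in a cross-species screen.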

Docking is often the first step in a computational pipeline because it is computationally inexpensive. However, docking alone cannot capture the full conformational dynamics of the spike–ACE2 interface. Therefore, docking poses are typically refined using MD simulations [3].

Molecular Dynamics Simulations

MD simulations model the time-dependent behavior of atoms in a protein–protein complex using Newtonian mechanics. For bat coronavirus spike–ACE2 systems, MD trajectories typically span tens to hundreds of nanoseconds, allowing observation of induced fit, loop rearrangements, and water-mediated interactions at the binding interface [3, 4].

Force fields such as CHARMM, AMBER, and OPLS parameterize the potential energy of the system. The choice of water model (e.g., TIP3P, SPC/E) and the treatment of long-range electrostatics via particle mesh Ewald summation are critical for accurate simulation of the charged residues that dominate the spike–ACE2 interface [4]. Root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) analyses quantify structural stability and residue-level flexibility. Regions of high RMSF in the RBD often correspond to loops that contact ACE2 and are under positive selection [3].
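
To make the RMSD and RMSF definitions concrete, here is a small pure-Python sketch. It assumes the trajectory is a list of frames, each a list of (x, y, z) coordinates, and that the frames have already been superimposed on a common reference; the alignment step itself is not shown. In practice these quantities are usually computed with a trajectory analysis package rather than by hand.

```python
import math


def rmsd(frame, reference):
    """RMSD between a frame and a reference with the same atom ordering.

    Assumes both coordinate sets are already aligned.
    """
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def rmsf(trajectory):
    """Per-atom RMSF about each atom's time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values
```

RMSD gives one value per frame (global drift from the reference), whereas RMSF gives one value per atom or residue (local flexibility), which is why the two are reported together.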

MD simulations have been used to compare the binding stability of bat coronavirus RBDs with human versus livestock receptor orthologs. For instance, simulations of merbecovirus spike proteins (e.g., MERS-related coronaviruses) revealed that certain bat RBDs form stable complexes with human dipeptidyl peptidase 4 (DPP4), the receptor for MERS-CoV, indicating spillover potential [1]. Similarly, simulations of sarbecovirus RBDs from horseshoe bats have identified mutations that enhance binding to human ACE2 [3].

Free Energy Calculations (MM/GBSA)

The molecular mechanics generalized Born surface area (MM/GBSA) method estimates binding free energy from MD trajectories by averaging gas-phase molecular mechanics energies, solvation free energies (polar and nonpolar), and entropic contributions. MM/GBSA is computationally more efficient than free energy perturbation (FEP) and is widely used for ranking binding affinities across multiple RBD variants [3, 4].
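
In its commonly used form, the MM/GBSA estimate can be written as follows. The symbols are conventional notation rather than notation taken from the cited studies: angle brackets denote averages over MD snapshots, and for a spike–ACE2 system the "receptor" is ACE2 and the "ligand" is the RBD.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle
  - \langle G_{\text{receptor}} \rangle
  - \langle G_{\text{ligand}} \rangle \\
G &= E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - T S \\
E_{\text{MM}} &= E_{\text{bonded}} + E_{\text{elec}} + E_{\text{vdW}}
\end{aligned}
```

Here E_MM is the gas-phase molecular mechanics energy, G_GB the polar solvation term from the generalized Born model, G_SA the nonpolar term estimated from solvent-accessible surface area, and TS the entropic contribution, which is often the most expensive and least precise term to estimate.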

In bat coronavirus studies, MM/GBSA calculations decompose the binding free energy into per-residue contributions. This decomposition identifies hotspot residues on the spike RBD that contribute significantly to binding (e.g., those with per-residue ΔG contributions below -2 kcal/mol). These hotspots often correspond to positions that are conserved across zoonotic strains or that mutate during host adaptation [4]. For example, systematic MM/GBSA analysis of ACE2 orthologs from 20 vertebrate species showed that residues at positions 31, 41, and 353 in ACE2 are major determinants of binding affinity for SARS-related sarbecoviruses [4].
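
Hotspot selection of the kind described above reduces to a threshold filter over the per-residue decomposition. The sketch below assumes the decomposition is available as a mapping from residue label to energy contribution in kcal/mol; the function name and the residue labels used in any example input are hypothetical, and the -2 kcal/mol default simply mirrors the example cutoff mentioned in the text.

```python
HOTSPOT_CUTOFF = -2.0  # kcal/mol; example threshold, adjust per study


def find_hotspots(per_residue_dg, cutoff=HOTSPOT_CUTOFF):
    """Return (residue, contribution) pairs strictly below the cutoff.

    Results are sorted from most to least favorable contribution.
    """
    hits = [(res, dg) for res, dg in per_residue_dg.items() if dg < cutoff]
    return sorted(hits, key=lambda item: item[1])
```

For example, `find_hotspots({"R1": -3.1, "R2": -0.4, "R3": -2.5})` keeps only the two residues whose contributions fall below the cutoff, with "R1" listed first.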

The MM/GBSA approach has also been applied to evaluate the impact of RBD mutations observed in bat coronavirus surveillance. Mutations that increase binding affinity to human ACE2 while maintaining affinity to bat ACE2 are flagged as high-risk for spillover [3].
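
The flagging rule above can be expressed as a simple filter on relative binding free energies (ΔΔG, mutant minus wild type, where negative values mean tighter binding). This is a sketch under stated assumptions: the class name, field names, and both cutoffs are hypothetical and would need to be calibrated against experimental data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationEffect:
    """Change in binding free energy caused by one RBD mutation (kcal/mol)."""
    name: str
    ddg_human: float  # versus human ACE2; negative = tighter binding
    ddg_bat: float    # versus bat ACE2; negative = tighter binding


def flag_high_risk(effects, gain_cutoff=-1.0, loss_tolerance=0.5):
    """Names of mutations that gain human affinity and retain bat affinity.

    gain_cutoff: human ddG must be at or below this value (hypothetical).
    loss_tolerance: bat ddG must not exceed this value (hypothetical).
    """
    return [
        e.name
        for e in effects
        if e.ddg_human <= gain_cutoff and e.ddg_bat <= loss_tolerance
    ]
```

Requiring both conditions reflects the reasoning in the text: a variant that binds human ACE2 better but can no longer use bat ACE2 is less likely to persist in the reservoir.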

Sequence-Based Machine Learning Classifiers

Machine learning classifiers trained on sequence and structural features offer a high-throughput alternative to physics-based simulations. Features such as amino acid composition, evolutionary conservation scores (e.g., from PSI-BLAST profiles), predicted secondary structure, and solvent accessibility are used to train models that classify RBD–ACE2 pairs as binding or non-binding [3, 6].
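
As an illustration of the simplest feature listed above, the following sketch converts a protein sequence into a 20-dimensional amino acid composition vector. The function name is an assumption made for this example; conservation scores, predicted secondary structure, and solvent accessibility would come from external tools and are not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def aa_composition(sequence):
    """Fraction of each standard amino acid, in AMINO_ACIDS order.

    Non-standard characters (gaps, X, etc.) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Because the vector has a fixed length regardless of sequence length, it can be concatenated with other per-pair features before being passed to a classifier.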

The RAISE (RBD–ACE2 Interaction Spillover Evaluator) tool exemplifies this approach. RAISE integrates a random forest classifier with features derived from docking scores, MD-derived RMSF, and sequence conservation to predict the spillover potential of sarbecoviruses [3]. The tool was validated on known zoonotic and non-zoonotic coronaviruses and achieved high accuracy in distinguishing strains capable of using human ACE2 [3].

Another framework, described by Zhao et al., uses a unified machine learning pipeline that incorporates host ecological traits, viral phylogeny, and receptor binding predictions to prioritize cross-species transmission risk across an expansive host landscape [6]. This framework identified several bat coronaviruses with high predicted risk for livestock species, including swine and camelids [6].

Deep learning models, including convolutional neural networks (CNNs) applied to protein contact maps and graph neural networks (GNNs) applied to protein structures, are emerging as powerful tools for binding prediction. These models can learn complex patterns of residue–residue interactions without explicit feature engineering [3].

Integrated Computational Workflows

A comprehensive computational pipeline for predicting zoonotic spillover typically combines multiple methods in a tiered approach. The workflow begins with sequence-based screening of bat coronavirus RBDs against a panel of host ACE2 orthologs, followed by molecular docking of high-scoring candidates, MD simulation of the top complexes, MM/GBSA free energy decomposition, and finally machine learning classification to integrate all features [3, 6].

The following Mermaid diagram illustrates a typical decision tree for such a pipeline:

flowchart TD
    A[Bat coronavirus RBD sequence] --> B{Sequence similarity to known zoonotic RBDs?}
    B -->|High| C[Molecular docking to host ACE2 orthologs]
    B -->|Low| D["Structure prediction (homology modeling or AlphaFold)"]
    D --> C
    C --> E[Rank by docking score]
    E --> F[Select top 5-10 complexes]
    F --> G["MD simulation (50-100 ns)"]
    G --> H[RMSD/RMSF analysis]
    H --> I[MM/GBSA free energy calculation]
    I --> J[Per-residue decomposition]
    J --> K[Feature extraction]
    K --> L["Machine learning classifier (e.g., RAISE)"]
    L --> M{Spillover risk score}
    M -->|High| N[Prioritize for surveillance and experimental validation]
    M -->|Low| O[Archive for periodic reassessment]

This pipeline has been applied to assess the zoonotic potential of merbecoviruses circulating in African bats [1] and to evaluate the risk posed by alphacoronaviruses that use CEACAM6 as an entry receptor [2]. The integration of ecological modeling, such as the timing of coronavirus outbreaks in bat populations, further refines risk assessments by identifying periods of heightened viral shedding [7].
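
The tiered logic of this workflow, in which each stage only processes the survivors of the previous one, can be sketched as below. The function name and the stage tuple layout are hypothetical; the scoring callables stand in for real docking, MD, or MM/GBSA runs, and the final machine learning step is omitted.

```python
def tiered_screen(candidates, stages):
    """Run successive filtering stages over a list of candidate names.

    stages: list of (label, score_fn, keep) tuples, where score_fn maps a
    candidate name to a score (lower = more favorable) and keep is the
    number of top candidates passed on to the next stage.
    Returns a dict mapping each stage label to its surviving candidates.
    """
    survivors = list(candidates)
    history = {}
    for label, score_fn, keep in stages:
        ranked = sorted(survivors, key=score_fn)
        survivors = ranked[:keep]
        history[label] = survivors
    return history
```

Ordering the stages from cheapest to most expensive keeps the overall cost manageable, since the costly simulations are run only on the few complexes that pass the earlier filters.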

Implications for Surveillance and Vaccine Design

Computational predictions of spike–ACE2 binding dynamics directly inform surveillance strategies. High-risk RBD variants identified in silico can be prioritized for experimental binding assays and for inclusion in genomic surveillance panels [1, 3]. For livestock species, docking and MD simulations have identified pigs, cattle, and camels as potential intermediate hosts for certain bat coronaviruses, guiding targeted sampling in these populations [4, 5].

Vaccine design for zoonotic coronaviruses benefits from structural insights into conserved epitopes on the spike protein. Immunogenic relationship mapping using computational clustering of RBD sequences has enabled the design of minimal-set trivalent vaccines that elicit broad protection against multiple sarbecovirus lineages [8]. These vaccines target epitopes that are conserved across bat and human coronaviruses, reducing the risk of immune escape [8].

The role of host population immunity in modulating spillover risk has also been modeled computationally. Post-pandemic changes in population immunity have been shown to reduce the likelihood of emergence of zoonotic coronaviruses, as cross-reactive antibodies from prior infections or vaccinations can block entry [9]. This finding underscores the importance of maintaining high vaccine coverage in both human and animal populations at risk [9].

Bioinformatics-driven identification of viral microRNAs in bat coronaviruses has revealed additional layers of host–pathogen interaction that may influence replication and transmission [10]. Computational target-prediction tools can then be used to estimate the effects of these non-coding RNA elements on host gene expression and immune evasion [10].

Conclusion

Computational modeling of bat coronavirus spike protein–ACE2 receptor binding dynamics provides a framework for predicting zoonotic spillover risk. Molecular docking, MD simulations, MM/GBSA free energy calculations, and machine learning classifiers each contribute distinct insights into the molecular determinants of host range. Integrated pipelines that combine these methods with ecological and immunological data are likely to offer greater predictive power than any single method alone. Continued development of these computational tools, coupled with experimental validation, will be essential for preempting future coronavirus emergence in both veterinary and zoonotic contexts.

References

[1] Li X, Kang M, Jiao XY et al. Addressing the zoonotic threat of merbecoviruses. Nat Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42342927/

[2] Gallo G, Di Nardo A, Lugano D et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42020746/

[3] Huang H, Kong L, Zhu Y et al. RAISE: A computational tool for evaluating sarbecovirus spillover potential. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42034636/

[4] Frank JA, Gan EX, Hooper WB et al. Systematic multi-reference vertebrate ACE2 sequence similarity analysis predicts species susceptibility to SARS-related sarbecoviruses. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41851226/

[5] Sootichote R, Chamkasem A, Toniti W et al. Screening candidate intermediate hosts for porcine respiratory coronavirus using molecular docking. Comp Immunol Microbiol Infect Dis. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42361779/

[6] Zhao D, Wang YF, Yin ZF et al. A Unified Framework to Prioritize RNA Virus Cross-Species Transmission Risk Across an Expansive Host Landscape. Viruses. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41754554/

[7] Yu C, Hoem T, Ou TP et al. Timing is infectious: eco-epidemiological modelling of Pteropus lylei in Cambodia suggests regular annual coronavirus outbreaks in bats. Proc Biol Sci. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41592774/

[8] Sun Y, Cheng Z, Wu X et al. Immunogenic relationship mapping supports a minimal-set trivalent vaccine strategy for broad sarbecovirus protection. Signal Transduct Target Ther. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41656339/

[9] Imrie RM, Bissett LA, Raveendran S et al. Post-pandemic changes in population immunity have reduced the likelihood of emergence of zoonotic coronaviruses. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41876522/

[10] Mazumder S, Kapoor S, Kaur H et al. Bioinformatics-driven genome-wide identification of viral miRNAs in high spillover bat coronaviruses and their target genes in human. Arch Microbiol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42060194/