What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Computational Docking and Binding Affinity Prediction for Emerging Zoonotic Coronaviruses: From Spike Protein Dynamics to Host Receptor Interactions

Introduction

Zoonotic coronaviruses circulating in wildlife reservoirs, particularly in bat populations, represent a persistent threat for cross-species transmission into domestic animals and other mammalian hosts [1]. The molecular gatekeeper governing host range and entry is the viral spike (S) glycoprotein, which mediates attachment to host cell receptors and subsequent membrane fusion [2]. Understanding the biophysical principles that govern spike-receptor interactions is essential for assessing zoonotic risk, designing surveillance strategies, and developing countermeasures such as subunit vaccines and entry inhibitors [3, 4]. This review provides a technical overview of computational methods employed to predict binding affinity between coronavirus spike proteins and their cognate host receptors, with emphasis on molecular docking, molecular dynamics (MD) simulations, free energy calculations, and the integration of these approaches into structural bioinformatics pipelines.

Structural Basis of Spike Protein Receptor Binding

Coronavirus spike glycoproteins are class I fusion proteins that form trimeric structures on the virion surface [2]. The S1 subunit contains the receptor-binding domain (RBD), which engages host cell receptors. For many zoonotic coronaviruses related to severe acute respiratory syndrome coronavirus (SARS-CoV), the primary receptor is angiotensin-converting enzyme 2 (ACE2) [5]. Structural studies have revealed that the RBD adopts two major conformational states: a "standing" or "up" conformation that is accessible for receptor binding, and a "lying" or "down" conformation that is inaccessible [2]. These dynamic transitions are critical for understanding infectivity and immune evasion.

The receptor-binding motif (RBM), a subregion within the RBD, forms the direct contact interface with the N-terminal helix of ACE2 [4]. The precise geometry of this interface, including hydrogen bond networks, hydrophobic contacts, and salt bridges, determines the binding affinity and consequently the efficiency of viral entry [5]. For example, screening studies using molecular docking have identified candidate intermediate hosts for porcine respiratory coronavirus by evaluating the compatibility of the spike RBD with ACE2 orthologs from different species [6]. Similarly, bat-derived coronaviruses such as HKU5-CoV-2 have been subjected to structural dynamics analyses to evaluate their spike protein interactions with host receptors [7].

Computational Docking: Principles and Applications

Molecular docking is a computational technique that predicts the preferred orientation and binding conformation of a ligand (in this context, the viral RBD) when bound to a receptor (e.g., ACE2) to form a stable complex [6, 5]. The goal is to generate a binding pose and to estimate the binding affinity using scoring functions. Docking algorithms can be broadly classified into rigid-body docking, where both receptor and ligand are treated as rigid objects, and flexible docking, where conformational changes in side chains or the entire backbone are permitted.

One widely used docking approach employs AutoDock-family programs, which use genetic algorithms and empirical free energy force fields to sample conformational space [6]. Another popular platform is HADDOCK (High Ambiguity Driven protein-protein DOCKing), which uses biochemical and biophysical information such as chemical shift perturbation data or interfacial residue predictions to drive the docking process. For protein-protein systems like spike RBD-ACE2, HADDOCK can incorporate ambiguous interaction restraints derived from mutagenesis or sequence conservation data.

The accuracy of docking predictions is highly dependent on the quality of the input structures. Experimentally determined structures from X-ray crystallography or cryo-electron microscopy (cryo-EM) provide the most reliable starting coordinates. When experimental structures are unavailable, homology modeling or deep learning-based methods such as AlphaFold can generate structural models [5]. The integration of these predicted structures into docking workflows is discussed in detail in related resources including Structural Prediction of Viral Envelope Glycoproteins Using AlphaFold2.

Scoring functions used in docking can be categorized as force field-based, empirical, or knowledge-based. Force field-based functions sum the van der Waals, electrostatic, and internal strain energies of the complex. Empirical functions are parameterized using experimental binding affinity data, while knowledge-based functions derive potentials of mean force from known protein-ligand crystal structures. For spike-ACE2 complexes, a combination of these approaches often yields the most reliable results.

Molecular Dynamics Simulations of Spike-Receptor Complexes

While docking provides a static snapshot of the binding interface, MD simulations capture the dynamic behavior of spike-receptor complexes over time [7, 5]. By integrating Newtonian equations of motion for all atoms in the system, MD reveals transient interactions, conformational rearrangements, and the stability of the bound state. Simulations are typically performed using all-atom force fields (e.g., CHARMM, AMBER, OPLS) with explicit solvent models (e.g., TIP3P water) and physiological ionic concentrations.

For zoonotic coronavirus research, MD simulations have been employed to investigate how mutations in the spike RBD alter the dynamics of the ACE2 interface [5]. These simulations can compute root mean square fluctuation (RMSF) profiles to identify flexible regions, principal component analysis (PCA) to characterize dominant motions, and hydrogen bond occupancy to quantify stable interactions. The results from such analyses inform predictions of host tropism and spillover potential.

An illustrative case is the bat coronavirus HKU5-CoV-2, for which structural dynamics and energetics studies have been used to examine the stability of its spike protein and to screen for potential antiviral compounds that could block receptor binding [7]. Similarly, computational pipelines that integrate MD with docking have been used to predict binding affinity changes induced by mutations in the RBM [4].

Free Energy Calculations for Binding Affinity Prediction

Quantitative binding affinity prediction is critical for comparing the receptor engagement potential of different coronavirus strains. Several thermodynamic approaches derived from MD simulations can estimate the free energy of binding (delta G binding). The most rigorous method is free energy perturbation (FEP), which calculates the free energy difference between two states (e.g., wild-type vs. mutant) by gradually alchemically transforming one into the other along a coupling parameter.

FEP is computationally expensive but has been applied to predict how specific amino acid substitutions in the spike RBD affect ACE2 binding [5]. A more computationally efficient method is molecular mechanics generalized Born surface area (MM/GBSA) or molecular mechanics Poisson-Boltzmann surface area (MM/PBSA). These end-point methods calculate the binding free energy as the sum of gas-phase molecular mechanics energies (internal, electrostatic, and van der Waals), solvation free energies, and entropic contributions, averaged over snapshots from MD trajectories.

For example, MM/GBSA calculations on bat coronavirus spike-ACE2 complexes have revealed that electrostatic complementarity is a major driver of cross-species binding [5]. These calculations can also be used to rank the binding affinity of a single spike variant against multiple host receptors, guiding risk assessment for emerging viruses.

Structural Bioinformatics Pipelines for Variant Analysis

The integration of sequence data from public repositories with structural modeling and docking forms the backbone of modern computational virology. The following workflow summarizes a typical pipeline for evaluating emerging zoonotic coronavirus spike proteins.

flowchart TD
    A[Sequence Data: GISAID, NCBI], > B[Phylogenetic Classification & Host Association]
    B, > C[Spike Protein Sequence Extraction]
    C, > D[Structural Modeling: Homology Modeling or AlphaFold]
    D, > E[Receptor Structure Preparation: Host ACE2 Orthologs]
    E, > F[Molecular Docking: AutoDock, HADDOCK]
    F, > G[Scoring & Ranking of Binding Poses]
    G, > H{Convergence?}
    H, Yes, > I[MD Simulations for Complex Refinement]
    H, No, > F
    I, > J[Free Energy Calculations: MM/GBSA or FEP]
    J, > K[Binding Affinity Prediction & Host Tropism Ranking]
    K, > L[Experimental Validation & Risk Assessment]

Sequence data are retrieved from GISAID or NCBI GenBank [1, 2]. The spike protein is identified, and the RBD is delineated based on multiple sequence alignment with well-characterized templates. Structural models are built using either homology modeling or deep learning methods such as AlphaFold [5]. The receptor structures, typically ACE2 orthologs from candidate host species, are prepared from existing crystallographic templates or by modeling from related structures.

Docking simulations are performed iteratively to ensure convergence of the binding pose [6]. The top-ranked complexes are then subjected to MD simulations to assess dynamic stability. Finally, free energy calculations provide quantitative estimates of binding affinity, yielding predictions of spillover risk [5]. This pipeline has direct relevance to veterinary investigation of emerging zoonoses, as detailed in companion resources such as Computational Prediction of Host Tropism and Receptor Binding Dynamics in Emerging Zoonotic Coronaviruses.

Implications for Pandemic Preparedness and Rational Vaccine Design

The ability to computationally predict spike-ACE2 binding affinity has profound implications for veterinary and public health preparedness. By screening bat coronavirus sequences [1] against a panel of ACE2 orthologs from domestic animals (e.g., pigs, cats, dogs, cattle), it is possible to identify which viral strains possess the molecular prerequisites for efficient entry into a given species. This information can guide surveillance efforts, informing which animal populations should be monitored most closely.

Furthermore, rational vaccine design benefits from structural insights into spike protein dynamics [3]. Immunogens can be engineered to stabilize prefusion conformations that expose conserved neutralization epitopes. Computational design platforms can evaluate candidate mutations to increase thermostability and immunogenicity. For example, the design of immunogens that elicit broadly protective antibody responses against multiple variant strains relies on understanding the structural constraints of the RBM [3]. D-peptides that mimic the ACE2 binding helix have also been designed computationally as direct inhibitors that block the interaction with high specificity [4]. These approaches are discussed further in Structure-Guided Antiviral Design.

Additionally, computational pipelines can prioritize existing or novel small molecules for repurposing as entry inhibitors. For instance, compounds such as lycorine have been investigated for their ability to inhibit RNA-dependent RNA polymerase activity in emerging coronaviruses [8]. While lycorine targets a different viral protein, the same virtual screening and docking workflows can be applied to the spike protein to identify receptor blockers [7].

Limitations and Future Directions

Despite advances, computational predictions of binding affinity are subject to several limitations. Docking scoring functions often fail to capture entropic contributions, solvent reorganization effects, and the impact of post-translational modifications such as glycosylation [2]. MD simulations are constrained by force field inaccuracies, limited sampling timescales, and the computational cost of simulating large glycoprotein complexes. Free energy calculations are sensitive to force field parameters and require substantial computational resources.

Emerging techniques such as machine learning-driven scoring functions, coarse-grained MD simulations, and enhanced sampling methods (e.g., metadynamics, replica exchange) are beginning to address these challenges. The integration of experimental structures (e.g., cryo-EM) with computational predictions, as described in Integrating Cryo-EM and Molecular Dynamics Simulations to Elucidate Glycan Shield Dynamics, will continue to improve prediction accuracy. Furthermore, the application of biological foundation models offers a pathway to predict host tropism directly from sequence data, bypassing some of the structural modeling bottlenecks Biological Foundation Models for Predicting Host Tropism in Emerging Zoonotic Viruses.

Conclusion

Computational docking and binding affinity prediction are essential tools for characterizing the molecular basis of zoonotic coronavirus host range. By combining structural modeling, molecular dynamics simulations, and free energy calculations, researchers can evaluate the spillover potential of emerging bat coronaviruses and guide the development of targeted interventions. The continued refinement of these computational methods, supported by high-quality experimental data, will be critical for proactive veterinary surveillance and pandemic preparedness.

References

[1] Hemnani M, Karatas M, Cruz AVS, et al. Metagenomic analysis of viral diversity in Portuguese bats. Vet Res Commun. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40974374/

[2] Hills FR, Geoghegan JL, Bostina M. Architects of infection: A structural overview of SARS-related coronavirus spike glycoproteins. Virology. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39983449/

[3] Wang E, Chakraborty AK. Design of immunogens for eliciting antibody responses that may protect against SARS-CoV-2 variants. PLoS Comput Biol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36156540/

[4] Valiente PA, Nim S, Lee J, et al. Targeting the Receptor-Binding Motif of SARS-CoV-2 with D-Peptides Mimicking the ACE2 Binding Helix: Lessons for Inhibiting Omicron and Future Variants of Concern. J Chem Inf Model. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35875887/

[5] Shum MH, Lee Y, Tam L, et al. Binding affinity between coronavirus spike protein and human ACE2 receptor. Comput Struct Biotechnol J. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38304547/

[6] Sootichote R, Chamkasem A, Toniti W, et al. Screening candidate intermediate hosts for porcine respiratory coronavirus using molecular docking. Comp Immunol Microbiol Infect Dis. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42361779/

[7] Dubey A, Kumar M, Tufail A. Inhibiting viral entry of bat-derived coronavirus HKU5-CoV-2: Targeting spike protein S1 subunit with FDA-approved antivirals-A structural dynamics and energetics study. Bioorg Chem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40865231/

[8] Jin YH, Min JS, Jeon S, et al. Lycorine, a non-nucleoside RNA dependent RNA polymerase inhibitor, as potential treatment for emerging coronavirus infections. Phytomedicine. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33376043/ *** Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.

[9] Kumar N, Kaushik R, Zhang KYJ, et al. A novel consensus-based computational pipeline for screening of antibody therapeutics for efficacy against SARS-CoV-2 variants of concern including Omicron variant. Proteins. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/36629264/