Zubair Khalid

Virologist/Molecular Biologist | Veterinarian | Bioinformatician

Conventional & Molecular Virology • Vaccine Development • Computational Biology

Dr. Zubair Khalid is a veterinarian and virologist specializing in conventional and molecular virology, vaccine development, and computational biology. He is dedicated to advancing animal health through innovative research and multi-omics approaches.

Section: Computational Biology

Structural Prediction of Zoonotic Coronavirus Spike Glycoproteins: From Rosetta to AlphaFold2

Introduction

Zoonotic coronaviruses originating from bat reservoirs and other wildlife species pose a persistent threat to animal and public health [1, 2]. The spike glycoprotein (S protein) is the primary determinant of host range and tissue tropism, mediating attachment to host cell receptors and subsequent membrane fusion [2, 3]. Accurate prediction of the three-dimensional (3D) structure of spike glycoproteins from emerging zoonotic coronaviruses is essential for understanding receptor binding mechanisms, assessing spillover risk, and designing intervention strategies [4, 5]. This article reviews the evolution of computational methods for spike protein structure prediction, from classical homology modeling and Rosetta-based approaches to contemporary deep learning frameworks such as AlphaFold2 and ESMFold. Emphasis is placed on applications to bat-derived coronaviruses and intermediate hosts (e.g., pangolins) and on how predicted structures inform host receptor binding and zoonotic potential.

Structural Biology of Coronavirus Spike Glycoproteins

Coronavirus spike glycoproteins are large class I fusion proteins that form homotrimeric spikes on the virion surface [2]. Each monomer comprises two functional subunits: the N-terminal S1 subunit, which contains the receptor-binding domain (RBD), and the C-terminal S2 subunit, which drives membrane fusion [2, 3]. The RBD undergoes conformational transitions between a "closed" (receptor-inaccessible) and an "open" (receptor-accessible) state [2]. Structural characterization of these domains is critical because the RBD directly contacts host receptors such as angiotensin-converting enzyme 2 (ACE2) or alternative entry factors [1, 3, 5]. For example, heart-nosed bat alphacoronaviruses have been shown to utilize human CEACAM6 for cell entry, highlighting the diversity of receptor usage among zoonotic coronaviruses [1]. Similarly, bat sarbecoviruses can enter cells via ACE2-independent pathways, and sequence determinants of human-cell entry have been identified through combined laboratory and computational network science approaches [3].

Computational Methods for Spike Protein Structure Prediction

Homology Modeling and Rosetta

Before the advent of deep learning, comparative (homology) modeling was the primary method for predicting spike protein structures when experimental templates were available [2, 5]. The Rosetta suite offered a flexible platform for ab initio and template-based modeling, incorporating fragment assembly and energy minimization [2]. Rosetta has been used to model the S1 subunit of bat-derived coronavirus HKU5-CoV-2, enabling subsequent molecular dynamics simulations and virtual screening of FDA-approved antivirals targeting the S1 subunit [4]. Rosetta's strength lies in its physically based energy function, which can refine models to near-experimental accuracy when sufficient computational sampling is applied [2]. However, for highly divergent spike sequences lacking close homologs in the Protein Data Bank, Rosetta often fails to produce reliable models [2].
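Whether a template counts as a "close homolog" is usually judged first by sequence identity over the aligned region. The sketch below computes percent identity from two pre-aligned sequences. The function name and the toy sequences are hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one of several conventions in use, so values may differ from those reported by alignment tools.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gapped columns
        compared += 1
        if q == t:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (not real spike sequences): 4 of 5 ungapped columns match.
identity = percent_identity("ACDE-G", "ACDFHG")
```

Identity alone is a coarse filter. Alignment coverage and the quality of the template structure also affect how far a homology model can be trusted.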

Molecular Dynamics Simulations

Molecular dynamics (MD) simulations provide a means to explore the conformational landscape of spike glycoproteins and their complexes with host receptors [4, 5, 6]. All-atom MD simulations, typically performed using GROMACS or NAMD, allow estimation of binding free energies via methods such as Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) [4, 6]. For instance, Dubey et al. used MD simulations and MM/PBSA to evaluate the binding of FDA-approved drugs to the S1 subunit of HKU5-CoV-2, identifying compounds that stabilize the closed conformation and inhibit receptor attachment [4]. Similarly, Bouback et al. employed pharmacophore-based virtual screening combined with quantum mechanics calculations and MD simulations to identify natural antiviral candidates against the MERS-CoV S1 N-terminal domain [6]. MD simulations also reveal the dynamics of RBD opening and closing, which is critical for receptor accessibility [2, 5]. Lam et al. used MD to predict that SARS-CoV-2 spike protein forms stable complexes with ACE2 orthologues from a broad range of mammals, supporting the notion of broad host tropism [5].
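For reference, the MM/PBSA estimate is usually written in the following textbook form. This is the generic decomposition, not necessarily the exact protocol of the cited studies. In practice the entropy term is often omitted or approximated, and the averages are taken over MD snapshots.

```latex
\begin{align}
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right) \\
G_{x} &= \langle E_{\text{MM}} \rangle + \langle G_{\text{solv}} \rangle - T S,
  \qquad x \in \{\text{complex}, \text{receptor}, \text{ligand}\} \\
E_{\text{MM}} &= E_{\text{bonded}} + E_{\text{elec}} + E_{\text{vdW}} \\
G_{\text{solv}} &= G_{\text{PB}} + G_{\text{SA}}
\end{align}
```

Here $G_{\text{PB}}$ is the polar solvation term from the Poisson-Boltzmann equation and $G_{\text{SA}}$ is the nonpolar term estimated from solvent-accessible surface area. A more negative $\Delta G_{\text{bind}}$ indicates more favorable predicted binding.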

Deep Learning Approaches: AlphaFold2 and ESMFold

The release of AlphaFold2 represented a paradigm shift in protein structure prediction [2]. AlphaFold2 uses an end-to-end deep learning architecture that integrates multiple sequence alignments (MSAs) and pairwise residue features to predict atomic coordinates with near-experimental accuracy [2]. For coronavirus spike glycoproteins, AlphaFold2 has been applied to model full-length S proteins from novel bat coronaviruses, including those with low sequence identity to known structures [2]. Hills et al. provided a structural overview of SARS-related coronavirus spike glycoproteins, demonstrating that AlphaFold2 models can recapitulate key features such as the RBD core and the fusion peptide region [2]. ESMFold, a language model-based predictor, offers faster inference by using protein language model embeddings instead of MSAs, making it suitable for high-throughput screening of spike variants [2]. Both methods have been integrated into pipelines for predicting receptor binding and host range [2, 5].
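Predicted models should be read alongside their confidence scores. AlphaFold2 writes its per-residue confidence (pLDDT, on a 0 to 100 scale) into the B-factor column of the output PDB file. The sketch below, using only the Python standard library, averages that column over C-alpha atoms and lists low-confidence residues. The function names, the helper that builds toy ATOM records, and the cutoff of 70 are illustrative assumptions. ESMFold output may use a different scale depending on the version, so check before comparing values.

```python
def ca_plddt(pdb_text):
    """Yield (chain, residue_number, b_factor) for each C-alpha ATOM record.

    Relies on the fixed-column PDB format: atom name in columns 13-16,
    chain in column 22, residue number in 23-26, B-factor in 61-66.
    """
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            yield line[21], int(line[22:26]), float(line[60:66])


def mean_plddt(pdb_text):
    """Mean of the B-factor column (pLDDT in AlphaFold2 output) over C-alpha atoms."""
    scores = [score for _, _, score in ca_plddt(pdb_text)]
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return sum(scores) / len(scores)


def low_confidence_residues(pdb_text, cutoff=70.0):
    """Residues whose score falls below the cutoff (70 is an assumed default)."""
    return [rec for rec in ca_plddt(pdb_text) if rec[2] < cutoff]


def atom_line(serial, name, res, chain, resseq, bfactor):
    """Hypothetical helper: build a toy fixed-column ATOM record for the demo."""
    return (
        f"ATOM  {serial:5d} {name:<4} {res:>3} {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


# Toy three-atom "model" (not a real prediction).
toy_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 50.0),
    atom_line(2, " CA ", "MET", "A", 1, 90.0),
    atom_line(3, " CA ", "PHE", "A", 2, 60.0),
])
average = mean_plddt(toy_pdb)
flagged = low_confidence_residues(toy_pdb)
```

For spike models, reporting the score separately for the RBD rather than for the whole chain may be more informative, since flexible loops and termini can pull the global average down.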

Comparative Performance and Limitations

The following table summarizes key characteristics of the major computational methods used for spike protein structure prediction.

| Method | Type | Input Requirements | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Homology Modeling | Template-based | Sequence alignment to known structure | Fast, interpretable | Requires close template; poor for novel folds |
| Rosetta | Hybrid (ab initio + template) | Sequence, optional template | Physically realistic energy function; good for refinement | Computationally expensive; sampling limitations |
| Molecular Dynamics | Physics-based | Starting structure (experimental or model) | Captures dynamics; binding free energy | High computational cost; limited timescales |
| AlphaFold2 | Deep learning (MSA-based) | Sequence, MSA | High accuracy; near-experimental for single chains | Requires deep MSA; large memory; less accurate for multimers |
| ESMFold | Deep learning (language model) | Sequence only | Fast; no MSA needed | Slightly lower accuracy than AlphaFold2; less interpretable |

Workflow for Structural Prediction and Zoonotic Risk Assessment

A typical computational pipeline for assessing zoonotic potential of a novel coronavirus spike protein integrates multiple methods. The following Mermaid diagram illustrates a decision tree for such a workflow.

flowchart TD
    A[Novel Coronavirus Spike Sequence] --> B{Close homolog in PDB?}
    B -->|Yes| C[Homology Modeling / Rosetta]
    B -->|No| D[AlphaFold2 / ESMFold]
    C --> E[Model Refinement with MD]
    D --> E
    E --> F["Receptor Docking (e.g., ACE2, CEACAM6)"]
    F --> G["Binding Free Energy Calculation (MM/PBSA)"]
    G --> H{High binding affinity?}
    H -->|Yes| I["High Zoonotic Risk: Further In Vitro Testing"]
    H -->|No| J["Low Zoonotic Risk: Monitor Sequence Evolution"]
    I --> K[Virtual Screening for Entry Inhibitors]
    J --> L[Periodic Reassessment with New Sequences]

This workflow begins with sequence acquisition from metagenomic surveillance or synthetic reconstruction [7]. If a close structural template exists, homology modeling or Rosetta is used; otherwise, deep learning models are employed [2]. The resulting structure is refined through short MD simulations to relieve steric clashes and optimize side-chain conformations [4, 6]. Receptor docking, using tools such as HADDOCK or ClusPro, predicts the binding mode between the spike RBD and host receptor orthologues [5]. Binding free energy calculations (MM/PBSA) then quantify the strength of interaction [4]. High predicted affinity suggests potential for cross-species transmission, warranting experimental validation [3, 5]. Conversely, low affinity indicates lower immediate risk, though continued surveillance is necessary because single mutations can alter binding [3, 7].
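The two decision points in this workflow can be sketched as a small function. Everything numeric here is a placeholder: the identity cutoff of 30% and the binding energy cutoff of -30 (in whatever units the MM/PBSA run reports) are assumptions for illustration, not values from the cited studies, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    modeling_route: str
    risk_label: str
    next_step: str


def assess_spike(
    template_identity: Optional[float],
    binding_dg: float,
    identity_cutoff: float = 30.0,  # placeholder: percent identity to best PDB template
    dg_cutoff: float = -30.0,       # placeholder: more negative means stronger binding
) -> Assessment:
    """Apply the two branch points: template availability and predicted affinity."""
    has_close_template = (
        template_identity is not None and template_identity >= identity_cutoff
    )
    route = "homology_or_rosetta" if has_close_template else "alphafold2_or_esmfold"

    if binding_dg <= dg_cutoff:
        return Assessment(
            route, "high",
            "in vitro testing, then virtual screening for entry inhibitors",
        )
    return Assessment(
        route, "low",
        "monitor sequence evolution and reassess with new sequences",
    )


# A sequence with a close template and strong predicted binding.
result = assess_spike(template_identity=85.0, binding_dg=-45.0)
```

Hard cutoffs like these are a simplification. In practice thresholds would need calibration against receptors and spikes with known entry phenotypes, and borderline cases would be flagged for manual review rather than binned.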

Applications to Emerging Zoonotic Coronaviruses

Bat Alphacoronaviruses and CEACAM6 Usage

Gallo et al. demonstrated that heart-nosed bat alphacoronaviruses use human CEACAM6 as an entry receptor [1]. Structural prediction of the spike RBD from these viruses, using AlphaFold2, revealed a binding interface distinct from ACE2-dependent coronaviruses [1]. Docking simulations predicted that the RBD interacts with the N-terminal domain of CEACAM6, and MM/PBSA calculations confirmed favorable binding energies [1]. This finding expands the known repertoire of coronavirus receptors and underscores the need for broad structural surveillance.

Bat Sarbecoviruses and ACE2-Independent Entry

Khaledian et al. combined laboratory experiments with computational network science to identify sequence determinants of human-cell entry in ACE2-independent bat sarbecoviruses [3]. They used Rosetta to model spike RBD variants and MD simulations to assess stability and receptor binding [3]. Their results indicated that a small number of amino acid substitutions in the RBD can confer the ability to use human ACE2 or alternative receptors [3]. This work highlights the utility of computational prediction in prioritizing variants for experimental testing.

HKU5-CoV-2 and Antiviral Targeting

Dubey et al. focused on the bat-derived merbecovirus HKU5-CoV-2, which is closely related to MERS-CoV [4]. They employed homology modeling to build the S1 subunit structure, followed by MD simulations and MM/PBSA calculations to screen FDA-approved drugs [4]. Several compounds, including lopinavir and remdesivir, were predicted to bind the S1 subunit with high affinity, potentially inhibiting viral entry [4]. This study exemplifies how structural prediction can accelerate antiviral discovery for emerging zoonotic threats.

Broad Host Range Prediction

Lam et al. used a combination of homology modeling and MD simulations to predict that the SARS-CoV-2 spike RBD can bind ACE2 orthologues from a wide range of mammalian species, including livestock, companion animals, and wildlife [5]. Their binding free energy calculations correlated with experimental pseudovirus entry data, validating the computational approach [5]. Such predictions inform veterinary surveillance and biosecurity measures.
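One simple way to check whether predicted binding energies track experimental entry data is a rank correlation. The sketch below implements Spearman's coefficient with the standard library. The function names are hypothetical and the numbers are made-up placeholders, not results from Lam et al.; the statistic used in that study may differ.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Placeholder values for four hypothetical ACE2 orthologues.
predicted_dg = [-12.0, -9.5, -7.1, -3.0]    # more negative = stronger predicted binding
relative_entry = [0.9, 0.7, 0.4, 0.1]       # normalized pseudovirus entry signal
rho = spearman(predicted_dg, relative_entry)
```

With these placeholder values the ranks are perfectly inverted, so the coefficient is -1: the most negative predicted energies pair with the highest entry signal. Real data will be noisier, and a rank correlation says nothing about whether the absolute energies are accurate.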

Future Directions

The field is moving toward integration of multiple prediction methods into automated pipelines that can process metagenomic data in real time [2, 3]. Protein language models, such as ESM-2, are being fine-tuned on viral glycoprotein sequences to predict mutational effects on structure and binding [2]. Additionally, AlphaFold3 and related methods now support joint prediction of biomolecular complexes, including protein-protein, protein-ligand, and protein-nucleic acid assemblies, which may enable direct modeling of spike-receptor interactions without separate docking steps [2]. Experimental validation remains essential, but computational predictions can triage candidates for laboratory testing, reducing time and cost [3, 7]. The continued evolution of zoonotic coronaviruses, together with the cross-species transmission pathways demonstrated by synthetic reconstruction studies [7], underscores the need for robust, scalable structural prediction tools.

References

[1] Gallo G, Di Nardo A, Lugano D et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42020746/

[2] Hills FR, Geoghegan JL, Bostina M. Architects of infection: A structural overview of SARS-related coronavirus spike glycoproteins. Virology. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39983449/

[3] Khaledian E, Ulusan S, Erickson J et al. Sequence determinants of human-cell entry identified in ACE2-independent bat sarbecoviruses: A combined laboratory and computational network science approach. EBioMedicine. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35405384/

[4] Dubey A, Kumar M, Tufail A. Inhibiting viral entry of bat-derived coronavirus HKU5-CoV-2: Targeting spike protein S1 subunit with FDA-approved antivirals-A structural dynamics and energetics study. Bioorg Chem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40865231/

[5] Lam SD, Bordin N, Waman VP et al. SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals. Sci Rep. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/33020502/

[6] Bouback TA, Pokhrel S, Albeshri A et al. Pharmacophore-Based Virtual Screening, Quantum Mechanics Calculations, and Molecular Dynamics Simulation Approaches Identified Potential Natural Antiviral Drug Candidates against MERS-CoV S1-NTD. Molecules. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34443556/

[7] Sheahan T, Rockx B, Donaldson E et al. Pathways of cross-species transmission of synthetically reconstructed zoonotic severe acute respiratory syndrome coronavirus. J Virol. 2008. URL: https://pubmed.ncbi.nlm.nih.gov/18579604/

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.