Structural Prediction of Zoonotic Coronavirus Spike Glycoproteins: From Rosetta to AlphaFold2
Introduction
Zoonotic coronaviruses originating from bat reservoirs and other wildlife species pose a persistent threat to animal and public health [1, 2]. The spike glycoprotein (S protein) is the primary determinant of host range and tissue tropism, mediating attachment to host cell receptors and subsequent membrane fusion [2, 3]. Accurate prediction of the three-dimensional (3D) structure of spike glycoproteins from emerging zoonotic coronaviruses is essential for understanding receptor binding mechanisms, assessing spillover risk, and designing intervention strategies [4, 5]. This article reviews the evolution of computational methods for spike protein structure prediction, from classical homology modeling and Rosetta-based approaches to contemporary deep learning frameworks such as AlphaFold2 and ESMFold. Emphasis is placed on applications to bat-derived coronaviruses and intermediate hosts (e.g., pangolins) and on how predicted structures inform host receptor binding and zoonotic potential.
Structural Biology of Coronavirus Spike Glycoproteins
Coronavirus spike glycoproteins are large class I fusion proteins that form homotrimeric spikes on the virion surface [2]. Each monomer comprises two functional subunits: the N-terminal S1 subunit, which contains the receptor-binding domain (RBD), and the C-terminal S2 subunit, which drives membrane fusion [2, 3]. The RBD undergoes conformational transitions between a "closed" (receptor-inaccessible) and an "open" (receptor-accessible) state [2]. Structural characterization of these domains is critical because the RBD directly contacts host receptors such as angiotensin-converting enzyme 2 (ACE2) or alternative entry factors [1, 3, 5]. For example, heart-nosed bat alphacoronaviruses have been shown to utilize human CEACAM6 for cell entry, highlighting the diversity of receptor usage among zoonotic coronaviruses [1]. Similarly, bat sarbecoviruses can enter cells via ACE2-independent pathways, and sequence determinants of human-cell entry have been identified through combined laboratory and computational network science approaches [3].
Computational Methods for Spike Protein Structure Prediction
Homology Modeling and Rosetta
Before the advent of deep learning, comparative (homology) modeling was the primary method for predicting spike protein structures when experimental templates were available [2, 5]. The Rosetta suite offered a flexible platform for ab initio and template-based modeling, combining fragment assembly with energy minimization [2]. Template-based modeling of this kind has been used to build the S1 subunit of the bat-derived coronavirus HKU5-CoV-2, enabling subsequent molecular dynamics simulations and virtual screening of FDA-approved antivirals targeting that subunit [4]. Rosetta's strength lies in its physically based energy function, which can refine models toward near-experimental accuracy when sufficient conformational sampling is applied [2]. However, for highly divergent spike sequences lacking close homologs in the Protein Data Bank (PDB), template-based modeling has little to build on, and Rosetta often fails to produce reliable models [2].
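Whether a "close homolog" exists is usually judged from sequence identity between the query and a candidate template. The sketch below computes percent identity from two pre-aligned sequences. The function names and the 30% cutoff are illustrative assumptions (a common rule of thumb, not a value taken from the cited studies), and the alignment itself is assumed to come from an external tool.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def has_close_template(identity_pct: float, threshold_pct: float = 30.0) -> bool:
    """Hypothetical rule of thumb: treat >= threshold_pct identity as a usable template."""
    return identity_pct >= threshold_pct
```

In practice the cutoff would be tuned to the domain being modeled, since RBD loops can diverge even when the overall spike identity is high.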
Molecular Dynamics Simulations
Molecular dynamics (MD) simulations provide a means to explore the conformational landscape of spike glycoproteins and their complexes with host receptors [4, 5, 6]. All-atom MD simulations, typically performed using GROMACS or NAMD, allow calculation of binding free energies via methods such as Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) [4, 6]. For instance, Dubey et al. used MD simulations and MM/PBSA to evaluate the binding of FDA-approved drugs to the S1 subunit of HKU5-CoV-2, identifying compounds that stabilize the closed conformation and inhibit receptor attachment [4]. Similarly, Bouback et al. employed pharmacophore-based virtual screening combined with quantum mechanics calculations and MD simulations to identify natural antiviral candidates against the MERS-CoV S1 N-terminal domain [6]. MD simulations also reveal the dynamics of RBD opening and closing, which are critical for receptor accessibility [2, 5]. Lam et al. used MD to predict that the SARS-CoV-2 spike protein forms stable complexes with ACE2 orthologues from a broad range of mammals, supporting the notion of broad host tropism [5].
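As a rough illustration of the MM/PBSA bookkeeping, the binding free energy is commonly estimated as the trajectory-averaged energy of the complex minus those of the free receptor and ligand, where each term sums the molecular mechanics energy with polar and nonpolar solvation contributions. The sketch below assumes per-frame energies have already been extracted by an MD analysis tool; the class and function names are hypothetical, the entropy term is omitted (as is often done in practice), and units are whatever the upstream tool reports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class FrameEnergy:
    """Energy components for one trajectory frame (hypothetical container)."""
    e_mm: float        # molecular mechanics energy (bonded + vdW + electrostatic)
    g_polar: float     # polar solvation term (Poisson-Boltzmann)
    g_nonpolar: float  # nonpolar solvation term (surface-area based)

    @property
    def total(self) -> float:
        return self.e_mm + self.g_polar + self.g_nonpolar


def mmpbsa_binding_energy(
    complex_frames: Sequence[FrameEnergy],
    receptor_frames: Sequence[FrameEnergy],
    ligand_frames: Sequence[FrameEnergy],
) -> float:
    """dG_bind ~ <G_complex> - <G_receptor> - <G_ligand>, entropy term omitted."""
    g_complex = mean([f.total for f in complex_frames])
    g_receptor = mean([f.total for f in receptor_frames])
    g_ligand = mean([f.total for f in ligand_frames])
    return g_complex - g_receptor - g_ligand
```

More negative values indicate more favorable predicted binding. Because the entropy term is dropped, such estimates are better suited to ranking candidates than to reporting absolute affinities.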
Deep Learning Approaches: AlphaFold2 and ESMFold
The release of AlphaFold2 represented a paradigm shift in protein structure prediction [2]. AlphaFold2 uses an end-to-end deep learning architecture that integrates multiple sequence alignments (MSAs) and pairwise residue features to predict atomic coordinates with near-experimental accuracy [2]. For coronavirus spike glycoproteins, AlphaFold2 has been applied to model full-length S proteins from novel bat coronaviruses, including those with low sequence identity to known structures [2]. Hills et al. provided a structural overview of SARS-related coronavirus spike glycoproteins, demonstrating that AlphaFold2 models can recapitulate key features such as the RBD core and the fusion peptide region [2]. ESMFold, a language model-based predictor, offers faster inference by using protein language model embeddings instead of MSAs, making it suitable for high-throughput screening of spike variants [2]. Both methods have been integrated into pipelines for predicting receptor binding and host range [2, 5].
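When working with AlphaFold2 output, a useful first check is the per-residue confidence score (pLDDT), which AlphaFold2 writes into the B-factor column of its PDB files. The sketch below reads those values from the alpha-carbon records and averages them over a residue window such as a putative RBD. The function names are illustrative, the parser assumes a single-chain model in fixed-column PDB format, and the band labels follow the commonly used 90/70/50 cutoffs.

```python
from typing import Dict


def plddt_per_residue(pdb_text: str) -> Dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, residue number 23-26, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def mean_plddt(scores: Dict[int, float], start: int, end: int) -> float:
    """Average pLDDT over residues start..end (inclusive)."""
    window = [s for res, s in scores.items() if start <= res <= end]
    if not window:
        raise ValueError("no residues found in the requested range")
    return sum(window) / len(window)


def confidence_band(plddt: float) -> str:
    """Label a pLDDT value using the commonly used 90/70/50 cutoffs."""
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"
```

A low average over the RBD would be a reason to treat downstream docking results with caution, regardless of how well the rest of the spike is modeled.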
Comparative Performance and Limitations
The following table summarizes key characteristics of the major computational methods used for spike protein structure prediction.
| Method | Type | Input Requirements | Strengths | Limitations |
|---|---|---|---|---|
| Homology Modeling | Template-based | Sequence alignment to known structure | Fast, interpretable | Requires close template; poor for novel folds |
| Rosetta | Hybrid (ab initio + template) | Sequence, optional template | Physically realistic energy function; good for refinement | Computationally expensive; sampling limitations |
| Molecular Dynamics | Physics-based | Starting structure (experimental or model) | Captures dynamics; binding free energy | High computational cost; limited timescales |
| AlphaFold2 | Deep learning (MSA-based) | Sequence, MSA | High accuracy; near-experimental for single chains | Requires deep MSA; large memory; less accurate for multimers |
| ESMFold | Deep learning (language model) | Sequence only | Fast; no MSA needed | Slightly lower accuracy than AlphaFold2; less interpretable |
Workflow for Structural Prediction and Zoonotic Risk Assessment
A typical computational pipeline for assessing zoonotic potential of a novel coronavirus spike protein integrates multiple methods. The following Mermaid diagram illustrates a decision tree for such a workflow.
flowchart TD
A[Novel Coronavirus Spike Sequence] --> B{Close homolog in PDB?}
B -->|Yes| C[Homology Modeling / Rosetta]
B -->|No| D[AlphaFold2 / ESMFold]
C --> E[Model Refinement with MD]
D --> E
E --> F["Receptor Docking (e.g., ACE2, CEACAM6)"]
F --> G["Binding Free Energy Calculation (MM/PBSA)"]
G --> H{High binding affinity?}
H -->|Yes| I[High Zoonotic Risk: Further In Vitro Testing]
H -->|No| J[Low Zoonotic Risk: Monitor Sequence Evolution]
I --> K[Virtual Screening for Entry Inhibitors]
J --> L[Periodic Reassessment with New Sequences]
This workflow begins with sequence acquisition from metagenomic surveillance or synthetic reconstruction [7]. If a close structural template exists, homology modeling or Rosetta is used; otherwise, deep learning models are employed [2]. The resulting structure is refined through short MD simulations to relieve steric clashes and optimize side-chain conformations [4, 6]. Receptor docking, using tools such as HADDOCK or ClusPro, predicts the binding mode between the spike RBD and host receptor orthologues [5]. Binding free energy calculations (MM/PBSA) then quantify the strength of interaction [4]. High predicted affinity suggests potential for cross-species transmission, warranting experimental validation [3, 5]. Conversely, low affinity indicates lower immediate risk, though continued surveillance is necessary because single mutations can alter binding [3, 7].
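The two decision points in this workflow can be written down as a small piece of routing logic. The sketch below is only a schematic of the branching described above: the enum names, function names, and the affinity cutoff are hypothetical, and the cutoff in particular would need calibration against experimental entry data before it meant anything.

```python
from enum import Enum


class ModelingRoute(Enum):
    TEMPLATE_BASED = "homology modeling / Rosetta"
    DEEP_LEARNING = "AlphaFold2 / ESMFold"


class RiskTier(Enum):
    HIGH = "further in vitro testing and inhibitor screening"
    LOW = "monitor sequence evolution and reassess periodically"


def choose_route(has_close_pdb_homolog: bool) -> ModelingRoute:
    """First branch: use a template if one exists, otherwise deep learning."""
    if has_close_pdb_homolog:
        return ModelingRoute.TEMPLATE_BASED
    return ModelingRoute.DEEP_LEARNING


def triage(delta_g_bind: float, affinity_cutoff: float = -10.0) -> RiskTier:
    """Second branch: more negative predicted binding energy means stronger binding.

    affinity_cutoff is an illustrative placeholder, not a validated threshold.
    """
    if delta_g_bind <= affinity_cutoff:
        return RiskTier.HIGH
    return RiskTier.LOW
```

Keeping the cutoff as an explicit parameter makes it easy to rerun the triage when new experimental data suggest a different threshold.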
Applications to Emerging Zoonotic Coronaviruses
Bat Alphacoronaviruses and CEACAM6 Usage
Gallo et al. demonstrated that heart-nosed bat alphacoronaviruses use human CEACAM6 as an entry receptor [1]. Structural prediction of the spike RBD from these viruses, using AlphaFold2, revealed a binding interface distinct from ACE2-dependent coronaviruses [1]. Docking simulations predicted that the RBD interacts with the N-terminal domain of CEACAM6, and MM/PBSA calculations confirmed favorable binding energies [1]. This finding expands the known repertoire of coronavirus receptors and underscores the need for broad structural surveillance.
Bat Sarbecoviruses and ACE2-Independent Entry
Khaledian et al. combined laboratory experiments with computational network science to identify sequence determinants of human-cell entry in ACE2-independent bat sarbecoviruses [3]. They used Rosetta to model spike RBD variants and MD simulations to assess stability and receptor binding [3]. Their results indicated that a small number of amino acid substitutions in the RBD can confer the ability to use human ACE2 or alternative receptors [3]. This work highlights the utility of computational prediction in prioritizing variants for experimental testing.
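Prioritizing variants of this kind starts with enumerating how a candidate RBD differs from a reference. The sketch below lists amino acid substitutions between two pre-aligned sequences in the usual reference-position-variant notation. It is a generic illustration, not the method used in the cited study; the function name and the `start` numbering offset are assumptions, and insertions and deletions are skipped rather than reported.

```python
from typing import List


def list_substitutions(ref_aligned: str, var_aligned: str, start: int = 1) -> List[str]:
    """Return substitutions such as 'E4K', numbered by reference position."""
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    substitutions: List[str] = []
    position = start - 1
    for ref_aa, var_aa in zip(ref_aligned.upper(), var_aligned.upper()):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        position += 1
        if var_aa != "-" and var_aa != ref_aa:
            substitutions.append(f"{ref_aa}{position}{var_aa}")
    return substitutions
```

For example, with the toy sequences `"ACDEFG"` and `"ACDKFG"` the function returns `["E4K"]`.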
HKU5-CoV-2 and Antiviral Targeting
Dubey et al. focused on the bat-derived merbecovirus HKU5-CoV-2, which is closely related to MERS-CoV [4]. They employed homology modeling to build the S1 subunit structure, followed by MD simulations and MM/PBSA calculations to screen FDA-approved drugs [4]. Several compounds, including lopinavir and remdesivir, were predicted to bind the S1 subunit with high affinity, potentially inhibiting viral entry [4]. This study exemplifies how structural prediction can accelerate antiviral discovery for emerging zoonotic threats.
Broad Host Range Prediction
Lam et al. used a combination of homology modeling and MD simulations to predict that the SARS-CoV-2 spike RBD can bind ACE2 orthologues from a wide range of mammalian species, including livestock, companion animals, and wildlife [5]. Their binding free energy calculations correlated with experimental pseudovirus entry data, validating the computational approach [5]. Such predictions inform veterinary surveillance and biosecurity measures.
Cross-Linking to Related Articles
Readers are encouraged to explore the following related articles on this portal for deeper context:
- Structural Prediction and Binding Dynamics of Zoonotic Spillover: Computational Modeling of Bat Coronavirus Spike-Receptor Interactions
- Computational Prediction of Zoonotic Spillover: Receptor-Binding Dynamics and Structural Modeling of Bat Coronavirus Spike Proteins
- Structural and Evolutionary Dynamics of Zoonotic Viral Glycoproteins: Integrating Molecular Modeling, Sequence Surveillance, and Receptor Binding Prediction
- Structural Prediction of Viral Envelope Glycoproteins Using AlphaFold2: Implications for Host Receptor Binding and Vaccine Design
- Computational Docking and Binding Affinity Prediction for Emerging Zoonotic Coronaviruses: From Spike Protein Dynamics to Host Receptor Interactions
- Molecular Dynamics Simulations of Coronavirus Spike Protein-ACE2 Interactions: Implications for Host Range and Zoonotic Potential
- Structural and Functional Annotation of Novel Bat Coronaviruses using AlphaFold2 and Molecular Docking
- Deep Learning-Driven Structural Prediction of Viral Envelope Glycoproteins: Implications for Receptor Binding and Antigenic Drift
- AlphaFold and Beyond: Deep Learning for Protein Structure Prediction in Veterinary Virology
Future Directions
The field is moving toward integration of multiple prediction methods into automated pipelines that can process metagenomic data in real time [2, 3]. Protein language models, such as ESM-2, are being fine-tuned on viral glycoprotein sequences to predict mutational effects on structure and binding [2]. Additionally, AlphaFold3 and related methods now support prediction of protein-protein, protein-ligand, and protein-nucleic acid complexes, which may enable direct modeling of spike-receptor interactions without a separate docking step [2]. Experimental validation remains essential, but computational predictions can triage candidates for laboratory testing, reducing time and cost [3, 7]. The continued evolution of zoonotic coronaviruses, as demonstrated by synthetic reconstruction studies [7], underscores the need for robust, scalable structural prediction tools.
References
[1] Gallo G, Di Nardo A, Lugano D et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42020746/
[2] Hills FR, Geoghegan JL, Bostina M. Architects of infection: A structural overview of SARS-related coronavirus spike glycoproteins. Virology. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39983449/
[3] Khaledian E, Ulusan S, Erickson J et al. Sequence determinants of human-cell entry identified in ACE2-independent bat sarbecoviruses: A combined laboratory and computational network science approach. EBioMedicine. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35405384/
[4] Dubey A, Kumar M, Tufail A. Inhibiting viral entry of bat-derived coronavirus HKU5-CoV-2: Targeting spike protein S1 subunit with FDA-approved antivirals-A structural dynamics and energetics study. Bioorg Chem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40865231/
[5] Lam SD, Bordin N, Waman VP et al. SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals. Sci Rep. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/33020502/
[6] Bouback TA, Pokhrel S, Albeshri A et al. Pharmacophore-Based Virtual Screening, Quantum Mechanics Calculations, and Molecular Dynamics Simulation Approaches Identified Potential Natural Antiviral Drug Candidates against MERS-CoV S1-NTD. Molecules. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34443556/
[7] Sheahan T, Rockx B, Donaldson E et al. Pathways of cross-species transmission of synthetically reconstructed zoonotic severe acute respiratory syndrome coronavirus. J Virol. 2008. URL: https://pubmed.ncbi.nlm.nih.gov/18579604/