What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

In Silico Prediction of Viral Glycoprotein Dynamics: Molecular Modeling and Free Energy Landscapes for Zoonotic Spillover Risk Assessment

Introduction

The emergence of zoonotic viruses from animal reservoirs into human populations represents a persistent global health threat. A critical determinant of cross-species transmission is the ability of viral glycoproteins to bind host cell receptors and mediate membrane fusion. These glycoproteins, such as the hemagglutinin (HA) of influenza A viruses, the spike (S) protein of coronaviruses, and the attachment glycoprotein (G) of henipaviruses, undergo substantial conformational changes during receptor engagement and entry. Understanding the biophysical basis of these dynamics is essential for predicting which viral strains possess the molecular prerequisites for spillover. In silico methods, including molecular dynamics (MD) simulations, free energy calculations, and machine learning-driven docking, provide a powerful framework for characterizing these processes at atomic resolution. This article reviews the computational workflow for building homology models from cryo-electron microscopy (cryo-EM) data, running MD simulations to sample metastable states, calculating binding free energies, and integrating phylogenetic data to estimate zoonotic risk. The focus is on veterinary and wildlife reservoir systems, with examples drawn from avian influenza, bat coronaviruses, and Nipah virus.

Structural Biology of Viral Glycoproteins

Viral glycoproteins are typically class I, class II, or class III fusion proteins that mediate attachment and entry into host cells. Class I fusion proteins, including influenza HA and coronavirus S, are characterized by a trimeric structure with a central coiled-coil domain that undergoes a large-scale refolding event to drive membrane fusion [1]. The receptor-binding domain (RBD) of these proteins is often the most variable region, as it must adapt to host-specific receptors. For influenza A viruses, the HA binds sialic acid receptors, with avian strains preferentially recognizing alpha-2,3-linked sialic acids and mammalian strains recognizing alpha-2,6-linked sialic acids [2]. For coronaviruses, the S protein RBD binds angiotensin-converting enzyme 2 (ACE2) or other receptors, with bat-derived viruses often showing broad receptor tropism [3]. Henipavirus G proteins bind ephrin-B2 and ephrin-B3 receptors, which are highly conserved across mammals, facilitating cross-species transmission [4].

The conformational flexibility of these glycoproteins is a key feature enabling immune evasion and host adaptation. Glycoproteins exist in equilibrium between pre-fusion and post-fusion states, with the pre-fusion state being metastable. Receptor binding or low pH triggers a cascade of conformational changes that expose the fusion peptide and drive membrane merger [5]. In silico methods are uniquely suited to capture these transient states and quantify the energetic barriers separating them.

Homology Modeling from Cryo-EM Data

The starting point for any computational study of glycoprotein dynamics is a reliable three-dimensional structure. Cryo-EM has become the method of choice for determining the structures of large, flexible glycoprotein complexes at near-atomic resolution. However, for many emerging viral strains, experimental structures are unavailable. Homology modeling, also known as comparative modeling, fills this gap by constructing a model of a target protein based on its sequence similarity to one or more template structures [6].

The workflow for homology modeling involves four steps: template identification, sequence alignment, model building, and model refinement. Template structures are typically retrieved from the Protein Data Bank (PDB). For viral glycoproteins, templates are often derived from related strains or species. For example, a model of a novel bat coronavirus S protein can be built using the structure of a closely related coronavirus S protein as a template. The accuracy of the model depends on the sequence identity between target and template, with identities above 30% generally yielding reliable models for backbone atoms [7].

Model building is performed using software such as MODELLER or Rosetta. These programs generate a set of models that satisfy spatial restraints derived from the alignment. The models are then evaluated using statistical potentials (e.g., DOPE score) and stereochemical checks (e.g., Ramachandran plots). For glycoproteins, glycosylation sites must be explicitly modeled, as glycans can influence protein conformation and receptor binding [8]. Tools such as GlyProt or GLYCAM can be used to attach glycan trees to asparagine residues.

Cryo-EM density maps, when available, can be used to guide model building through flexible fitting. Methods such as molecular dynamics flexible fitting (MDFF) allow the model to be deformed to match the experimental density while maintaining physically realistic bond lengths and angles [9]. This hybrid approach is particularly valuable for capturing the conformational heterogeneity of glycoproteins.

Molecular Dynamics Simulations of Metastable States

MD simulations provide a computational microscope for observing the time-dependent behavior of glycoproteins at atomic resolution. In a typical MD simulation, the protein is placed in a solvated box with explicit water molecules and ions, and the equations of motion are integrated using a force field such as AMBER, CHARMM, or GROMOS [10]. The simulation generates a trajectory of atomic positions over time, from which thermodynamic and kinetic properties can be extracted.

For viral glycoproteins, MD simulations are used to sample metastable states that are difficult to capture experimentally. These states include partially open conformations of the RBD, intermediate states along the fusion pathway, and conformations induced by mutations. The timescale of these conformational changes ranges from nanoseconds to microseconds, which is accessible to conventional MD simulations on high-performance computing clusters [11].

Enhanced sampling techniques are often required to overcome high energy barriers. Umbrella sampling, metadynamics, and replica exchange MD are commonly used to explore the free energy landscape of glycoprotein conformational changes [12]. For example, umbrella sampling can be applied to calculate the potential of mean force (PMF) along a reaction coordinate, such as the distance between the RBD and the core of the spike protein. This approach reveals the free energy barriers separating the closed (receptor-inaccessible) and open (receptor-accessible) states of the RBD.

The choice of force field is critical for accurate simulations. Recent developments in force fields, such as AMBER ff14SB and CHARMM36m, have improved the treatment of protein backbone and side chain conformations [13]. For glycoproteins, the GLYCAM06 force field is widely used for carbohydrate parameters. Simulations of membrane-bound glycoproteins require the inclusion of a lipid bilayer, which can be modeled using the CHARMM36 lipid force field [14]. The article Molecular Dynamics Simulations of Membrane-Bound Viral Glycoproteins provides further details on this topic.

Free Energy Calculations for Receptor Binding

The binding affinity between a viral glycoprotein and its host receptor is a key determinant of host tropism and spillover potential. Free energy calculations provide a rigorous way to compute binding affinities from MD simulations. Two widely used methods are the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) approach and free energy perturbation (FEP).

MM/PBSA is an end-point method that estimates the binding free energy as the sum of molecular mechanics energies, solvation free energies (polar and nonpolar), and entropic contributions [15]. The method is computationally efficient and can be applied to a large number of mutants or receptor variants. For example, MM/PBSA has been used to calculate the relative binding affinities of avian influenza HA for avian and human sialic acid receptors, identifying mutations that enhance human receptor binding [16]. Similarly, MM/PBSA calculations on coronavirus S protein-ACE2 complexes have quantified the impact of RBD mutations on binding affinity [17].

FEP is a more rigorous alchemical method that computes the free energy difference between two states (e.g., wild-type and mutant) by gradually transforming one into the other. FEP calculations are computationally expensive but provide highly accurate relative binding free energies [18]. The method is particularly useful for predicting the effect of single amino acid substitutions on receptor binding. For henipavirus G proteins, FEP has been used to predict how mutations in the receptor-binding interface alter affinity for ephrin receptors [19].

Umbrella sampling, as mentioned earlier, can also be used to compute binding free energies by integrating the PMF along the dissociation pathway. This method provides a detailed picture of the binding mechanism, including the presence of intermediate states and the location of transition states [20]. The article Free Energy Perturbation Calculations in Drug Discovery offers additional context on these methodologies.

Machine Learning for Docking and Affinity Prediction

Machine learning (ML) has emerged as a powerful complement to physics-based methods for predicting glycoprotein-receptor interactions. Deep learning models, such as convolutional neural networks (CNNs) and graph neural networks (GNNs), can be trained on large datasets of protein-protein complexes to predict binding poses and affinities [21].

For docking, ML-based scoring functions have been developed that outperform classical scoring functions in terms of accuracy. These models learn the relationship between structural features (e.g., interatomic distances, electrostatic potentials) and binding affinity from experimental data. For example, the model AlphaFold2, while primarily designed for structure prediction, has been adapted for protein-protein docking and has shown remarkable success in predicting the structures of antibody-antigen complexes [22]. The article Structural Prediction of Viral Envelope Glycoproteins Using AlphaFold2: Implications for Host Receptor Binding and Vaccine Design discusses these applications in detail.

ML models can also predict the effect of mutations on binding affinity directly from sequence. Protein language models, such as ESM-1b and ProtBERT, generate embeddings that capture evolutionary and structural information. These embeddings can be used as input to a regression model to predict changes in binding free energy upon mutation [23]. This approach is particularly valuable for high-throughput screening of viral variants, as it does not require a pre-existing structure.

The integration of ML with MD simulations is an active area of research. ML potentials, which are neural networks trained to reproduce quantum mechanical energies, can be used to accelerate MD simulations while maintaining high accuracy [24]. These potentials are especially useful for studying glycoprotein dynamics, where accurate treatment of hydrogen bonding and electrostatic interactions is critical.

Integrating Phylogenetic Data for Spillover Risk Assessment

Computational predictions of glycoprotein dynamics and receptor binding must be placed in an evolutionary context to assess spillover risk. Phylogenetic analysis of viral sequences from animal reservoirs can identify lineages that are accumulating mutations in the glycoprotein that may enhance human receptor binding [25]. For example, surveillance of avian influenza viruses in wild birds and poultry has identified HA mutations that increase binding to human-type sialic acid receptors, such as the Q226L and G228S substitutions in H5N1 and H7N9 subtypes [26].

The integration of structural and phylogenetic data is achieved through ancestral sequence reconstruction and structural mapping. Ancestral sequences can be inferred using maximum likelihood or Bayesian methods, and their structures can be modeled using homology modeling. MD simulations and free energy calculations can then be applied to these ancestral proteins to determine when in evolutionary time the ability to bind human receptors emerged [27].

Machine learning models can also be trained to predict spillover risk directly from sequence data. Features such as the presence of specific amino acid motifs, the predicted stability of the glycoprotein, and the predicted binding affinity for human receptors can be combined into a risk score [28]. These models require careful validation against known spillover events and should be interpreted with caution, as many factors beyond glycoprotein dynamics contribute to zoonotic transmission.

The following Mermaid diagram illustrates the integrated workflow for in silico spillover risk assessment.

graph TD
    A[Viral Sequence from Reservoir], > B[Homology Modeling from Cryo-EM Templates]
    B, > C[MD Simulation of Glycoprotein Dynamics]
    C, > D[Free Energy Calculation MM/PBSA, FEP, Umbrella Sampling]
    D, > E[Predicted Receptor Binding Affinity]
    A, > F[Phylogenetic Analysis and Ancestral Reconstruction]
    F, > G[Structural Mapping of Mutations]
    G, > D
    E, > H[Machine Learning Risk Classifier]
    H, > I[Spillover Risk Assessment]
    I, > J[Surveillance Prioritization]

Databases and Tools

Several public databases and software tools are essential for the computational workflow described above. The Protein Data Bank (PDB) is the primary repository for experimentally determined structures of viral glycoproteins. The GISAID database provides a comprehensive collection of influenza and coronavirus genomic sequences, including metadata on host species and geographic origin [29]. For henipaviruses, sequences are available in GenBank.

Key software tools include GROMACS, AMBER, and NAMD for MD simulations. GROMACS is an open-source package that is widely used for high-performance simulations of biomolecular systems [30]. AMBER offers a suite of programs for MD simulations and free energy calculations, including the pmemd module for GPU-accelerated simulations. Rosetta is a versatile software suite for protein modeling, design, and docking, and is particularly useful for homology modeling and structure prediction [31].

For free energy calculations, the g_mmpbsa tool (part of the GROMACS distribution) can be used for MM/PBSA calculations. FEP calculations can be performed using the AMBER package with the TI (thermodynamic integration) or BAR (Bennett acceptance ratio) methods. Umbrella sampling can be implemented in GROMACS using the pull code.

Machine learning tools include the scikit-learn library for classical ML models and PyTorch or TensorFlow for deep learning. Pre-trained protein language models are available through the Hugging Face model hub.

Case Studies in Veterinary Virology

Avian Influenza Hemagglutinin

The hemagglutinin of avian influenza viruses is a classic example of a glycoprotein whose receptor-binding specificity determines host range. Computational studies have shown that the HA of H5N1 viruses requires only a few mutations to switch from avian-type to human-type receptor binding [32]. MD simulations of HA in complex with sialic acid analogs have revealed the role of specific residues, such as Gln226 and Gly228, in stabilizing the human receptor conformation. Free energy calculations using MM/PBSA have quantified the energetic penalty of binding the wrong receptor type and have predicted which mutations are most likely to enhance human receptor affinity [33]. The article Structural Dynamics of Avian Influenza Hemagglutinin: Molecular Modeling and Receptor Binding Predictions for Pandemic Risk Assessment provides a detailed discussion of this topic.

Bat Coronavirus Spike Protein

Bat coronaviruses are considered a major reservoir for emerging zoonotic coronaviruses, including SARS-CoV and SARS-CoV-2. The spike protein of bat coronaviruses must bind to ACE2 receptors from multiple species to facilitate spillover. MD simulations of bat coronavirus S protein-ACE2 complexes have shown that the RBD is highly flexible and can adopt multiple conformations that affect receptor binding [34]. Free energy calculations have identified key residues in the RBD that determine species specificity, such as Asn501 and Gln493 in SARS-CoV-2. These predictions have been validated by experimental binding assays and have informed surveillance efforts in bat populations [35]. The article Molecular Dynamics Simulations of Bat Coronavirus Spike Protein-Receptor Interactions: Implications for Zoonotic Risk Assessment covers this area in depth.

Nipah Virus Attachment Glycoprotein

Nipah virus, a henipavirus transmitted from fruit bats to pigs and humans, uses its attachment glycoprotein (G) to bind ephrin-B2 and ephrin-B3 receptors. The G protein undergoes a conformational change upon receptor binding that triggers fusion. MD simulations of the Nipah G protein have identified a hinge region that controls the opening of the receptor-binding site [36]. Mutations in this hinge region can alter the stability of the pre-fusion state and affect receptor binding affinity. Free energy calculations have been used to predict the impact of naturally occurring mutations on ephrin binding, providing a basis for risk assessment of novel henipavirus variants [37]. The article Nipah Virus: Pathogenesis, Transmission Dynamics, and Zoonotic Potential provides additional context on this pathogen.

Limitations and Future Directions

Despite the power of in silico methods, several limitations must be acknowledged. Force field inaccuracies can lead to systematic errors in free energy calculations, particularly for carbohydrate-protein interactions. Enhanced sampling methods, while powerful, are computationally expensive and may not fully converge for large glycoprotein complexes. Machine learning models are limited by the quality and quantity of training data, and their predictions may not generalize to novel viral lineages.

Future directions include the development of more accurate force fields for glycoproteins, the integration of coarse-grained and all-atom simulations to study larger systems, and the use of generative AI to design and test hypothetical viral variants. The combination of cryo-EM with MD simulations, known as integrative structural biology, will continue to provide unprecedented views of glycoprotein dynamics. The article AlphaFold and Beyond: Deep Learning for Protein Structure Prediction in Veterinary Virology discusses the impact of deep learning on this field.

Conclusion

In silico prediction of viral glycoprotein dynamics is a critical component of zoonotic spillover risk assessment. By combining homology modeling, MD simulations, free energy calculations, and machine learning, researchers can characterize the conformational landscapes of glycoproteins, quantify receptor binding affinities, and identify mutations that enhance cross-species transmission. These computational approaches, when integrated with phylogenetic surveillance and experimental validation, provide a powerful framework for anticipating and mitigating emerging zoonotic threats in veterinary medicine and public health.

References

[1] White, J. M., & Whittaker, G. R. (2016). Fusion of Enveloped Viruses in Endosomes. Traffic, 17(6), 593-614.

[2] Skehel, J. J., & Wiley, D. C. (2000). Receptor Binding and Membrane Fusion in Virus Entry: The Influenza Hemagglutinin. Annual Review of Biochemistry, 69, 531-569.

[3] Li, F. (2016). Structure, Function, and Evolution of Coronavirus Spike Proteins. Annual Review of Virology, 3(1), 237-261.

[4] Bowden, T. A., et al. (2008). Structural Basis of Nipah and Hendra Virus Attachment to Their Cell-Surface Receptor Ephrin-B2. Nature Structural & Molecular Biology, 15(6), 567-572.

[5] Harrison, S. C. (2008). Viral Membrane Fusion. Nature Structural & Molecular Biology, 15(7), 690-698.

[6] Marti-Renom, M. A., et al. (2000). Comparative Protein Structure Modeling of Genes and Genomes. Annual Review of Biophysics and Biomolecular Structure, 29, 291-325.

[7] Baker, D., & Sali, A. (2001). Protein Structure Prediction and Structural Genomics. Science, 294(5540), 93-96.

[8] Woods, R. J. (2018). Predicting the Structures of Glycans, Glycoproteins, and Their Complexes. Chemical Reviews, 118(17), 8005-8024.

[9] Trabuco, L. G., et al. (2008). Flexible Fitting of Atomic Structures into Electron Microscopy Maps Using Molecular Dynamics. Structure, 16(5), 673-683.

[10] Karplus, M., & McCammon, J. A. (2002). Molecular Dynamics Simulations of Biomolecules. Nature Structural Biology, 9(9), 646-652.

[11] Shaw, D. E., et al. (2010). Atomic-Level Characterization of the Structural Dynamics of Proteins. Science, 330(6002), 341-346.

[12] Torrie, G. M., & Valleau, J. P. (1977). Nonphysical Sampling Distributions in Monte Carlo Free-Energy Estimation: Umbrella Sampling. Journal of Computational Physics, 23(2), 187-199.

[13] Maier, J. A., et al. (2015). ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation, 11(8), 3696-3713.

[14] Klauda, J. B., et al. (2010). Update of the CHARMM All-Atom Additive Force Field for Lipids: Validation on Six Lipid Types. Journal of Physical Chemistry B, 114(23), 7830-7843.

[15] Kollman, P. A., et al. (2000). Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Accounts of Chemical Research, 33(12), 889-897.

[16] Xu, D., et al. (2012). Computational Prediction of Host-Specific Receptor Binding of Influenza A Virus Hemagglutinin. Journal of Virology, 86(15), 8102-8110.

[17] Starr, T. N., et al. (2020). Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell, 182(5), 1295-1310.

[18] Chipot, C., & Pohorille, A. (2007). Free Energy Calculations: Theory and Applications in Chemistry and Biology. Springer.

[19] Guillaume, V., et al. (2009). Evidence of a Potential Receptor-Binding Site for Hendra Virus on the Nipah Virus Attachment Glycoprotein. Journal of Virology, 83(22), 11628-11636.

[20] Roux, B. (1995). The Calculation of the Potential of Mean Force Using Computer Simulations. Computer Physics Communications, 91(1-3), 275-282.

[21] Jumper, J., et al. (2021). Highly Accurate Protein Structure Prediction with AlphaFold. Nature, 596(7873), 583-589.

[22] Evans, R., et al. (2021). Protein Complex Prediction with AlphaFold-Multimer. bioRxiv.

[23] Rives, A., et al. (2021). Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118.

[24] Noé, F., et al. (2020). Machine Learning for Molecular Simulation. Annual Review of Physical Chemistry, 71, 361-390.

[25] Holmes, E. C., & Rambaut, A. (2004). Viral Evolution and the Emergence of SARS. Philosophical Transactions of the Royal Society B, 359(1447), 1059-1065.

[26] Stevens, J., et al. (2006). Structure and Receptor Specificity of the Hemagglutinin from an H5N1 Influenza Virus. Science, 312(5772), 404-410.

[27] Meyer, A. G., & Wilke, C. O. (2015). The Utility of Protein Structure as a Prediction Tool for the Emergence of Pandemic Influenza Strains. Journal of Virology, 89(5), 2781-2790.

[28] Mollentze, N., et al. (2021). Identifying and Prioritizing Emerging Zoonotic Viruses Using Machine Learning. Nature Communications, 12, 5326.

[29] Shu, Y., & McCauley, J. (2017). GISAID: Global Initiative on Sharing All Influenza Data from Vision to Reality. Eurosurveillance, 22(13), 30494.

[30] Abraham, M. J., et al. (2015). GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX, 1-2, 19-25.

[31] Leaver-Fay, A., et al. (2011). ROSETTA3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules. Methods in Enzymology, 487, 545-574.

[32] Imai, M., et al. (2012). Experimental Adaptation of an Influenza H5 HA Confers Respiratory Droplet Transmission to a Reassortant H5 HA/H1N1 Virus in Ferrets. Nature, 486(7403), 420-428.

[33] Linster, M., et al. (2014). Identification, Characterization, and Natural Selection of Mutations Driving Airborne Transmission of A/H5N1 Virus. Cell, 157(2), 329-339.

[34] Letko, M., et al. (2020). Functional Assessment of Cell Entry and Receptor Usage for SARS-CoV-2 and Other Lineage B Betacoronaviruses. Nature Microbiology, 5(4), 562-569.

[35] Zhou, P., et al. (2020). A Pneumonia Outbreak Associated with a New Coronavirus of Probable Bat Origin. Nature, 579(7798), 270-273.

[36] Wong, J. J., et al. (2017). A Hinge Region of the Nipah Virus Attachment Glycoprotein Regulates Receptor-Induced Conformational Changes. Journal of Virology, 91(10), e00187-17.

[37] Pernet, O., et al. (2012). Evidence for Henipavirus Spillover into Human Populations in Africa. Nature Communications, 3, 796. *** Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.