What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens such as infectious bursal disease virus (IBDV) and avian reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Deep Learning for Predicting Antiviral Resistance Mutations in Influenza Neuraminidase

Introduction

Influenza A viruses (IAV) circulate widely in avian and swine populations, causing significant economic losses in poultry production and posing zoonotic risks. Neuraminidase inhibitors (NAIs) such as oseltamivir and zanamivir are the most widely used class of influenza antivirals, so resistance emerging in animal reservoirs matters for both veterinary and public health. Resistance mutations in the neuraminidase (NA) glycoprotein undermine treatment efficacy and complicate outbreak control. Predicting these mutations computationally with deep learning offers a proactive approach to surveillance and drug design. This article reviews the biological mechanisms of NAI resistance, the structural biology of the NA active site, and the deep learning architectures used to predict resistance-conferring mutations from sequence and structural data.

Biological Context of Neuraminidase Inhibitor Resistance

Neuraminidase Structure and Function

The NA protein is a homotetrameric surface glycoprotein responsible for cleaving sialic acid residues from host cell receptors, facilitating viral release from infected cells. Each monomer comprises a cytoplasmic tail, a transmembrane domain, a stalk region, and a globular catalytic head domain. The active site is a highly conserved pocket lined by eight functionally critical residues (R118, D151, R152, R224, E276, R292, R371, and Y406; N2 numbering) that coordinate substrate binding and catalysis. Mutations in or near this pocket can reduce inhibitor binding affinity while preserving enzymatic activity.

Mechanisms of NAI Resistance

NAIs are competitive inhibitors that mimic the transition state of sialic acid cleavage. When oseltamivir carboxylate binds within the active site, the E276 side chain rotates to create a hydrophobic pocket that accommodates the bulky pentyl ether group. Resistance mutations typically act in one of two ways: by directly interfering with inhibitor contacts (steric interference) or by indirectly altering active-site geometry (allosteric modulation). The H274Y substitution (N2 numbering; H275Y in N1 numbering) is the most clinically relevant resistance mutation in N1 subtype viruses: the bulkier tyrosine side chain pushes E276 toward the binding site and prevents the rearrangement that oseltamivir's pentyl ether group requires. Other notable mutations include N294S, which disrupts the hydrogen-bonding network that positions E276, and E119V, which removes the glutamate side chain that interacts with the inhibitor's C4 amino group.
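The mutation labels used above follow the wild-type/position/mutant convention. As a minimal sketch, assuming N2 numbering throughout, a surveillance script might parse such labels and check whether a substitution changes one of the eight catalytic residues listed earlier. The names `parse_substitution` and `hits_catalytic_residue` are hypothetical and do not come from an existing tool.

```python
import re
from typing import NamedTuple

# The eight catalytic residues of the NA active site (N2 numbering), as listed in the text.
CATALYTIC_RESIDUES = {118: "R", 151: "D", 152: "R", 224: "R",
                      276: "E", 292: "R", 371: "R", 406: "Y"}

# Hypothetical pattern for labels of the form <wild type><position><mutant>, e.g. "H274Y".
MUTATION_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


class Substitution(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'H274Y' into wild-type residue, position and mutant residue."""
    match = MUTATION_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single amino-acid substitution: {label!r}")
    wt, pos, mut = match.groups()
    if wt == mut:
        raise ValueError(f"wild-type and mutant residues are identical: {label!r}")
    return Substitution(wt, int(pos), mut)


def hits_catalytic_residue(sub: Substitution) -> bool:
    """True if the substitution changes one of the eight catalytic residues (N2 numbering)."""
    return CATALYTIC_RESIDUES.get(sub.position) == sub.wild_type
```

With this sketch, H274Y and E119V are reported as non-catalytic because they fall outside the eight-residue catalytic set, whereas R292K alters catalytic residue R292. A check like this is only a coarse first filter; the models discussed below are needed to score substitutions outside the catalytic set.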

Veterinary Relevance

In poultry, avian influenza H5N1 and H7N9 viruses, including highly pathogenic (HPAI) strains, have demonstrated the capacity to acquire NAI resistance mutations under drug selection pressure. Swine influenza viruses, particularly of the H1N1 and H3N2 subtypes, also circulate in pig populations, where antivirals may be used during outbreaks. Surveillance of NA sequences from veterinary sources is therefore essential for monitoring the emergence and spread of resistance determinants.

Deep Learning Architectures for Resistance Prediction

Sequence-Based Models

Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to predict resistance phenotypes from NA amino acid sequences. One-hot encoded sequences are fed into convolutional layers that learn position-specific substitution patterns associated with reduced inhibitor susceptibility. Long short-term memory (LSTM) networks capture long-range dependencies within the sequence, modeling epistatic interactions between distal residues that collectively modulate resistance. These models are trained on labeled datasets where NA sequences are paired with phenotypic IC50 fold-change values determined by neuraminidase inhibition assays.
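The encoding and convolution steps can be illustrated in plain Python. This is a toy sketch, not a trained model: the kernel weights below would normally be learned by gradient descent, and using log2 of the IC50 fold-change as the regression target is an assumed (though common) choice. All function names are illustrative.

```python
import math
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> List[List[float]]:
    """Encode a protein sequence as an L x 20 matrix; unknown residues (e.g. 'X') become all-zero rows."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d(x: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]], bias: float = 0.0) -> List[float]:
    """Valid (unpadded) 1D convolution of an L x 20 input with a K x 20 kernel, followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        total = bias
        for offset in range(k):
            total += sum(w * v for w, v in zip(kernel[offset], x[start + offset]))
        out.append(max(0.0, total))  # ReLU activation
    return out


def log2_fold_change(ic50_mutant: float, ic50_wild_type: float) -> float:
    """Assumed regression target: log2 of the mutant IC50 divided by the wild-type IC50."""
    return math.log2(ic50_mutant / ic50_wild_type)
```

A real model would stack many such filters, add pooling and dense layers, and run inside a deep learning framework; the explicit loops here only make the arithmetic visible.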

Structure-Based Models

Graph neural networks (GNNs) operate on three-dimensional protein structures represented as graphs, where nodes correspond to residues and edges encode spatial proximity or interatomic contacts. For NA, the active site and surrounding loops are represented as a graph, and the network learns to predict the effect of point mutations on inhibitor binding energy. Geometric deep learning approaches, such as equivariant neural networks, respect the rotational and translational symmetries of molecular structures, which can improve generalization to unseen mutations.
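A minimal sketch of graph construction and one message-passing round follows, assuming residues are represented by their C-alpha coordinates and that an 8 Å contact cutoff defines edges (a common heuristic, not a value taken from this article). The fixed 0.5 mixing weight stands in for the learned transformations of a real GNN, and the function names are hypothetical. Because edges depend only on interatomic distances, the graph is unchanged by rotating or translating the structure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> Dict[int, List[int]]:
    """Adjacency list linking residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(ca_coords)
    graph: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def message_passing_step(graph: Dict[int, List[int]],
                         features: Sequence[Sequence[float]]) -> List[List[float]]:
    """One round of mean aggregation: h_i' = 0.5 * h_i + 0.5 * mean(h_j over neighbours j)."""
    updated = []
    for i, h_i in enumerate(features):
        neighbours = graph[i]
        if not neighbours:  # isolated residues keep their features
            updated.append(list(h_i))
            continue
        mean = [sum(features[j][d] for j in neighbours) / len(neighbours) for d in range(len(h_i))]
        updated.append([0.5 * a + 0.5 * b for a, b in zip(h_i, mean)])
    return updated
```

Repeating the update lets information from residues several contacts away reach the active site; in a trained GNN, each round would also apply learned weight matrices and nonlinearities.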

Hybrid Sequence-Structure Approaches

Fusion models combine per-residue sequence embeddings from pretrained transformer protein language models (e.g., ESM-1b or ProtBERT) with structural features derived from AlphaFold-predicted or crystallographic NA models. The sequence branch contributes evolutionary information that these language models learn implicitly during pretraining on large sequence databases, without requiring an explicit multiple sequence alignment, while the structural branch provides biophysical context. Attention mechanisms allow the model to weight more heavily the residues that form direct contacts with the inhibitor or participate in catalysis.
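The fusion-plus-attention idea can be sketched with scaled dot-product attention pooling. This is an assumption-laden illustration: `seq_emb` stands in for per-residue language-model embeddings (no ESM-1b or ProtBERT call is made), `struct_feat` for structural descriptors such as solvent accessibility, and `query` for a vector that would be learned during training.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_pool(seq_emb: Sequence[Sequence[float]],
                  struct_feat: Sequence[Sequence[float]],
                  query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Concatenate per-residue sequence and structure features, then attention-pool them.

    Returns (pooled_vector, attention_weights). Residue i is scored by the dot product of
    `query` with its fused feature vector, scaled by sqrt(len(query)).
    """
    fused = [list(s) + list(t) for s, t in zip(seq_emb, struct_feat)]
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * f for q, f in zip(query, row)) / scale for row in fused])
    dim = len(fused[0])
    pooled = [sum(w * row[d] for w, row in zip(weights, fused)) for d in range(dim)]
    return pooled, weights
```

Inspecting the returned weights shows which residues the pooled representation draws on, which is the property exploited when attention is used for interpretation.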

Training Data and Labeling

Training deep learning models for resistance prediction requires large, curated datasets of NA sequences with associated phenotypic data. Public repositories such as the Influenza Research Database (since integrated into the Bacterial and Viral Bioinformatics Resource Center, BV-BRC) and the Global Initiative on Sharing All Influenza Data (GISAID) provide sequence records, but phenotypic data are scarcer. Deep mutational scanning (DMS) experiments, in which libraries of NA variants are generated and assayed for inhibitor susceptibility, offer high-throughput labeled data. Transfer learning from large protein language models pretrained on millions of sequences can mitigate the limited availability of resistance-specific labels.

Molecular Dynamics and Structural Analysis

Free Energy Perturbation

Molecular dynamics (MD) simulations combined with free energy perturbation (FEP) calculations provide a physics-based complement to deep learning predictions. By alchemically transforming the wild-type residue into the mutant residue, once in the NA-inhibitor complex and once in the unbound enzyme, the change in inhibitor binding free energy (ΔΔG) can be estimated through a thermodynamic cycle. These calculations capture subtle conformational rearrangements and solvent effects that sequence-based models may miss. Because FEP is computationally expensive, deep learning models can be trained to approximate FEP results, enabling rapid screening of thousands of mutations.
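The thermodynamic cycle gives ΔΔG_bind = ΔG(wt→mut, complex) − ΔG(wt→mut, apo), with positive values indicating weaker inhibitor binding to the mutant. The sketch below pairs that cycle with the Zwanzig exponential-averaging estimator for a single perturbation step. It is a didactic simplification: production FEP protocols split the transformation into many intermediate λ windows and typically use estimators such as BAR or MBAR, and the energy differences would come from an MD engine rather than a hand-written list.

```python
import math
from typing import Sequence

BOLTZMANN_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig_delta_g(delta_u: Sequence[float], temperature: float = 300.0) -> float:
    """Free energy difference (kcal/mol) via dG = -kT * ln < exp(-dU / kT) >_A.

    delta_u holds U_B(x) - U_A(x) in kcal/mol for configurations x sampled from state A.
    """
    kt = BOLTZMANN_KCAL * temperature
    shift = min(delta_u)  # factor out the minimum so exp() cannot overflow
    mean_boltz = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_boltz)


def binding_ddg(dg_complex: float, dg_apo: float) -> float:
    """Thermodynamic cycle: ddG_bind = dG(wt->mut, complex) - dG(wt->mut, apo protein)."""
    return dg_complex - dg_apo
```

Applying `zwanzig_delta_g` to the bound and unbound legs and passing both results to `binding_ddg` yields the ΔΔG that a surrogate deep learning model would be trained to reproduce.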

Structural Features as Inputs

Key structural features used as inputs to deep learning models include residue depth, solvent accessibility, backbone dihedral angles, and contact maps. For NA, the flexibility of the 150-loop (residues 147-152) and the 430-loop (residues 429-433) influences inhibitor binding. Mutations that alter loop dynamics can be detected through MD-derived fluctuation profiles and incorporated into predictive models.
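As a sketch of how an MD-derived fluctuation profile might become a model input, the following computes per-atom root-mean-square fluctuation (RMSF) and averages it over a loop. It assumes the frames contain one C-alpha atom per residue and have already been superposed onto a reference structure; in practice, trajectory analysis libraries handle alignment and atom selection. The loop ranges come from the text (150-loop: residues 147-152; 430-loop: residues 429-433), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation across frames (frames assumed pre-aligned)."""
    n_frames = len(frames)
    values = []
    for a in range(len(frames[0])):
        mean = [sum(frame[a][d] for frame in frames) / n_frames for d in range(3)]
        msd = sum(sum((frame[a][d] - mean[d]) ** 2 for d in range(3)) for frame in frames) / n_frames
        values.append(math.sqrt(msd))
    return values


def loop_flexibility(frames: Sequence[Sequence[Coord]], residue_numbers: Sequence[int], loop: range) -> float:
    """Mean C-alpha RMSF over residues in `loop`, e.g. range(147, 153) for the 150-loop."""
    selected = [v for v, resnum in zip(rmsf(frames), residue_numbers) if resnum in loop]
    if not selected:
        raise ValueError("no residues from the requested loop are present")
    return sum(selected) / len(selected)
```

A mutant whose 150-loop RMSF differs markedly from wild type in comparable simulations would carry that difference into the feature vector.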

Integration with 3D Protein Viewer

Predicted resistance mutations can be visualized on the NA three-dimensional structure using interactive molecular viewers. The Computational Visualization of Single-Point Mutations on Protein 3D Structures article describes methods for mapping mutation sites onto protein surfaces and highlighting changes in electrostatic potential or steric clashes. Linking prediction outputs to a 3D viewer allows researchers to inspect the structural context of high-risk mutations.

Workflow for Predicting Resistance Mutations

The following Mermaid diagram illustrates a typical computational pipeline for predicting NAI resistance mutations in influenza NA.

flowchart TD
    A[Sequence Database] --> B[Multiple Sequence Alignment]
    B --> C[Feature Extraction]
    C --> D[Deep Learning Model]
    D --> E[Resistance Score]
    F[NA Crystal Structure] --> G[Graph Construction]
    G --> H[Graph Neural Network]
    H --> E
    I[MD Simulations] --> J[Free Energy Perturbation]
    J --> K[ΔΔG Calculation]
    K --> L[Validation]
    E --> L
    L --> M[High-Risk Mutation List]
    M --> N[3D Visualization]

The pipeline integrates sequence-based and structure-based deep learning models with molecular dynamics validation. High-risk mutations are prioritized for experimental confirmation and surveillance monitoring.

Model Evaluation and Benchmarking

Performance Metrics

Binary classification models predicting resistance (yes/no) are evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Regression models predicting IC50 fold-change are assessed using the Pearson correlation coefficient and root mean square error (RMSE). Cross-validation folds should be split by sequence cluster or subtype rather than by individual sequence, so that near-identical sequences do not appear in both training and test folds; this prevents data leakage and yields realistic performance estimates.
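The metrics named above can be written out in a few lines of standard-library Python, which makes their definitions explicit. This is a sketch for illustration; established machine learning libraries provide tested implementations. AUC-ROC is computed here in its rank-based form, as the probability that a randomly chosen resistant variant receives a higher score than a randomly chosen susceptible one.

```python
import math
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating label 1 as 'resistant'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC-ROC: P(score of a resistant variant > score of a susceptible one); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_and_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation and RMSE for regression targets such as log-transformed IC50 fold-change."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return cov / (sd_t * sd_p), rmse
```

Note that a constant offset between predictions and targets leaves Pearson correlation at 1 while inflating RMSE, which is why the two are usually reported together.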

Benchmark Datasets

Benchmark data for NAI resistance prediction can be drawn from panels of characterized mutations such as H274Y and from comprehensive DMS libraries for N1 and N2 subtypes. Models are compared against baseline methods such as random forests, support vector machines, and position-specific scoring matrices (PSSMs). Deep learning models tend to outperform these baselines when sufficient training data are available, particularly for mutations whose effects depend on epistatic interactions.
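A position-specific scoring matrix baseline can be sketched as follows, assuming an alignment of equal-length NA sequences, a pseudocount of 1, and a uniform background frequency of 1/20 (all illustrative choices). The resulting score measures how unusual a substitution is at an alignment column, so it captures conservation rather than resistance directly; a resistance baseline would need to be trained on, or combined with, labeled phenotypes.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: Sequence[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Natural-log odds PSSM from equal-length aligned sequences against a uniform 1/20 background.

    Gap characters and non-standard residues are ignored when counting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[pos] in counts:
                counts[seq[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log((c / total) / background) for aa, c in counts.items()})
    return pssm


def substitution_score(pssm: List[Dict[str, float]], position: int, wt: str, mut: str) -> float:
    """Score for a substitution at a 0-based alignment column: log-odds(mut) - log-odds(wt).

    Strongly negative values mean the mutant residue is rare at that column in the alignment.
    """
    column = pssm[position]
    return column[mut] - column[wt]
```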

Generalization Across Subtypes

A critical challenge is generalizing predictions across NA subtypes (N1, N2, N3, etc.) due to sequence divergence and structural differences in loop regions. Transfer learning and multi-task learning, where the model is trained on multiple subtypes simultaneously, improve cross-subtype performance. Zero-shot prediction using protein language models pretrained on diverse sequences also shows promise for novel subtypes.
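Cross-subtype generalization is usually measured by holding out entire subtypes. The helper below is a hypothetical sketch of leave-one-subtype-out splitting: each fold trains on all other subtypes and tests on the held-out one, so performance on a subtype never benefits from closely related training sequences of that same subtype.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per distinct group label.

    `groups[i]` is the group (e.g. NA subtype 'N1', 'N2', ...) of record i.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test
```

The same function works for sequence clusters if cluster identifiers are passed instead of subtype labels.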

Surveillance Applications

Genomic Surveillance in Poultry

Deep learning models can be deployed in surveillance pipelines to scan newly sequenced NA genes from avian influenza isolates for resistance mutations. The Avian Influenza A Virus in Poultry: Clinical Signs and Surveillance article outlines field sampling and molecular detection methods. Integrating resistance prediction with routine surveillance enables early detection of emerging resistant variants.

Swine Influenza Monitoring

In swine populations, NA sequences from H1N1 and H3N2 subtypes can be analyzed using the same computational frameworks. The Swine Influenza A Virus article provides background on epidemiology and clinical presentation. Resistance prediction in swine is particularly important given the potential for reassortment and zoonotic transmission.

Linking to Variant Calling Pipelines

Resistance prediction models can be integrated into variant calling pipelines that identify single-nucleotide variants (SNVs) in NA genes. The Variant Calling Pipelines: GATK Best Practices, FreeBayes, and DeepVariant Comparison article describes the computational tools used to call variants from high-throughput sequencing data. Once called variants are translated into amino acid substitutions, those predicted to confer resistance are flagged for further investigation.
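Before a resistance model can score a called variant, the nucleotide change has to be mapped to an amino acid substitution. The sketch below does this for a single-nucleotide variant using the standard genetic code; `snv_to_substitution` is a hypothetical helper. The returned position is sequential within the translated coding sequence, not N2 numbering: converting between the two requires aligning the protein to an N2 reference, which this sketch does not attempt.

```python
from typing import Optional

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS_BY_CODON[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snv_to_substitution(cds: str, nt_position: int, alt_base: str) -> Optional[str]:
    """Translate an SNV into a label such as 'H2Y', or return None for a synonymous change.

    `cds` is the coding sequence starting at the A of ATG; `nt_position` is 1-based within it.
    """
    cds = cds.upper()
    codon_start = 3 * ((nt_position - 1) // 3)
    offset = (nt_position - 1) % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return None
    return f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"
```

For example, in the toy sequence `ATGCATTAA`, changing nucleotide 4 from C to T turns codon CAT (His) into TAT (Tyr).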

Limitations and Future Directions

Data Scarcity

The primary limitation of deep learning approaches is the scarcity of high-quality phenotypic data for NA variants. Most publicly available datasets are biased toward well-studied mutations like H274Y, limiting model performance on rare or novel substitutions. Active learning strategies, where the model selects informative mutations for experimental testing, can reduce the data burden.
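One common active-learning heuristic is uncertainty sampling with a model ensemble: the mutations on which ensemble members disagree most are sent for experimental testing first. The following is a minimal sketch of that heuristic, assuming each model is a plain callable that returns a resistance score for a mutation label; `select_for_testing` is a hypothetical name.

```python
import statistics
from typing import Callable, Dict, List, Sequence

Model = Callable[[str], float]  # maps a mutation label to a predicted resistance score


def select_for_testing(candidates: Sequence[str], ensemble: Sequence[Model], budget: int) -> List[str]:
    """Return the `budget` candidates with the highest ensemble disagreement.

    Disagreement is the population standard deviation of the members' predicted scores.
    """
    spread: Dict[str, float] = {
        mutation: statistics.pstdev([model(mutation) for model in ensemble])
        for mutation in candidates
    }
    return sorted(candidates, key=lambda m: spread[m], reverse=True)[:budget]
```

After the selected mutations are assayed, their labels are added to the training set, the ensemble is retrained, and the loop repeats.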

Interpretability

Deep learning models are often criticized as black boxes. Attention mechanisms and saliency maps provide some interpretability by highlighting residues that contribute most to resistance predictions. However, these methods do not guarantee causal understanding. Integrating physics-based features from MD simulations can improve interpretability by grounding predictions in biophysical principles.
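Gradient-based saliency can be illustrated on a model small enough to differentiate by hand. For a logistic model p = σ(w · x + b), the derivative of p with respect to feature x_i is p(1 − p)w_i, and gradient-times-input attributes x_i · p(1 − p)w_i to that feature. This is a toy sketch with hypothetical names; for deep networks the gradients would come from automatic differentiation. If the features are one-hot residues at each position, the attributions point to the positions that drive the prediction, subject to the causal caveat above.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gradient_x_input(weights: Sequence[float], bias: float, x: Sequence[float]) -> List[float]:
    """Gradient-times-input saliency for p = sigmoid(w . x + b).

    dp/dx_i = p * (1 - p) * w_i, so feature i receives attribution x_i * p * (1 - p) * w_i.
    """
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return [v * p * (1.0 - p) * w for w, v in zip(weights, x)]
```

Absent features (x_i = 0) receive zero attribution regardless of their weight, one reason gradient-times-input is often complemented by perturbation-based methods.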

Epistasis and Fitness Landscapes

Resistance mutations do not occur in isolation; they are subject to epistatic interactions with other residues in NA and with mutations in hemagglutinin (HA) that affect receptor binding. Deep learning models that jointly predict resistance and viral fitness are an active area of research. The Predicting Vaccine Escape Mutations Using Structure-Based Deep Learning article discusses similar challenges in the context of antibody escape.
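Epistasis between two substitutions is often quantified as the deviation from additivity on a scale where independent effects should add, such as ΔΔG or log fold-change: ε = f(AB) − f(A) − f(B). A minimal sketch follows; `pairwise_epistasis` is a hypothetical helper, and the labels and values in any example are placeholders, not measurements.

```python
from typing import Dict, FrozenSet


def pairwise_epistasis(effects: Dict[FrozenSet[str], float], a: str, b: str) -> float:
    """epsilon = f(AB) - f(A) - f(B), with effects measured relative to wild type on an additive scale.

    A value near zero means the substitutions combine additively; a large positive or
    negative value signals epistasis.
    """
    return effects[frozenset({a, b})] - effects[frozenset({a})] - effects[frozenset({b})]
```

Models that predict single-mutant effects only implicitly assume ε = 0; a jointly trained resistance and fitness model would need double-mutant data to learn nonzero ε.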

Conclusion

Deep learning provides a powerful framework for predicting antiviral resistance mutations in influenza neuraminidase, combining sequence, structural, and dynamic information. These models enable proactive surveillance in veterinary populations, supporting the rational use of NAIs in poultry and swine. Continued advances in protein language models, geometric deep learning, and integration with molecular dynamics will further improve prediction accuracy and generalizability. Linking predictions to three-dimensional structural visualization enhances biological interpretation and facilitates communication with field veterinarians and diagnostic laboratories.


Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.