What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens such as infectious bursal disease virus (IBDV) and avian reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Neoantigen Prediction Algorithms in Cancer Immunotherapy

Introduction

Neoantigens are tumor-specific peptide sequences that arise from somatic mutations, aberrant splicing, post-translational modifications, or viral integration events and are presented on major histocompatibility complex (MHC) molecules to T cells. The computational identification of neoantigens is a cornerstone of personalized cancer immunotherapy, enabling the rational design of vaccines, adoptive T cell therapies, and immune checkpoint modulator combinations. In veterinary oncology, spontaneous tumors in companion animals such as dogs and cats provide immunologically relevant models for human disease, yet the predictive algorithms must account for species-specific MHC polymorphisms, peptide length preferences, and binding motif architectures. This review examines the algorithmic landscape of neoantigen prediction, from classical binding affinity models to contemporary deep learning and multi-task architectures, with a focus on computational methods applicable to veterinary and comparative oncology.

Biological Basis of Neoantigen Recognition

The adaptive immune system distinguishes self from non-self through T cell receptor (TCR) recognition of peptides bound to MHC molecules. Neoantigens arise from non-synonymous somatic mutations (missense, frameshift, insertion, deletion, splice-site alterations) that generate novel peptide sequences not present in the germline reference proteome. The immunogenicity of a neoantigen is governed by a multi-step cascade: proteasomal cleavage of the source protein, transporter associated with antigen processing (TAP) translocation into the endoplasmic reticulum, MHC binding affinity, peptide-MHC (pMHC) stability, and TCR engagement. Each step introduces a selective bottleneck that computational algorithms aim to model.
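Because each step acts as a filter, one simple way to reason about the cascade is to treat each step as a pass probability and multiply them. The sketch below does this for a single peptide. The step names mirror the cascade above, but the probability values are hypothetical placeholders, and the independence assumption is a simplification, since the steps are biologically correlated.

```python
from math import prod

# Hypothetical per-step pass probabilities for one candidate peptide.
# Real values would come from a separate predictor for each step.
step_probs = {
    "proteasomal_cleavage": 0.8,
    "tap_transport": 0.9,
    "mhc_binding": 0.6,
    "pmhc_stability": 0.7,
    "tcr_engagement": 0.3,
}


def cascade_probability(probs):
    """Probability of clearing every step, assuming the steps are independent."""
    return prod(probs.values())


def bottleneck(probs):
    """Name of the step with the lowest pass probability."""
    return min(probs, key=probs.get)


p = cascade_probability(step_probs)
print(f"overall: {p:.4f}, tightest step: {bottleneck(step_probs)}")
```

Even when each individual step is fairly permissive, the product falls quickly, which is why prioritization pipelines apply several filters in sequence.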

In species relevant to veterinary medicine, the MHC is termed the dog leukocyte antigen (DLA) system in dogs, the feline leukocyte antigen (FLA) system in cats, and the bovine leukocyte antigen (BoLA) system in cattle. The peptide-binding grooves of these molecules exhibit distinct anchor residue preferences and structural plasticity. For example, DLA class I molecules predominantly bind 9-mer peptides with hydrophobic C-terminal anchors, whereas BoLA class II molecules accommodate peptides of 13 to 25 residues with a 9-residue core binding register. Algorithms trained on human leukocyte antigen (HLA) data often require transfer learning or retraining to achieve acceptable performance on veterinary MHC alleles [1, 2].
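The 9-residue core register of a longer class II ligand can be located by sliding a 9-residue window along the peptide and keeping the best-scoring window. The sketch below assumes a toy scoring function that counts hydrophobic residues at core positions 1, 4, 6, and 9 (the pockets commonly described as anchors for class II molecules such as HLA-DR). It is illustrative only and is not a trained BoLA or DLA motif; the function names are hypothetical.

```python
HYDROPHOBIC = set("AILMFVWY")
# Toy anchor positions P1, P4, P6, P9 expressed as 0-based offsets within the core.
TOY_ANCHORS = (0, 3, 5, 8)


def core_registers(peptide, core_len=9):
    """Return (offset, core) for every core_len window of a class II ligand."""
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the core length")
    return [(i, peptide[i:i + core_len]) for i in range(len(peptide) - core_len + 1)]


def toy_core_score(core):
    """Hypothetical score: number of hydrophobic residues at the toy anchor positions."""
    return sum(core[i] in HYDROPHOBIC for i in TOY_ANCHORS)


def best_core(peptide, score_fn=toy_core_score, core_len=9):
    """Pick the register whose core maximizes score_fn (the first one wins on ties)."""
    return max(core_registers(peptide, core_len), key=lambda oc: score_fn(oc[1]))


offset, core = best_core("GGGFGGLGLGGFGGG")
print(offset, core)
```

A 15-mer yields seven candidate registers; across the 13- to 25-residue range described above, the count runs from 5 to 17.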

Algorithmic Taxonomy

Neoantigen prediction algorithms can be categorized by their modeling approach, input features, and prediction target. Table 1 provides a comparative overview of major algorithmic categories.

Table 1. Categories of Neoantigen Prediction Algorithms

Algorithm Class	Input Features	Prediction Target	Example Methods
Binding affinity predictors	Peptide sequence, MHC allele type	IC50 or rank-based binding score	NetMHCpan, MHCnuggets [3], MHCRoBERTa [2]
Immunogenicity predictors	Peptide sequence, pMHC binding features, proteasomal cleavage	Binary immunogenic/non-immunogenic label	DeepImmuno [4], NeoTImmuML [5], CNNeoPP [6]
Multi-task models	Peptide sequence, MHC allele, RNA expression, TCR repertoire	Joint prediction of presentation and immunogenicity	NeoMUST [7], ENCAP [8]
Structure-based predictors	Peptide and MHC 3D coordinates	Binding energy and pMHC stability	DeepNetBim [9], MHC2-SCALE [10]
Pan-specific predictors	Peptide sequence, pseudo-sequence of MHC binding groove	Cross-allele binding prediction	MHCRoBERTa [2], NetMHCpan

Binding Affinity Prediction

The foundational step in neoantigen discovery is predicting whether a mutated peptide will bind to a specific MHC molecule. Binding affinity is typically measured as the half-maximal inhibitory concentration (IC50) in competition binding assays, with thresholds of 500 nM and 5000 nM commonly used for class I and class II, respectively; lower IC50 values indicate stronger binding. Early methods employed position-specific scoring matrices (PSSMs) and support vector machines (SVMs), whereas current state-of-the-art approaches use deep neural networks trained on large immunopeptidomic datasets.
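The two scoring ideas in this paragraph can be sketched together: a PSSM scores a peptide by summing per-position log-odds values, and a predicted IC50 is compared against the class-specific threshold. The matrix values below are hypothetical, not a real HLA or DLA motif. The transform 1 - log(IC50)/log(50000) follows a convention used by NetMHC-family tools to map affinities onto a 0 to 1 scale for training; treat the 50000 nM cap as an assumption of this sketch.

```python
import math

# Hypothetical PSSM: 0-based position -> {residue: log-odds score}.
# Unlisted residues contribute 0. Values are illustrative only.
TOY_PSSM = {
    1: {"L": 2.0, "M": 1.5},
    8: {"V": 2.0, "L": 1.8, "I": 1.6},
}

# Binder thresholds from the text, in nM.
IC50_THRESHOLDS_NM = {"I": 500.0, "II": 5000.0}
MAX_IC50_NM = 50000.0  # assumed cap for the log transform


def pssm_score(peptide, pssm):
    """Sum per-position log-odds; higher means a better match to the motif."""
    return sum(pssm.get(pos, {}).get(aa, 0.0) for pos, aa in enumerate(peptide))


def ic50_to_score(ic50_nm, max_ic50=MAX_IC50_NM):
    """Map IC50 (nM) to [0, 1] via 1 - log(IC50)/log(max); strong binders approach 1."""
    clipped = min(max(ic50_nm, 1.0), max_ic50)
    return 1.0 - math.log(clipped) / math.log(max_ic50)


def is_binder(ic50_nm, mhc_class):
    """Lower IC50 means tighter binding, so binders fall at or below the threshold."""
    return ic50_nm <= IC50_THRESHOLDS_NM[mhc_class]


print(pssm_score("SLYNTVATL", TOY_PSSM), round(ic50_to_score(500.0), 3))
```

The log transform compresses the wide dynamic range of IC50 values, so a regression model is not dominated by very weak binders.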

The pan-specific binding predictor MHCRoBERTa [2] uses a transformer architecture pre-trained on unlabeled protein sequences via a masked language modeling objective, then fine-tuned on peptide-MHC binding data. This transfer learning strategy enables the model to generalize to alleles with limited or no training data, which is critical for veterinary species with poorly characterized MHC diversity. The MHCnuggets [3] approach uses a recurrent neural network (RNN) with an allele-specific embedding layer, demonstrating competitive performance across both class I and class II alleles. Glynn and colleagues [1] addressed the issue of inequitable prediction performance across MHC alleles, showing that models trained on over-represented human alleles perform poorly on rare alleles, and proposed a reweighting strategy to improve cross-allele generalization.
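The allele imbalance problem can be illustrated with generic inverse-frequency weighting, in which examples from rare alleles receive larger training weights so that every allele contributes the same total weight. This is a common rebalancing technique and is not necessarily the reweighting strategy of Glynn and colleagues [1]; the allele labels are hypothetical placeholders.

```python
from collections import Counter


def inverse_frequency_weights(alleles):
    """Per-example weights giving each allele equal total weight (mean weight 1.0)."""
    counts = Counter(alleles)
    n_examples = len(alleles)
    n_alleles = len(counts)
    # Each allele's examples share n_examples / n_alleles units of weight.
    return [n_examples / (n_alleles * counts[a]) for a in alleles]


# Hypothetical training set: three examples for a common allele, one for a rare one.
alleles = ["allele_common"] * 3 + ["allele_rare"]
weights = inverse_frequency_weights(alleles)
print([round(w, 3) for w in weights])
```

These weights would typically be supplied as per-sample weights to the training loss.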

Immunogenicity Scoring

Binding affinity alone is insufficient to predict immunogenicity because many high-affinity pMHC complexes fail to elicit T cell responses. Immunogenicity predictors integrate additional features such as peptide sequence entropy, hydrophobicity, TCR contact residue variability, and dissimilarity to the self-proteome.
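Two of the features named above can be computed directly from the peptide string. The sketch uses the Kyte-Doolittle hydropathy scale and Shannon entropy over residue composition. Restricting hydrophobicity to positions 4 to 8 of a 9-mer as putative TCR-contact residues is an assumption for illustration; individual predictors define their own feature sets.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed TCR-facing positions for a 9-mer (P4-P8, as 0-based indices).
TCR_CONTACT_9MER = range(3, 8)


def mean_hydrophobicity(peptide, positions=None):
    """Average Kyte-Doolittle value over the selected positions (default: all)."""
    idx = range(len(peptide)) if positions is None else positions
    values = [KYTE_DOOLITTLE[peptide[i]] for i in idx]
    return sum(values) / len(values)


def shannon_entropy(peptide):
    """Shannon entropy (bits) of the residue composition."""
    n = len(peptide)
    return -sum((c / n) * math.log2(c / n) for c in Counter(peptide).values())


pep = "SLYNTVATL"
print(round(mean_hydrophobicity(pep, TCR_CONTACT_9MER), 2), round(shannon_entropy(pep), 3))
```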

DeepImmuno [4] employs a deep learning framework that combines peptide embedding, MHC allele encoding, and a feed-forward neural network to predict immunogenicity from mass spectrometry-eluted ligand data. The model also incorporates a generative component to produce novel immunogenic peptide sequences. NeoTImmuML [5] uses an ensemble of machine learning classifiers (random forest, gradient boosting, and SVM) with features derived from amino acid indices, hydrophobicity scales, and secondary structure propensities. The study demonstrated that features related to peptide flexibility and beta-turn propensity were among the most informative for immunogenicity discrimination.
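The text does not specify how NeoTImmuML combines its random forest, gradient boosting, and SVM outputs. Soft voting, which averages each classifier's predicted probability of the immunogenic class, is one common choice and is assumed here. The probability lists are hypothetical outputs for three peptides; in practice SVM scores must be calibrated to probabilities before they can be averaged this way.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted mean of per-sample positive-class probabilities across classifiers."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def to_labels(probs, threshold=0.5):
    """1 = predicted immunogenic, 0 = predicted non-immunogenic."""
    return [int(p >= threshold) for p in probs]


# Hypothetical P(immunogenic) for three peptides from three classifiers.
rf = [0.9, 0.2, 0.6]
gb = [0.8, 0.4, 0.3]
svm = [0.7, 0.1, 0.5]

ensemble = soft_vote([rf, gb, svm])
print([round(p, 3) for p in ensemble], to_labels(ensemble))
```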

CNNeoPP [6] integrates a large language model (LLM) with a convolutional neural network (CNN) architecture for personalized neoantigen prediction. The LLM component generates contextualized peptide representations by modeling local sequence interactions, while the CNN refines binding pocket compatibility. This pipeline also supports liquid biopsy applications by predicting neoantigens from circulating tumor DNA sequencing data.
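The CNN component can be illustrated with a single convolutional filter scanning a one-hot-encoded peptide, followed by ReLU and global max pooling, which is the basic operation a convolutional layer applies to sequence input. This is a minimal sketch: the filter is hand-set to detect a hypothetical "KL" dipeptide, whereas in CNNeoPP and similar models the filters are learned and operate on richer (for example, LLM-derived) embeddings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as an L x 20 matrix of 0/1 values."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def conv1d_relu_maxpool(encoded, kernel, bias=0.0):
    """Valid 1D convolution with one filter, ReLU, then global max pooling.

    encoded: L x 20 input matrix; kernel: k x 20 filter weights.
    """
    k = len(kernel)
    if len(encoded) < k:
        raise ValueError("input is shorter than the filter")
    activations = []
    for start in range(len(encoded) - k + 1):
        s = bias
        for j in range(k):
            s += sum(x * w for x, w in zip(encoded[start + j], kernel[j]))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # global max pooling


# Hand-set filter that fires only on "KL"; the bias cancels single-residue matches.
kl_filter = one_hot("KL")
print(conv1d_relu_maxpool(one_hot("AKLMA"), kl_filter, bias=-1.0))
```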

Multi-Task and Joint Prediction Models

Recognizing that neoantigen presentation and immunogenicity are interdependent processes, several methods adopt multi-task learning frameworks. NeoMUST [7] is a multi-task model that simultaneously predicts peptide-MHC binding, TAP transport efficiency, and proteasomal cleavage probability. The joint optimization enforces consistency across these biophysical steps, yielding higher precision in immunogenic neoantigen identification compared to single-task models.
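Joint optimization of this kind is commonly implemented by summing per-task losses computed from shared representations. The sketch shows only the loss term, assuming each of the three tasks named above is a binary prediction scored with binary cross-entropy and combined with equal weights; NeoMUST's actual architecture, tasks, and weighting may differ.

```python
import math


def bce(y_true, p, eps=1e-7):
    """Binary cross-entropy for one prediction, with clipping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def multitask_loss(labels, preds, task_weights):
    """Weighted sum of per-task binary cross-entropies for one peptide."""
    return sum(w * bce(labels[task], preds[task]) for task, w in task_weights.items())


# Hypothetical labels and model outputs for one peptide.
labels = {"mhc_binding": 1, "tap_transport": 1, "proteasomal_cleavage": 0}
preds = {"mhc_binding": 0.8, "tap_transport": 0.6, "proteasomal_cleavage": 0.1}
weights = {"mhc_binding": 1.0, "tap_transport": 1.0, "proteasomal_cleavage": 1.0}

loss = multitask_loss(labels, preds, weights)
print(round(loss, 4))
```

Because the gradient of the summed loss flows back through the shared layers, errors on any one task shape the representation used by the others, which is how multi-task training can encourage consistency across steps.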

ENCAP [8] uses ensemble classifiers with diverse sequence features, including evolutionary conservation scores, disorder propensities, and post-translational modification sites. By combining multiple weak learners, ENCAP achieves robust performance across cancer types and mutation classes. The Sa-TTCA method [11] extracts features from both biological sequence encoding and natural language processing (NLP) embeddings, then classifies tumor T cell antigens using an SVM with a radial basis function kernel.
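For the classification step in Sa-TTCA, a trained RBF-kernel SVM assigns a feature vector x using the decision value f(x) = sum_i (alpha_i y_i) K(s_i, x) + b, with K(s, x) = exp(-gamma ||s - x||^2). The sketch evaluates that decision function on toy two-dimensional vectors. The support vectors, coefficients, and gamma are hypothetical, not values from the published model, whose real inputs would be concatenated sequence-encoding and NLP-embedding features.

```python
import math


def rbf_kernel(s, x, gamma):
    """K(s, x) = exp(-gamma * ||s - x||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(s, x)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Decision value sum_i c_i K(s_i, x) + b, where c_i = alpha_i * y_i.

    Positive values are assigned to the tumor T cell antigen class here.
    """
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained parameters: one positive and one negative support vector.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [1.0, -1.0]
bias, gamma = 0.0, 1.0

for x in ([0.1, 0.0], [0.9, 1.0]):
    f = svm_decision(x, support_vectors, dual_coefs, bias, gamma)
    print(x, round(f, 3), "antigen" if f > 0 else "non-antigen")
```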

Workflow Architecture for Neoantigen Prediction

The computational neoantigen prediction workflow proceeds through sequential modules, as illustrated in Figure 1.

flowchart TD
    A["Tumor and Germline Sequencing Data"] --> B["Somatic Variant Calling (SNVs, Indels, Fusions)"]
    B --> C["Peptide Generation (mutant 8-11-mers for class I, 13-25-mers for class II)"]
    C --> D["MHC Binding Affinity Prediction"]
    D --> E["Proteasomal Cleavage and TAP Transport Prediction"]
    E --> F["Peptide-MHC Stability Modeling"]
    F --> G["Immunogenicity Scoring (TCR recognition, self-similarity)"]
    G --> H["Prioritization and Ranking"]
    H --> I["Validation (Mass Spectrometry, T cell assays)"]

The workflow begins with whole-exome or whole-genome sequencing of tumor and matched normal tissue. Somatic variant calling identifies non-synonymous mutations, which are translated into mutant peptide sequences. These peptides are filtered by predicted MHC binding affinity using allele-specific models. Subsequent filters include proteasomal cleavage prediction, TAP transport efficiency, and pMHC complex stability. The final immunogenicity score often incorporates features such as the dissimilarity of the mutant peptide to the wild-type self-peptide repertoire [12] and its compatibility with the patient's TCR beta chain repertoire [13].
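For a missense mutation, the peptide generation step amounts to taking every window of the chosen lengths that overlaps the substituted residue. The sketch below handles single-residue substitutions only (frameshifts, indels, and fusions need their own logic) and uses the first 40 residues of human KRAS with a G12D substitution as example input; the function name and output tuple layout are hypothetical.

```python
# First 40 residues of human KRAS (residue 12, 1-based, is glycine).
KRAS_N_TERM = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY"


def mutant_peptides(protein, pos, alt, lengths=range(8, 12)):
    """Return (length, start, wt_peptide, mt_peptide) for windows covering a missense site.

    protein: wild-type sequence; pos: 0-based index of the substituted residue;
    alt: the mutant residue. Default lengths are the 8- to 11-mers used for class I.
    """
    if protein[pos] == alt:
        raise ValueError("alt equals the reference residue; not a missense change")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    windows = []
    for k in lengths:
        # A window starting at `start` covers pos if start <= pos <= start + k - 1,
        # and it must also fit inside the protein.
        for start in range(max(0, pos - k + 1), min(pos, len(protein) - k) + 1):
            windows.append((k, start, protein[start:start + k], mutant[start:start + k]))
    return windows


peps = mutant_peptides(KRAS_N_TERM, 11, "D")  # G12D
print(len(peps), peps[0])
```

The wild-type counterpart of each window is kept because downstream features, such as dissimilarity to the wild-type self-peptide, compare the two sequences.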

Evaluation and Benchmarking

Standardized benchmarking is essential for algorithm comparison. The Tumor Neoantigen Selection Alliance (TESLA) consortium [14] provided a systematic evaluation of neoantigen prediction methods using curated T cell response data, revealing that no single algorithm consistently outperformed others. Key parameters identified included peptide-MHC binding affinity, mutant allele frequency, and gene expression level.
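A ranking step built on the three parameters listed above might filter on each and then sort the survivors. The thresholds and tie-breaking order below are hypothetical placeholders, not cutoffs reported by the TESLA consortium, and the candidate records are invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    peptide: str
    ic50_nm: float  # predicted pMHC binding affinity in nM (lower is stronger)
    vaf: float      # mutant allele frequency, 0 to 1
    tpm: float      # source gene expression, transcripts per million


def passes_filters(c, max_ic50=500.0, min_vaf=0.1, min_tpm=1.0):
    """Hypothetical hard filters on the three parameters."""
    return c.ic50_nm <= max_ic50 and c.vaf >= min_vaf and c.tpm >= min_tpm


def prioritize(candidates, **thresholds):
    """Keep passing candidates; rank by affinity, then allele frequency, then expression."""
    kept = [c for c in candidates if passes_filters(c, **thresholds)]
    return sorted(kept, key=lambda c: (c.ic50_nm, -c.vaf, -c.tpm))


cands = [
    Candidate("pepA", ic50_nm=50.0, vaf=0.40, tpm=20.0),
    Candidate("pepB", ic50_nm=300.0, vaf=0.50, tpm=5.0),
    Candidate("pepC", ic50_nm=40.0, vaf=0.05, tpm=30.0),   # fails allele frequency
    Candidate("pepD", ic50_nm=800.0, vaf=0.60, tpm=50.0),  # fails affinity
]
print([c.peptide for c in prioritize(cands)])
```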

Shoombuatong and colleagues [15] performed a comprehensive review of machine learning-based approaches, concluding that ensemble methods and deep learning models generally outperform PSSM-based methods but suffer from reduced generalizability across cancer types. The pVACtools suite [16] and pVACview visualization tool [17] facilitate the interactive exploration of prediction outputs, allowing researchers to integrate multiple algorithm scores and manual curation.

Veterinary-Specific Considerations

The application of neoantigen prediction algorithms to veterinary species presents unique challenges. The DLA complex in dogs comprises highly polymorphic class I (DLA-88, DLA-12, DLA-64) and class II (DLA-DRB1, DLA-DQA1, DLA-DQB1) loci, with over 100 known alleles. Most prediction algorithms have been trained on human HLA data and require species-specific retraining or transfer learning. Charneau and colleagues [18] developed a prediction algorithm using HLA transgenic mice, a strategy that could be adapted for veterinary MHC alleles by generating transgenic mice expressing common DLA or FLA variants.

The immunopeptidomic landscape of canine tumors, including osteosarcoma, lymphoma, and mammary carcinoma, has been characterized by liquid chromatography-tandem mass spectrometry (LC-MS/MS). These datasets provide training material for allele-specific binding predictors. However, the limited number of experimentally validated neoantigens in veterinary species restricts the ability to train supervised immunogenicity models. Unsupervised and semi-supervised approaches, such as those based on pMHC stability or peptide self-similarity, may offer more immediate applicability in veterinary contexts.
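A peptide self-similarity feature of the unsupervised kind mentioned above can be approximated by the smallest Hamming distance between a mutant peptide and any same-length peptide in a reference proteome. This brute-force sketch is a simplified stand-in for published self-dissimilarity measures such as [12], which are more sophisticated. The KRAS N-terminal fragment is a toy reference; in practice it would be the full species-matched (for example, canine) proteome.

```python
def reference_kmers(proteome, k):
    """All length-k substrings of the reference protein sequences."""
    return {seq[i:i + k] for seq in proteome for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def min_self_distance(peptide, kmers):
    """Smallest Hamming distance from the peptide to any same-length self k-mer."""
    same_length = [s for s in kmers if len(s) == len(peptide)]
    if not same_length:
        raise ValueError("no reference k-mers of matching length")
    return min(hamming(peptide, s) for s in same_length)


# Toy reference: the KRAS N-terminal fragment stands in for a full proteome.
self_9mers = reference_kmers(["MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY"], 9)
print(min_self_distance("VVVGADGVG", self_9mers))  # G12D-containing 9-mer
```

A larger distance means the peptide looks less like self, consistent with the premise of [12] that dissimilarity to the self-proteome predicts immunogenicity.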

The SIGANEO method [19] uses a similarity network with generative adversarial network (GAN) enhancement to predict immunogenic neoepitopes, an approach that does not require large labeled datasets and could be directly applied to canine or feline tumor data. Similarly, the ScanNeo2 workflow [20] is designed to handle diverse genomic alterations including gene fusions and non-canonical splicing events, which are relevant to veterinary cancers with high structural variant loads.

Future Directions and Challenges

Several outstanding challenges remain. First, the prediction of neoantigens derived from non-canonical sources such as intron retention, alternative splicing, and long non-coding RNA translation requires specialized detection pipelines. Splicing neoantigen discovery using SNAF [21] demonstrated that splice variants generate shared immunogenic targets across patients, a finding that may extend to veterinary cancers with conserved splice junction patterns.
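For splice-derived neoantigens, the candidate peptides are the windows that straddle the novel junction. The sketch assumes the sequences on either side of the junction have already been translated in the correct reading frame. The example sequences are hypothetical, and in practice any window also found in the canonical proteome would be discarded.

```python
def junction_peptides(upstream, downstream, k):
    """Return k-mers from upstream + downstream that include residues from both sides.

    upstream, downstream: amino acid sequences flanking a novel splice junction,
    assumed to be translated in frame.
    """
    fused = upstream + downstream
    j = len(upstream)  # index of the first downstream residue
    # A window starting at s spans the junction if s <= j - 1 and s + k - 1 >= j.
    first = max(0, j - k + 1)
    last = min(j - 1, len(fused) - k)
    return [fused[s:s + k] for s in range(first, last + 1)]


# Hypothetical flanking sequences.
print(junction_peptides("MKTAY", "IAKQR", 3))
```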

Second, the integration of TCR repertoire sequencing data into neoantigen prioritization improves the specificity of immunogenicity predictions. Pham and colleagues [13] showed that the TCR beta chain repertoire of tumor-infiltrating lymphocytes can be used to filter neoantigens that are likely to be recognized by the existing T cell population. Similar approaches could be applied to canine and feline tumor specimens using species-specific TCR variable region databases.

Third, the equitable treatment of MHC alleles across species and populations remains an unresolved issue [1]. The development of pan-species MHC binding predictors that generalize across human, canine, feline, and bovine alleles would greatly accelerate veterinary cancer immunotherapy research.

Finally, the incorporation of mass spectrometry immunopeptidomic data into the training pipeline improves the accuracy of binding predictions [22, 23]. Pyke and colleagues [22] used large-scale immunopeptidomes from human cell lines to train a composite model of MHC peptide presentation. The generation of analogous immunopeptidomic datasets from canine and feline cell lines or tumor specimens is a priority for the advancement of veterinary computational immuno-oncology.

Conclusion

Neoantigen prediction algorithms have evolved from simple binding affinity scorers to sophisticated multi-task deep learning architectures that integrate genomic, transcriptomic, proteomic, and immunologic data. The field is moving toward pan-specific and pan-species models that can generalize across MHC alleles and cancer types. For veterinary medicine, the adaptation of these algorithms to species-specific MHC polymorphisms and the creation of validated immunopeptidomic datasets are essential next steps. As computational methods continue to mature, the prospect of personalized cancer immunotherapy for companion animals becomes increasingly feasible.

References

[1] Glynn E, Ghersi D, Singh M. Toward equitable major histocompatibility complex binding predictions. Proc Natl Acad Sci U S A. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39964728/

[2] Wang F, Wang H, Wang L et al. MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences. Brief Bioinform. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35443027/

[3] Shao XM, Bhattacharya R, Huang J et al. High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets. Cancer Immunol Res. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/31871119/

[4] Li G, Iyer B, Prasath VBS et al. DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity. Brief Bioinform. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34009266/

[5] Shao Y, Ge S, Dong R et al. NeoTImmuML: a machine learning-based prediction model for human tumor neoantigen immunogenicity. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41200173/

[6] Cai Y, Chen R, Song M et al. CNNeoPP: a large language model-enhanced deep learning pipeline for personalized neoantigen prediction and liquid biopsy applications. Front Immunol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41716398/

[7] Ma W, Zhang J, Yao H. NeoMUST: an accurate and efficient multi-task learning model for neoantigen presentation. Life Sci Alliance. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38290755/

[8] Yu JC, Ni K, Chen CT. ENCAP: Computational prediction of tumor T cell antigens with ensemble classifiers and diverse sequence features. PLoS One. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39024250/

[9] Yang X, Zhao L, Wei F et al. DeepNetBim: deep learning model for predicting HLA-epitope interactions based on network analysis by harnessing binding and immunogenicity information. BMC Bioinformatics. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33952199/

[10] Gober JG, Capietto AH, Hoshyar R et al. MHC2-SCALE enhances identification of immunogenic neoantigens. iScience. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40235585/

[11] Tran TO, Le NQK. Sa-TTCA: An SVM-based approach for tumor T-cell antigen classification using features extracted from biological sequencing and natural language processing. Comput Biol Med. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38636332/

[12] Richman LP, Vonderheide RH, Rech AJ. Neoantigen Dissimilarity to the Self-Proteome Predicts Immunogenicity and Response to Immune Checkpoint Blockade. Cell Syst. 2019. URL: https://pubmed.ncbi.nlm.nih.gov/31606370/

[13] Pham TMQ, Nguyen TN, Tran Nguyen BQ et al. The T cell receptor beta chain repertoire of tumor infiltrating lymphocytes improves neoantigen prediction and prioritization. Elife. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39466298/

[14] Wells DK, van Buuren MM, Dang KK et al. Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction. Cell. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/33038342/

[15] Shoombuatong W, Ahmed S, Mahmud SH et al. A comprehensive review and evaluation of machine learning-based approaches for identifying tumor T cell antigens. Comput Biol Chem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40215672/

[16] Hundal J, Kiwala S, McMichael J et al. pVACtools: A Computational Toolkit to Identify and Visualize Cancer Neoantigens. Cancer Immunol Res. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/31907209/

[17] Xia H, Hoang MH, Schmidt E et al. pVACview: an interactive visualization tool for efficient neoantigen prioritization and selection. Genome Med. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39538339/

[18] Charneau J, Suzuki T, Shimomura M et al. Development of antigen-prediction algorithm for personalized neoantigen vaccine using human leukocyte antigen transgenic mouse. Cancer Sci. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35122353/

[19] Ye Y, Shen Y, Wang J et al. SIGANEO: Similarity network with GAN enhancement for immunogenic neoepitope prediction. Comput Struct Biotechnol J. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/38034402/

[20] Schäfer RA, Guo Q, Yang R. ScanNeo2: a comprehensive workflow for neoantigen detection and immunogenicity prediction from diverse genomic and transcriptomic alterations. Bioinformatics. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37882750/

[21] Li G, Mahajan S, Ma S et al. Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38232136/

[22] Pyke RM, Mellacheruvu D, Dea S et al. Precision Neoantigen Discovery Using Large-Scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation. Mol Cell Proteomics. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/36796642/

[23] Abelin JG, Harjanto D, Malloy M et al. Defining HLA-II Ligand Processing and Binding Rules with Mass Spectrometry Enhances Cancer Epitope Prediction. Immunity. 2019. URL: https://pubmed.ncbi.nlm.nih.gov/31495665/

[24] Wang Z, Gu Y, Sun X et al. Computation strategies and clinical applications in neoantigen discovery towards precision cancer immunotherapy. Biomark Res. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40629481/

[25] Zhang Y, Chen TT, Li X et al. Advances and challenges in neoantigen prediction for cancer immunotherapy. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40574862/

[26] Chihab LY, Burel JG, Miller AM et al. Comparative performance analysis of neoepitope prediction algorithms in head and neck cancer. Front Immunol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40103827/

[27] Yi X, Zhao H, Hu S et al. Tumor-associated antigen prediction using a single-sample gene expression state inference algorithm. Cell Rep Methods. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39561714/

[28] Bulashevska A, Nacsa Z, Lang F et al. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38868767/

[29] Müller M, Huber F, Arnaud M et al. Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction. Immunity. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37816353/

[30] Jaton F. Groundwork for AI: Enforcing a benchmark for neoantigen prediction in personalized cancer immunotherapy. Soc Stud Sci. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37650579/

[31] Cai Y, Chen R, Gao S et al. Artificial intelligence applied in neoantigen identification facilitates personalized cancer immunotherapy. Front Oncol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36698417/

[32] Herrera-Bravo J, Herrera Belén L, Farias JG et al. TAP 1.0: A robust immunoinformatic tool for the prediction of tumor T-cell antigens based on AAindex properties. Comput Biol Chem. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33592504/

[33] Bai P, Li Y, Zhou Q et al. Immune-based mutation classification enables neoantigen prioritization and immune feature discovery in cancer immunotherapy. Oncoimmunology. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/33537173/

[34] Tang Y, Wang Y, Wang J et al. TruNeo: an integrated pipeline improves personalized true tumor neoantigen identification. BMC Bioinformatics. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/33208106/

[35] Moore TV, Nishimura MI. Improved MHC II epitope prediction - a step towards personalized medicine. Nat Rev Clin Oncol. 2020. URL: https://pubmed.ncbi.nlm.nih.gov/31836878/

[36] Richters MM, Xia H, Campbell KM et al. Best practices for bioinformatic characterization of neoantigens for clinical utility. Genome Med. 2019. URL: https://pubmed.ncbi.nlm.nih.gov/31462330/

[37] Boegel S, Castle JC, Kodysh J et al. Bioinformatic methods for cancer neoantigen prediction. Prog Mol Biol Transl Sci. 2019. URL: https://pubmed.ncbi.nlm.nih.gov/31383407/

[38] Boehm KM, Bhinder B, Raja VJ et al. Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome. BMC Bioinformatics. 2019. URL: https://pubmed.ncbi.nlm.nih.gov/30611210/

[39] Tarek MM, Shafei AE, Ali MA et al. Computational prediction of vaccine potential epitopes and 3-dimensional structure of XAGE-1b for non-small cell lung cancer immunotherapy. Biomed J. 2018. URL: https://pubmed.ncbi.nlm.nih.gov/29866600/

[40] Schmidt J, Guillaume P, Dojcinovic D et al. In silico and cell-based analyses reveal strong divergence between prediction and observation of T-cell-recognized tumor antigen T-cell epitopes. J Biol Chem. 2017. URL: https://pubmed.ncbi.nlm.nih.gov/28536262/

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.