What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens like IBDV and Avian Reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

RNA Structure Prediction Algorithms: A Comprehensive Technical Review

Introduction

Ribonucleic acid (RNA) molecules are central to a vast array of biological functions, including information transfer, enzymatic catalysis, and regulatory control. In viral systems, RNA structure determines genome packaging, replication efficiency, translation regulation, and evasion of host immune responses. Accurate prediction of RNA structure from sequence data is therefore a critical computational challenge in veterinary virology, diagnostics, and systems biology. This review provides an exhaustive, biophysically grounded analysis of the algorithmic approaches to RNA structure prediction, covering thermodynamic models, dynamic programming, machine learning, deep learning, and graph-based methods, with reference to the most recent literature.

The Hierarchical Nature of RNA Structure

RNA folding is a hierarchical process. Primary structure (the linear sequence of nucleotides) determines secondary structure (canonical base pairs: A-U, G-C, and G-U wobble pairs, stacked in helices, with unpaired regions forming loops, bulges, and junctions). Tertiary structure results from long-range interactions primarily mediated by pseudoknots, base triples, and non-canonical contacts. The nearest-neighbor thermodynamic model (NNTM) provides a free-energy parametrization for secondary structure formation. Algorithms for structure prediction can be classified by the level of detail they target and by the core computational strategy employed.
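To make the secondary-structure level concrete, the following minimal sketch parses the common dot-bracket notation into a list of base pairs and flags any pair that is not A-U, G-C, or G-U. The function names and the example sequence are illustrative assumptions, not part of any cited tool.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def noncanonical_pairs(seq, structure):
    """List the pairs in `structure` whose nucleotides are not a canonical pair."""
    return [(i, j) for i, j in parse_dot_bracket(structure)
            if (seq[i], seq[j]) not in CANONICAL]


pairs = parse_dot_bracket("(((...)))")          # hypothetical 9-nt hairpin
bad = noncanonical_pairs("AGGAAAUCC", "(((...)))")
```

Plain dot-bracket notation of this kind cannot express crossing pairs, which is why pseudoknotted structures are usually written with additional bracket types.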

Thermodynamic and Dynamic Programming Methods

The foundation of secondary structure prediction rests on free-energy minimization through dynamic programming (DP). The Nussinov algorithm (1978) maximises the number of base pairs. The Zuker algorithm (1981) uses a DP recursion to compute the minimum free energy (MFE) structure from experimentally determined thermodynamic parameters. The partition function formalism, which sums over the whole ensemble rather than returning a single optimum, was introduced later by McCaskill (1990).
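The pair-maximization recursion can be sketched in a few lines. This is a simplified Nussinov-style implementation that assumes a minimum hairpin loop of three unpaired nucleotides (the `min_loop` parameter is an assumption of this sketch) and returns one optimal structure; it does not use thermodynamic parameters and is not the Zuker algorithm.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for `seq` under simple pair maximization."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs in seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case 1: j is unpaired
            for k in range(i, j - min_loop):         # case 2: j pairs with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n-1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


count, fold = nussinov("GGGAAAUCC")
```

The triple loop gives cubic time in the sequence length, which is the same asymptotic cost as the energy-based recursions that replaced pair counting with loop free energies.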

Modern implementations such as RNAfold (ViennaRNA Package) and mfold extend these principles. The nearest-neighbor parameters for base-pair stacking and loop formation are derived largely from optical melting experiments. Reported accuracy for a single MFE structure is typically 70-80% of known base pairs for sequences shorter than 400 nucleotides, and it declines with length and for sequences containing pseudoknots.

Pseudoknots represent a major structural class in viral RNAs (e.g., ribosomal frameshifting elements in coronaviruses and retroviruses). Predicting the MFE structure with unrestricted pseudoknots is NP-complete under the standard thermodynamic model, so practical tools restrict the search space or approximate the optimum: pknotsRG applies DP to a restricted class of pseudoknots, while IPknot formulates prediction as an integer programming problem. A recent advancement is Spark, a sparse hierarchical energy minimization framework that enables scalable pseudoknot prediction [1]. Spark reduces computational complexity by exploiting the sparsity of base-pairing possibilities. The hierarchical analysis of pseudoknots via sections [2] provides a formal decomposition method that can be integrated into DP engines. Additionally, chord diagrams and intersection graphs offer a topological formalism for classifying pseudoknot types [3].
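The chord-diagram view can be illustrated with a short sketch: each base pair is a chord, two chords cross when their endpoints interleave, and the intersection graph connects every pair of crossing chords. A structure is pseudoknot-free exactly when this graph has no edges. The function names and the H-type example coordinates are assumptions made for illustration and are not taken from [3].

```python
def crosses(p, q):
    """True if base pairs p and q interleave (i < k < j < l), i.e. form a crossing."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def intersection_graph(pairs):
    """Map each base pair to the set of base pairs it crosses."""
    graph = {p: set() for p in pairs}
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            if crosses(pairs[a], pairs[b]):
                graph[pairs[a]].add(pairs[b])
                graph[pairs[b]].add(pairs[a])
    return graph


def is_pseudoknotted(pairs):
    """A structure contains a pseudoknot if any two of its pairs cross."""
    return any(neighbours for neighbours in intersection_graph(pairs).values())


# Hypothetical H-type pseudoknot: two 2-bp stems whose pairs interleave.
h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]
graph = intersection_graph(h_type)
```

The pairwise comparison is quadratic in the number of base pairs, which is acceptable for classification but says nothing about the cost of predicting such structures.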

Machine Learning and Deep Learning Approaches

The limitations of purely thermodynamic methods have motivated a surge of machine learning (ML) and deep learning (DL) solutions. A comprehensive review by Sacco et al. [4] categorises these approaches into sequence-based, contact-map-based, and hybrid models.

Convolutional Neural Networks and Attention Mechanisms

Convolutional neural networks (CNNs) capture local sequence motifs. The combination of convolutional block attention networks with ensemble learning enhances generalizability across RNA families [5]. The TVAE-RNA method uses a transformer variational autoencoder to generate an ensemble of plausible structures [6]. Transformer architectures, initially developed for natural language processing, can process long-range dependencies via attention layers. NTFold introduces structure-sensing nucleotide attention learning, where the attention matrix is explicitly guided by secondary structure constraints [7].
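One generic way to steer attention with structural knowledge is to add a bias matrix to the attention scores before the softmax. The sketch below implements scaled dot-product attention in plain Python and uses a pairing-compatibility bias that blocks attention between nucleotides that cannot form a canonical pair. It is an illustration of the general idea only: the one-hot embeddings, the bias construction, and all function names are assumptions, and this is not the NTFold architecture.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
ONE_HOT = {"A": [1.0, 0.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0, 0.0],
           "G": [0.0, 0.0, 1.0, 0.0], "U": [0.0, 0.0, 0.0, 1.0]}


def one_hot(seq):
    """Toy embedding; trained models would use learned projections instead."""
    return [ONE_HOT[base] for base in seq]


def pairing_bias(seq, penalty=-1e9):
    """0 where two nucleotides can pair canonically, a large negative value otherwise."""
    return [[0.0 if (a, b) in CANONICAL else penalty for b in seq] for a in seq]


def attention_weights(queries, keys, bias):
    """Row-wise softmax of (q . k / sqrt(d) + bias)."""
    d = len(queries[0])
    weights = []
    for qi, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[qi][ki]
                  for ki, k in enumerate(keys)]
        top = max(scores)                       # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


seq = "GCAU"
emb = one_hot(seq)
w = attention_weights(emb, emb, pairing_bias(seq))
```

In this toy setting the G row splits its attention evenly between C and U, its two possible partners, and puts no weight on G or A.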

Graph Neural Networks

RNA structure is naturally represented as a graph where nucleotides are nodes and base pairs are edges. Graph neural networks (GNNs) can learn to score candidate structures. Siciliano et al. [8] apply graph transformers to infer the quality of protein-RNA models, while Jiang et al. [9] develop a graph-learning-based scoring function for RNA-protein complexes. The TRAMbio package provides graph rigidity analysis of macromolecular structures, enabling evaluation of mechanical stability [10]. For ncRNA-protein interaction prediction, personalized subgraph attention models improve accuracy [11].
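As a minimal sketch of the graph representation described above, the code below builds an adjacency map with two edge types, backbone edges between consecutive nucleotides and base-pair edges, and derives simple per-node features of the kind a GNN might consume. The edge labels and feature choice are assumptions for illustration and do not correspond to the input format of any cited method.

```python
def build_rna_graph(length, pairs):
    """Return {node: {neighbour: edge_type}} for an RNA of `length` nucleotides."""
    adjacency = {i: {} for i in range(length)}
    for i in range(length - 1):                 # covalent backbone
        adjacency[i][i + 1] = "backbone"
        adjacency[i + 1][i] = "backbone"
    for i, j in pairs:                          # hydrogen-bonded base pairs
        adjacency[i][j] = "basepair"
        adjacency[j][i] = "basepair"
    return adjacency


def node_features(adjacency):
    """Per-node (degree, is_paired) tuples as a toy feature vector."""
    return {node: (len(nbrs), "basepair" in nbrs.values())
            for node, nbrs in adjacency.items()}


# Hypothetical 9-nt hairpin with three stacked pairs and a 3-nt loop.
graph = build_rna_graph(9, [(0, 8), (1, 7), (2, 6)])
features = node_features(graph)
```

Because base-pair edges are stored like any other edge, crossing pairs from a pseudoknot need no special handling in this representation.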

Language Models and Embeddings

Pre-trained RNA language models (e.g., RNABERT, RNA-MSM) generate context-aware embeddings that capture sequence-context and structural features. These embeddings can be fine-tuned for downstream tasks such as binding site prediction. The CoBRA method predicts compound binding sites using an RNA language model [12]. BioLLMNet employs a cross-LLM transformation network for RNA-interaction prediction [13]. These models are particularly valuable when limited experimental structures are available for a given RNA family.

Conformational Ensembles and Equilibrium Folding

RNA molecules in solution do not adopt a single static structure but exist as an ensemble of conformations. The partition function approach (e.g., RNAfold -p) computes the probability of every possible base pair. Reweighting techniques based on NMR data can refine ensemble distributions. Leopold et al. [14] integrated NMR and molecular dynamics (MD) to reveal differences in the conformational ensembles of GAAG and GCAA tetraloops after reweighting.
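The ensemble idea can be sketched with a toy partition function. The recursion below has the same shape as pair maximization, but it sums Boltzmann weights over all nested structures instead of taking a maximum. The energy model is a deliberate simplification, a fixed `pair_energy` per base pair, and both that value and the rounded `RT` constant are assumptions of this sketch; real tools such as RNAfold -p use the full nearest-neighbor model.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
RT = 0.616  # kcal/mol, approximate value at 37 degrees C (assumed constant)


def partition_function(seq, pair_energy=-1.0, min_loop=3):
    """Sum of exp(-E/RT) over all nested structures, with E = pair_energy * n_pairs."""
    n = len(seq)
    if n == 0:
        return 1.0
    weight = math.exp(-pair_energy / RT)        # Boltzmann factor of one base pair
    # Z[i][j] = partition function of seq[i..j]; short spans only allow the open chain.
    Z = [[1.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            total = Z[i][j - 1]                 # j unpaired
            for k in range(i, j - min_loop):    # j paired with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = Z[i][k - 1] if k > i else 1.0
                    total += left * weight * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n - 1]


z = partition_function("GAAAC")                 # open chain plus one hairpin
p_hairpin = (z - 1.0) / z                       # probability that G0-C4 is paired
```

Setting `pair_energy=0.0` turns every Boltzmann factor into 1, so the function simply counts the nested structures, which is a convenient sanity check on the recursion.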

Geng et al. [15] demonstrate that thermodynamic prediction of RNA cellular activity from sequence can be achieved via conformational ensembles, linking in vitro folding predictions to in vivo function. The RNAprecis method predicts full-detail RNA conformation from the experimentally best-observed sparse parameters, such as those from SHAPE or cryo-EM data [16].

Probabilistic and Integer Programming Models

Integer programming (IP) offers an exact optimization framework for RNA structure prediction. Kato and Sato [17] model joint secondary structures of interacting RNA molecules as an IP problem, enabling co-folding prediction. These methods are particularly applicable for viral genome packaging signals where two RNA segments must base-pair. The hierarchical analysis using sections [2] also lends itself to IP formulations. Bayesian approaches, such as those underlying the MODENA web server [18], allow in silico design of interacting RNA sequences by sampling from a posterior distribution of structures. Such methods are useful for engineering RNA-based therapeutics and for modelling viral recombination hotspots.
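For orientation, a generic IP for single-sequence secondary structure can be written with one binary variable per candidate base pair. The formulation below is a textbook-style sketch under assumed notation ($x_{ij}$ indicates that positions $i$ and $j$ pair, $w_{ij}$ is a score such as a base-pair probability or energy contribution); it is not the specific model of Kato and Sato [17].

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i<j} w_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j>i} x_{ij} + \sum_{j<i} x_{ji} \le 1
    && \text{for every position } i, \\
& x_{ij} + x_{kl} \le 1
    && \text{for all } i<k<j<l, \\
& x_{ij} \in \{0,1\}
    && \text{for all } i<j .
\end{aligned}
```

The first constraint lets each nucleotide take part in at most one pair, and the second forbids crossing pairs. Relaxing or layering the second constraint is one way to admit pseudoknots, and joint structures of two molecules would need further variables for intermolecular pairs.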

Specialized Algorithmic Domains

Circular RNAs

Circular RNAs (circRNAs) are covalently closed loops that form during back-splicing. Their structure prediction requires algorithms that enforce a circular constraint. Bernhart et al. [19] provide a dedicated review of circRNA secondary structure prediction and target identification. The circIRES-DAF framework identifies internal ribosome entry sites in circRNAs [20]. For circRNA-drug sensitivity associations, collaborative feature learning with graph structure learning has been applied [21].

RNA Triplet Repeats and Branching

Triplet repeat expansions (e.g., in certain viral or host transcripts) form stable hairpins. Boehmer et al. [22] present improved algorithms for predicting the structure of RNA triplet repeats and their interactions. Branching complexity is explored by Poznanović et al. [23, 24], who examine the interplay between geometric combinatorics and thermodynamic models to improve branching predictions.

RNA-Protein and RNA-Small Molecule Interactions

Predicting binding sites on RNA is crucial for understanding viral replication and for drug design. PROBind [25] provides a web server for protein-nucleic acid binding residue prediction. The RLSFmode approach uses deep learning on molecular surfaces to predict RNA-small molecule binding modes [26]. For small interfering RNA (siRNA) design, Ren et al. [27] review computational resources that incorporate structure predictions. Sun et al. [28] review advances and challenges in ML-based RNA-small molecule interaction modeling.

Databases and Benchmarking

Structure prediction algorithms rely on high-quality training data. The StructRMDB database catalogs RNA modification sites that affect secondary structure [29]. The diverse database of de Lajarte et al. [30] aims to narrow the generalization gap by including non-coding RNAs from multiple families. Blind code competitions, such as the one reported by Lee et al. [31], compare template-based and ab initio predictions. Such benchmarks drive algorithmic refinement.
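Secondary-structure benchmarks commonly score a prediction by comparing its base pairs with a reference structure. The sketch below computes sensitivity, positive predictive value (PPV), and F1 from two pair lists; the function name and the example pairs are illustrative assumptions and are not taken from any cited benchmark.

```python
def pair_metrics(predicted, reference):
    """Return (sensitivity, ppv, f1) for predicted vs. reference base pairs."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0   # fraction of reference pairs found
    ppv = true_pos / len(pred) if pred else 0.0         # fraction of predictions that are right
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return sensitivity, ppv, f1


reference = [(0, 8), (1, 7), (2, 6)]
predicted = [(0, 8), (1, 7), (3, 5)]      # two correct pairs, one wrong
sens, ppv, f1 = pair_metrics(predicted, reference)
```

Some evaluations also count a predicted pair as correct when one partner is shifted by a single position; that tolerance is omitted here for simplicity.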

Mitigating Family Effects and Generalization

A persistent challenge is overfitting to structurally similar RNA families (family effects). Mokkedem et al. [32] employ latent-space continual learning to mitigate these effects. The use of diverse training datasets [30] and ensemble methods [5] also improves cross-family generalization.

Algorithm Comparison: Strengths and Limitations

Algorithm Category	Representative Methods	Strengths	Limitations
Thermodynamic DP	RNAfold, mfold, GernFold	Well-calibrated parameters; fast for short RNAs	No pseudoknots; single-structure bias
Pseudoknot DP	Spark, IPknot, PKnotsRG	Handles canonical pseudoknots	NP-complete for complex pseudoknots; slower
Deep Learning	TVAE-RNA, NTFold, SPOT-RNA	Captures complex patterns; high accuracy	Requires large training sets; black-box
Graph-based	GraphIRL, TRAMbio, GNN scoring	Interpretable; integrates with physics	Limited to known contact predictions
Conformational Ensemble	RNAfold -p, RNAprecis	Provides Boltzmann ensemble; accounts for dynamics	Computationally intensive for long RNAs
Integer Programming	IP for joint structures	Exact optimization; handles constraints	Scalability to long sequences

Mermaid Diagram: Algorithm Selection Workflow

graph TB
    A[RNA Sequence Input] --> B{Sequence Length?}
    B -->| < 400 nt | C{Random coil or structured?}
    B -->| > 400 nt | D[Use deep learning or divide into domains]
    C -->| Structured | E["Thermodynamic DP (MFE)"]
    C -->| Random coil | F["Partition function (ensemble)"]
    E --> G{Pseudoknots expected?}
    G -->| Yes | H["Sparse hierarchical DP (Spark)"]
    G -->| No | I[Output MFE structure]
    F --> J["Use sparse reweighting (NMR/SHAPE constraints)"]
    D --> K["Convolutional / Transformer model (NTFold, TVAE-RNA)"]
    K --> L{Pseudoknots?}
    L -->| Yes | M[Graph neural network with pseudoknot edges]
    L -->| No | N[Output structure probability map]
    H --> O[Validate against sequence covariation]
    O --> P[Final secondary+tertiary model]
    N --> P
    M --> P
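
The same decision flow can be written as a small function, which makes the branching explicit and easy to test. The function name, argument names, and returned labels are assumptions of this sketch, and a length of exactly 400 nt, which the diagram leaves undefined, is treated here as short.

```python
def select_method(length, structured=True, pseudoknots=False, threshold=400):
    """Return the method label suggested by the workflow for one RNA sequence."""
    if length > threshold:
        # Long RNAs: learned models, optionally with pseudoknot-aware graphs.
        if pseudoknots:
            return "graph neural network with pseudoknot edges"
        return "deep learning structure probability map"
    if not structured:
        # Short, flexible RNAs: describe the ensemble rather than one structure.
        return "partition function ensemble with experimental reweighting"
    if pseudoknots:
        return "sparse hierarchical DP, then covariation check"
    return "thermodynamic DP (MFE structure)"


choice = select_method(250, structured=True, pseudoknots=True)
```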

Conclusions and Future Directions

RNA structure prediction has evolved from purely thermodynamic DP to a rich ecosystem of ML, DL, and hybrid methods. For veterinary applications, accurate structure models are essential for understanding mechanisms of viral pathogenesis (e.g., internal ribosome entry sites in porcine coronaviruses, frameshifting signals in avian reoviruses), designing diagnostic probes, and predicting RNA-small molecule interactions for antiviral development. Key challenges include predicting long-range tertiary contacts from sequence alone, modeling large RNAs (viral genomes exceeding 10 kb), and integrating dynamic information from experimental constraints. The recent explosion of methods in the 2025-2026 literature indicates a rapid maturation of the field. Continued benchmarking and the development of interpretable models will be essential for clinical translation in veterinary medicine.

References

[1] Gray M, Will S, Jabbari H. Spark: sparse hierarchical energy minimization for scalable prediction of RNA pseudoknots. Bioinformatics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42015361/

[2] Masuki R, Liew D, Yong EH. Hierarchical analysis of RNA secondary structures with pseudoknots based on sections. PLoS Comput Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41592130/

[3] Ibrahim R, Moore AH. Methods for Analyzing RNA Pseudoknots via Chord Diagrams and Intersection Graphs. Bull Math Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42213302/

[4] Sacco G, Bussi G, Sanguinetti G. Machine learning for RNA secondary structure prediction: a review of current methods and challenges. RNA. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41577452/

[5] Lin H, Hou D, Li Z, et al. Enhanced Generalizability of RNA Secondary Structure Prediction via Convolutional Block Attention Network and Ensemble Learning. Molecules. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40871599/

[6] Mei X, Liu H, Zhu Y, et al. TVAE-RNA: ensemble-based RNA secondary structure prediction via transformer variational autoencoders. Bioinformatics. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40981507/

[7] Jin K, Zhang Z, Lan G, et al. NTFold: Structure-Sensing Nucleotide Attention Learning for RNA Secondary Structure Prediction. Sensors (Basel). 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41600484/

[8] Siciliano AJ, Bao Y, Shrestha B, et al. Inferring the qualities of protein-RNA models with graph transformers. Bioinformatics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42048142/

[9] Jiang Z, Zhang Y, Yang G, et al. Graph Learning-Based Scoring of RNA-Protein Complex Structures. J Chem Theory Comput. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40882035/

[10] Handke N, Gatter T, Reinhardt F, et al. TRAMbio: a flexible python package for graph rigidity analysis of macromolecules. BMC Bioinformatics. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41152782/

[11] Khoushehgir F, Noshad Z. Predicting ncRNA-Protein interactions with a graph attention model exploiting personalized subgraphs. J Bioinform Comput Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41350235/

[12] Jang W, Shin WH. CoBRA: compound binding site prediction using RNA language model. Brief Bioinform. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41520231/

[13] Abir AR, Toki Tahmid M, Bayzid MS. BioLLMNet: enhancing RNA-interaction prediction with a specialized cross-LLM transformation network. Brief Bioinform. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41139312/

[14] Leopold D, Oxenfarth A, Thomasen FE, et al. Integrated NMR/MD investigation reveals differences after reweighting in conformational ensembles of GAAG and GCAA tetraloops. RNA. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42215280/

[15] Geng A, Roy R, Ken M, et al. Thermodynamic prediction of RNA cellular activity from sequence via conformational ensembles. Cell. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41856113/

[16] Wiechers H, Williams CJ, Eltzner B, et al. RNAprecis: Prediction of full-detail RNA conformation from the experimentally best-observed sparse parameters. PLoS Comput Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42102156/

[17] Kato Y, Sato K. Prediction of RNA Joint Secondary Structures Based on Integer Programming. Methods Mol Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41174198/

[18] Taneda A. In Silico Design of Interacting RNA Sequences Using the MODENA Web Server. Methods Mol Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41174200/

[19] Bernhart SH, Fallmann J, Lorenz R, et al. Prediction of Circular RNA Secondary Structures and Their Targets. Adv Exp Med Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40886270/

[20] Wang Z, Liu L, Lei X. circIRES-DAF: A dual-attenuation fusion framework for identification of internal ribosome entry sites in circular RNAs. Int J Biol Macromol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41651269/

[21] Zhang X, Zou Q, Wang C, et al. CFGSCDSA: Predicting circRNA-drug sensitivity associations based on collaborative feature learning and graph structure learning. PLoS Comput Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41824539/

[22] Boehmer K, Berkemer SJ, Will S, et al. RNA triplet repeats: improved algorithms for structure prediction and interactions. Algorithms Mol Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41372930/

[23] Poznanović S, Cardwell O, Heitsch C. An efficient algorithm for exploring RNA branching conformations under the nearest-neighbor thermodynamic model. Algorithms Mol Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41896877/

[24] Poznanović S, Cardwell O, Heitsch C. Can geometric combinatorics improve RNA branching predictions? BMC Bioinformatics. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41034743/

[25] Wu C, Zhang F, Jia P, et al. PROBind: A Web Server for Prediction, Analysis, and Visualization of Protein-Protein and Protein-Nucleic Acid Binding Residues. Genomics Proteomics Bioinformatics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42032810/

[26] Xia W, Shu Y, Shu J, et al. RLSFmode: A deep learning approach for predicting RNA-small molecule binding modes via molecular surface modeling. Int J Biol Macromol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42190788/

[27] Ren L, Li H, Zhang Y, et al. The Computational Journey of siRNA Silencing Efficiency: Resources, Methods, and Future Directions. Curr Drug Targets. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41832731/

[28] Sun T, Xia W, Shu J, et al. Advances and Challenges in Machine Learning for RNA-Small Molecule Interaction Modeling: Review. J Chem Theory Comput. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40921177/

[29] Zhang Z, Wang X, Zhou J, et al. StructRMDB: A database of RNA modification sites that affect RNA secondary structure. Comput Struct Biotechnol J. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41439023/

[30] de Lajarte AA, Taillades YJMD, Aruda J, et al. Diverse database and machine learning model to narrow the generalization gap in RNA structure prediction. Sci Adv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41739924/

[31] Lee Y, He S, Oda T, et al. Template-based RNA structure prediction advanced through a blind code competition. bioRxiv. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41509375/

[32] Mokkedem W, Pedrielli G, Wu T. Mitigating Family Effects in RNA Secondary-Structure Prediction with Latent-Space Continual Learning. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42094469/

[33] Makarova MO, Stiebritz MT, Basturk D, et al. NuConf: a rotamer library for DNA and RNA and its implementation in the protein design software MUMBO. Sci Rep. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42191795/

[34] Liang J, Zhou M, Xie M, et al. ShapeRNA: an integrated web server for RNA secondary structure, ensemble, and functional analysis. Nucleic Acids Res. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42023506/

[35] Hao Z, Yang Y, Zhao H, et al. [Research progress in RNA secondary structure prediction methods]. Sheng Wu Gong Cheng Xue Bao. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41755596/

[36] Elhajjajy SI, Weng Z. A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions. RNA. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41714110/

[37] Li Y, Feng C, Zhang X, et al. DRfold2 is a deep learning-based tool that enables efficient and accurate RNA structure prediction. PLoS Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41701781/

[38] Faber C, Upadhyay U, Taubert O, et al. Influence of contact map topology on RNA structure prediction. Nucleic Acids Res. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41404796/

[39] Xu Q, Li F, Han G. Noncoding RNA family classification based on multifeature fusion and convolutional block attention residual network. Brief Bioinform. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41212593/

[40] Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. Nat Methods. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41044256/

[41] Rose L, Sanchez Giraldo L, Nguyen D, et al. When Does Additional Information Improve Accuracy of RNA Secondary Structure Prediction? J Chem Inf Model. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40958395/

[42] Chen H, Zuo Y, Liu X, et al. PreRBP: Interpretable deep learning for RNA-protein binding site prediction with attention mechanism. Anal Biochem. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40914406/

[43] Ferrari MM, Poznanović S, Riehl M, et al. The R-loop grammar predicts R-loop formation under different topological constraints. PLoS Comput Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40880527/

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and treatment decisions.