Circular RNAs: Computational Identification and Analysis

Introduction

Circular RNAs (circRNAs) constitute a class of covalently closed, single-stranded RNA molecules generated through a non-canonical splicing event termed back-splicing. In this process, a downstream 5' splice site is joined to an upstream 3' splice site, producing a circular transcript lacking free 5' and 3' termini. These molecules are resistant to exonucleolytic degradation, exhibit tissue-specific and developmental stage-specific expression patterns, and have been implicated in diverse regulatory functions including microRNA sponging, protein scaffolding, and, in some cases, translation. The identification and characterization of circRNAs from high-throughput sequencing data present unique computational challenges distinct from those encountered in linear RNA analysis. This article provides an exhaustive technical review of the computational methods, algorithmic strategies, and analytical frameworks employed for circRNA identification and analysis, with emphasis on applications relevant to veterinary molecular diagnostics and host-pathogen interaction studies.

Biological Basis of Circular RNA Formation

CircRNA biogenesis occurs primarily through back-splicing, wherein the spliceosome catalyzes the ligation of a downstream donor (5') splice site to an upstream acceptor (3') splice site. This reaction is typically facilitated by complementary sequences within the flanking introns, often inverted Alu repeats in primate genomes and other repetitive elements in non-primate species, which base-pair to bring the splice sites into proximity. Circular transcripts may consist of exonic sequence only, of exons with retained introns, or, in the case of circular intronic RNAs, of intronic sequence alone. The junction site, termed the back-splice junction (BSJ), is the defining feature of circRNAs and serves as the primary target for computational detection algorithms.

The biological stability of circRNAs, with half-lives frequently exceeding 48 hours, makes them attractive candidates for biomarker discovery in veterinary contexts. Their presence in extracellular vesicles, including exosomes, has been documented most extensively in human biofluids [1], which suggests that non-invasive sampling from biofluids such as plasma, serum, and milk may also be feasible in veterinary species. In the context of host-pathogen interactions, virus-derived circRNAs have been identified across a broad range of viral species and families, expanding the scope of circRNA biology beyond endogenous transcripts [2].

Sequencing Strategies for CircRNA Detection

Short-Read Sequencing Approaches

The predominant approach for circRNA discovery involves RNA sequencing (RNA-seq) using short-read platforms, typically generating reads of 100-150 nucleotides in length. Standard RNA-seq libraries are prepared following ribosomal RNA depletion or poly(A) enrichment. However, poly(A) enrichment substantially depletes circRNA populations because circular transcripts lack polyadenylated tails. Therefore, ribosomal RNA depletion is the preferred strategy for circRNA-focused studies.

The detection of circRNAs from short-read data relies on identifying reads that span the BSJ. These chimeric reads contain sequences from two non-contiguous genomic regions that are joined in the circular transcript. Computational tools must distinguish genuine back-splicing events from other sources of chimeric reads, including trans-splicing, genomic rearrangements, and template-switching artifacts during reverse transcription.

Long-Read Sequencing Approaches

Long-read sequencing technologies, including those based on single-molecule real-time sequencing and nanopore-based platforms, offer distinct advantages for circRNA analysis. These platforms can generate reads spanning entire circular transcripts, enabling the identification of full-length circRNA sequences and the resolution of complex isoform structures [3, 4]. Long-read approaches are particularly valuable for characterizing multi-exon circRNAs and for distinguishing between closely related circular isoforms that may share identical BSJs but differ in internal exon composition.

The computational analysis of long-read circRNA data requires specialized alignment strategies capable of handling reads that traverse circular junctions multiple times. Algorithms must account for the circular nature of the template, which can produce reads that wrap around the circle multiple times, generating complex alignment patterns [5].
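The wrap-around behaviour described above can be illustrated with a toy example. The following is a minimal sketch, assuming an error-free read and exact self-comparison; the function name `estimate_period` and its thresholds are illustrative and do not correspond to any published tool. Real long reads contain sequencing errors, so practical pipelines rely on alignment-based repeat detection rather than position-by-position matching.

```python
from typing import Optional


def estimate_period(read: str, min_period: int = 20,
                    min_identity: float = 0.9) -> Optional[int]:
    """Return the smallest shift at which the read matches itself, or None.

    A read that travelled around a circular template more than once
    repeats with a period equal to the circle length.
    """
    # Shifts larger than half the read leave too little overlap to compare.
    for period in range(min_period, len(read) // 2 + 1):
        overlap = len(read) - period
        matches = sum(1 for i in range(overlap) if read[i] == read[i + period])
        if matches / overlap >= min_identity:
            return period
    return None


unit = "AAACCCGGGTTT"   # toy 12-nt circle (hypothetical sequence)
read = unit * 3         # a read covering three passes around the circle
print(estimate_period(read, min_period=5))   # 12
```

The estimated period gives a candidate circle length, after which one copy of the repeat unit can be extracted and aligned to the reference to locate the BSJ.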

Computational Algorithms for CircRNA Identification

Back-Splice Junction Detection

The core computational task in circRNA identification is the detection of BSJ-spanning reads. Most algorithms follow a multi-step pipeline: read alignment, chimeric read identification, junction filtering, and quantification.

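Read Alignment. Short reads are typically aligned to a reference genome using splice-aware aligners such as STAR or HISAT2. Because back-spliced reads do not align colinearly, the aligner or a downstream step must report chimeric (split) alignments in which a single read maps to two distinct genomic loci. STAR, for example, reports these only when chimeric detection is explicitly enabled (via `--chimSegmentMin`), and reads left unmapped by a linear-only alignment are commonly re-aligned with a chimera-aware method. The alignment parameters must be optimized to detect the characteristic signature of back-splicing: a read segment mapping to a downstream exon followed by a segment mapping to an upstream exon.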

Chimeric Read Identification. Following alignment, candidate BSJ reads are identified as those whose two aligned segments map to the same chromosome and strand but in reversed order relative to the linear transcript: the 5' portion of the read aligns to the downstream exon and the 3' portion to the upstream exon. The junction site is defined by the coordinates of the donor and acceptor splice sites. A minimum of two unique reads supporting a given BSJ is typically required for initial candidate identification, though more stringent thresholds are applied for high-confidence calls.
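The ordering test at the heart of this step can be sketched as follows. This is an illustrative example only: the `Segment` class, the function name, and the `max_span` cut-off are assumptions made for the sketch, and coordinates are taken to be 0-based and half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (0-based, half-open coordinates)."""
    chrom: str
    strand: str   # "+" or "-", strand of the originating transcript
    start: int
    end: int


def is_backsplice_candidate(first: Segment, second: Segment,
                            max_span: int = 100_000) -> bool:
    """first and second are given in 5'->3' order along the read."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return False
    if first.strand == "+":
        # Linear splicing puts the second segment downstream;
        # back-splicing puts it upstream of the first.
        inverted = second.end <= first.start
        span = first.end - second.start
    else:
        # On the minus strand the transcript runs toward lower coordinates.
        inverted = second.start >= first.end
        span = second.end - first.start
    return inverted and 0 < span <= max_span


downstream_exon = Segment("chr1", "+", 5000, 5050)
upstream_exon = Segment("chr1", "+", 1000, 1050)
print(is_backsplice_candidate(downstream_exon, upstream_exon))   # True
print(is_backsplice_candidate(upstream_exon, downstream_exon))   # False
```

The span limit is a pragmatic guard against distant chimeras; an appropriate value depends on the gene architecture of the species under study.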

Junction Filtering. Candidate BSJs are subjected to multiple filtering steps to remove false positives. Common filters include:

Removal of junctions supported by reads that also map to linear splice junctions
Exclusion of junctions arising from repetitive regions or segmental duplications
Filtering based on the presence of recognized splice site motifs (the canonical GT-AG and, less frequently, GC-AG and AT-AC)
Removal of junctions with excessive mismatches in the flanking regions
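The splice-motif filter from the list above can be sketched as a lookup of the two intronic dinucleotides that flank a candidate circle. This is a simplified illustration: the genome is assumed to be an in-memory dictionary of uppercase sequences, coordinates are 0-based and half-open, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_motif(genome: dict, chrom: str, start: int, end: int,
                   strand: str) -> str:
    """Return 'donor-acceptor' dinucleotides for a circRNA at [start, end)."""
    seq = genome[chrom]
    left = seq[start - 2:start]   # two bases just before the circle (genomic order)
    right = seq[end:end + 2]      # two bases just after the circle (genomic order)
    if strand == "+":
        donor, acceptor = right, left
    else:
        # On the minus strand the donor lies at the genomic left side.
        donor, acceptor = revcomp(left), revcomp(right)
    return f"{donor}-{acceptor}"


def passes_motif_filter(motif: str,
                        allowed=("GT-AG", "GC-AG", "AT-AC")) -> bool:
    return motif in allowed


# Toy genome: AG acceptor, a 10-nt "circle", then a GT donor.
toy = {"chr1": "TTTTAG" + "ACGTACGTAC" + "GTTTTT"}
motif = flanking_motif(toy, "chr1", 6, 16, "+")
print(motif, passes_motif_filter(motif))   # GT-AG True
```

Restricting `allowed` to `("GT-AG",)` gives a stricter filter at the cost of discarding junctions with the rarer motifs.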

Tool-Specific Algorithmic Approaches

Multiple computational tools have been developed for circRNA identification, each employing distinct algorithmic strategies.

CirComPara2 implements a sensitive and robust detection approach that integrates multiple detection algorithms within a unified framework [6]. The tool combines results from several individual detectors, applying consensus-based filtering to improve specificity. Its modular architecture allows for the incorporation of new detection algorithms as they become available.

CircRNAFlow provides a comprehensive bioinformatics workflow that encompasses identification, quantification, and downstream functional analysis [7]. The pipeline incorporates quality control modules, alignment optimization, and statistical testing for differential expression analysis.

circ2LO leverages the LucaOne large language model for circRNA identification [8]. This approach represents a paradigm shift from traditional alignment-based methods to deep learning-based classification. The model is trained on sequence features extracted from RNA-seq data and can identify circRNAs without explicit alignment to a reference genome.

circIRES-DAF introduces a dual-attenuation fusion framework specifically designed for identifying internal ribosome entry sites (IRES) within circRNAs [9]. This tool addresses the growing interest in circRNA translation, as IRES elements enable cap-independent translation initiation on circular templates.

Deep Learning Approaches for CircRNA Analysis

The application of deep learning to circRNA analysis has expanded substantially, encompassing both identification and functional annotation tasks.

Sequence Self-Attention Neural Networks. CircSSNN employs sequence self-attention neural networks with pre-normalization to predict circRNA-binding protein interaction sites [10]. The self-attention mechanism captures long-range dependencies within the RNA sequence, enabling the identification of binding motifs that may be distributed across the circular transcript.
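To make the self-attention idea concrete, the sketch below computes scaled dot-product attention over a one-hot encoded RNA sequence in plain Python. It is not the CircSSNN architecture: learned query, key, and value projections, multiple heads, and normalization layers are all omitted, and the helper names are illustrative.

```python
import math

NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> list:
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    return [[1.0 if n == b else 0.0 for b in NUCLEOTIDES] for n in seq]


def softmax(values: list) -> list:
    peak = max(values)                      # subtract the max for stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list):
    """Scaled dot-product attention with identity projections (untrained)."""
    dim = len(x[0])
    weights = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in x]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted average of all inputs.
    output = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(dim)]
              for row in weights]
    return weights, output


weights, _ = self_attention(one_hot("AUA"))
print([round(w, 3) for w in weights[0]])
```

In this untrained form each position simply attends more strongly to positions carrying the same nucleotide, regardless of distance; a trained model replaces the identity projections with learned ones so that attention weights reflect binding-relevant context.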

Multi-View Deep Learning. Multi-view deep learning frameworks integrate diverse feature representations, including sequence composition, secondary structure, and evolutionary conservation, to predict circRNA-protein interactions [11]. These approaches combine subspace learning with multi-view classifiers to leverage complementary information from different feature spaces.

Attention-Based Multiple-Instance Learning. Attention-based multiple-instance learning frameworks have been developed for classifying circRNAs and other long non-coding RNAs [12]. These methods operate on bags of instances, where each instance represents a segment of the RNA sequence, and learn to attend to the most informative regions for classification.

Feature Engineering and Interpretability

Sequence-Derived Features

Computational models for circRNA analysis rely on a diverse set of sequence-derived features. These include:

Nucleotide composition and k-mer frequencies
Splice site strength scores based on position weight matrices
Flanking intronic sequence features, including complementary repeat density
Secondary structure predictions, including minimum free energy calculations
Conservation scores across multiple species alignments
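As an illustration of the first feature type in the list above, the following sketch computes k-mer frequencies for a sequence. The `circular` option is an assumption of this example rather than a convention of any particular tool: it lets k-mers wrap across the BSJ so that the junction sequence contributes to the feature vector.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3, circular: bool = False) -> dict:
    """Relative frequency of every possible k-mer over the A/C/G/T alphabet."""
    seq = seq.upper().replace("U", "T")
    if circular and len(seq) >= k:
        # Append the first k-1 bases so windows can span the junction.
        seq = seq + seq[:k - 1]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases such as N are ignored.
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0)
            for kmer in all_kmers}


features = kmer_frequencies("ACGTACGT", k=2, circular=True)
print(features["AC"])   # 0.25
```

The resulting dictionary always has 4^k entries in a fixed order, which makes it straightforward to stack vectors from many sequences into a feature matrix.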

Feature Interpretability Analysis

Understanding which features contribute most strongly to circRNA identification and functional prediction is critical for model validation and biological insight. Feature interpretability analysis methods, including SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), have been applied to circRNA prediction models [13]. These analyses reveal that splice site strength and flanking intron complementarity are among the most informative features for distinguishing genuine circRNAs from false positives.

Quantification and Differential Expression Analysis

Normalization Strategies

CircRNA quantification requires normalization for sequencing depth, whereas transcript-length normalization is usually not applied. Unlike linear RNA quantification, where reads per kilobase per million (RPKM) or transcripts per million (TPM) are standard, circRNA quantification typically uses back-splice junction reads per million mapped reads (BSJ RPM) or similar metrics. Only reads spanning the BSJ are counted, so the count does not scale with transcript length in the way whole-transcript counts do, and the effective length of a circRNA is not directly measurable from short-read data.
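The depth normalization described here amounts to a single scaling step. A minimal sketch follows; the BSJ identifier format and the function name are illustrative.

```python
def bsj_rpm(bsj_counts: dict, total_mapped_reads: int) -> dict:
    """Scale raw BSJ read counts to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {bsj: count * 1_000_000 / total_mapped_reads
            for bsj, count in bsj_counts.items()}


counts = {"chr1:1000-5050:+": 25, "chr2:300-900:-": 4}   # hypothetical counts
print(bsj_rpm(counts, total_mapped_reads=50_000_000))
# {'chr1:1000-5050:+': 0.5, 'chr2:300-900:-': 0.08}
```

Because BSJ-supporting reads are rare relative to library size, RPM values are typically small, and low raw counts remain noisy after scaling.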

Differential Expression Testing

Statistical methods for differential circRNA expression analysis must account for the unique distributional properties of circRNA count data. Tools such as CircIMPACT enable the exploration of circRNA impact on gene expression and pathway activity [14]. This R package integrates circRNA expression data with linear gene expression data to identify circRNAs that may modulate the expression of their parent genes or affect specific biological pathways.

Validation and Quality Control

Experimental Validation Strategies

Computational predictions of circRNAs require experimental validation to confirm their circular structure. Standard validation approaches include:

RNase R treatment: Circular RNAs are resistant to RNase R, a 3' to 5' exoribonuclease that degrades linear RNAs. Enrichment following RNase R treatment provides evidence for circularity.
Divergent primer PCR: Primers are designed to face away from each other on the linear gene sequence. On a circular template they converge across the BSJ and yield a product, whereas on the linear transcript or genomic DNA they point outward and do not yield amplification.
Sanger sequencing of PCR products: Direct sequencing of amplified BSJ regions confirms the exact junction sequence.
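For divergent primer design, it is often convenient to build a linear template centred on the BSJ by joining the 3' end of the circular sequence to its 5' start. The sketch below assumes the spliced circRNA sequence is already known and written starting at the first base after the junction; the function name and the default flank length are illustrative.

```python
def bsj_template(circ_seq: str, flank: int = 100) -> str:
    """Return the last `flank` bases of the circle followed by the first `flank`.

    The BSJ sits at index `flank` of the returned string (or at the
    reduced flank length when the circle is short).
    """
    if len(circ_seq) < 2:
        raise ValueError("sequence too short to form a junction template")
    # Cap the flank so the two halves never overlap on short circles.
    flank = max(1, min(flank, len(circ_seq) // 2))
    return circ_seq[-flank:] + circ_seq[:flank]


circ = "AAAACCCCGGGGTTTT"            # toy spliced circRNA sequence
print(bsj_template(circ, flank=4))   # TTTTAAAA
```

Ordinary convergent primers placed on either side of the midpoint of this template correspond to divergent primers on the linear gene, and the expected Sanger read across the junction can be compared directly with the template.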

Computational Validation Metrics

Computational validation of circRNA predictions relies on multiple quality metrics:

Number of unique BSJ-spanning reads
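Read distribution across the junction (reads with varied start positions and balanced overhang on both sides of the BSJ support genuine circularity more strongly than a stack of identical reads)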
Consistency of BSJ coordinates across biological replicates
Absence of supporting reads in linear RNA-enriched libraries

Primer Design for CircRNA-Specific Amplification

The design of specific primers for circRNA amplification presents unique challenges due to the need to amplify across the BSJ. The CircPrime web-based platform addresses this need by providing automated primer design for circRNA-specific PCR [15]. The platform considers factors including primer melting temperature, GC content, secondary structure formation, and specificity against the linear transcript. Proper primer design is essential for both validation experiments and quantitative PCR-based expression analysis.
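Two of the factors named above, melting temperature and GC content, can be approximated with simple formulas. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough estimate suited to short oligonucleotides. It is not the method implemented by CircPrime, whose internals are not described here, and the acceptance window in `basic_primer_check` is an arbitrary assumption for illustration.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature in degrees Celsius."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def basic_primer_check(primer: str, tm_range=(55, 65),
                       gc_range=(0.4, 0.6)) -> bool:
    """Check Tm and GC content against illustrative acceptance windows."""
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_fraction(primer) <= gc_range[1])


primer = "ACGTACGTACGTACGTACGT"   # toy 20-mer
print(wallace_tm(primer), gc_fraction(primer), basic_primer_check(primer))
# 60 0.5 True
```

Secondary structure and specificity against the linear transcript require dedicated tools and are not covered by this check.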

Virus-Derived Circular RNAs

The identification of virus-derived circRNAs represents a rapidly expanding area of research with direct relevance to veterinary virology. Unbiased and comprehensive approaches have revealed circRNAs derived from a large range of viral species and families [2]. These virus-derived circRNAs may play roles in viral replication, host immune evasion, and pathogenesis.

Computational identification of viral circRNAs requires specialized approaches that account for the diversity of viral genome structures. Unlike host circRNAs, which are typically derived from annotated exons, viral circRNAs may arise from non-coding regions, overlapping reading frames, or intergenic sequences. Detection algorithms must be adapted to handle viral genomes with high mutation rates and complex transcriptional strategies.

Host-Pathogen Interaction Studies

CircRNAs have been implicated in host-pathogen interactions across multiple veterinary disease models. In silico identification and characterization of circRNAs during host-pathogen interactions requires integrated computational workflows that combine transcriptomic data from both host and pathogen [16]. These analyses can reveal circRNAs that are differentially expressed in response to infection and may serve as biomarkers or therapeutic targets.

The study of circRNAs in the context of bacterial infections, such as those caused by Escherichia coli in chickens or Mycoplasma bovis in feedlot cattle, may reveal novel regulatory mechanisms underlying host immune responses. Similarly, viral infections including highly pathogenic avian influenza (H5N1) in poultry and wild birds and African swine fever may induce specific circRNA signatures with diagnostic utility.

Workflow for CircRNA Identification and Analysis

The following diagram illustrates a comprehensive computational workflow for circRNA identification and analysis from RNA-seq data.

flowchart TD
    A[RNA-seq Raw Data] --> B[Quality Control and Trimming]
    B --> C[Read Alignment to Reference Genome]
    C --> D[Chimeric Read Identification]
    D --> E[Back-Splice Junction Detection]
    E --> F[Junction Filtering and Validation]
    F --> G[CircRNA Quantification]
    G --> H[Differential Expression Analysis]
    H --> I[Functional Annotation and Pathway Analysis]
    I --> J[Experimental Validation]
    J --> K[Primer Design for Targeted Assays]
    K --> L[Clinical or Research Application]
    
    F --> M[Long-Read Sequencing for Full-Length Characterization]
    M --> N[Isoform Resolution and Sequence Reconstruction]
    N --> I

Challenges and Limitations

False Positive Rates

The accurate identification of circRNAs from short-read sequencing data remains challenging due to high false positive rates. Sources of false positives include:

Template-switching artifacts during reverse transcription
Trans-splicing events that produce chimeric linear transcripts
Genomic rearrangements or structural variants
Alignment errors in repetitive or low-complexity regions

Comparative evaluations of circRNA identification tools have revealed substantial variability in performance across different datasets and experimental conditions [17, 18]. No single tool consistently outperforms others across all metrics, suggesting that consensus-based approaches combining multiple detection algorithms may provide the most reliable results.
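A consensus strategy of this kind can be sketched as simple voting across the call sets of several detectors. The tool names and BSJ identifiers below are placeholders, and the threshold of two supporting tools is an assumption rather than a recommendation from the cited evaluations.

```python
from collections import Counter


def consensus_bsjs(calls_by_tool: dict, min_tools: int = 2) -> set:
    """Keep BSJs reported by at least `min_tools` detection tools."""
    votes = Counter(
        bsj
        for calls in calls_by_tool.values()
        for bsj in set(calls)        # each tool votes once per BSJ
    )
    return {bsj for bsj, n in votes.items() if n >= min_tools}


calls = {
    "tool_a": ["chr1:1000-5050:+", "chr2:300-900:-"],
    "tool_b": ["chr1:1000-5050:+", "chr3:10-400:+"],
    "tool_c": ["chr1:1000-5050:+", "chr2:300-900:-"],
}
print(sorted(consensus_bsjs(calls, min_tools=2)))
# ['chr1:1000-5050:+', 'chr2:300-900:-']
```

Voting requires that all tools report coordinates in the same convention; off-by-one differences between 0-based and 1-based outputs need to be harmonized before comparison.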

Quantification Accuracy

Accurate quantification of circRNA expression levels is complicated by several factors. The circular structure prevents the use of standard length normalization approaches. Additionally, circRNAs and their linear counterparts share identical exon sequences, so only BSJ-spanning reads can be assigned unambiguously to the circular transcript, while reads from the exon bodies could originate from either form. Computational deconvolution methods that model the relative abundance of circular and linear isoforms are under active development.
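Short of a full deconvolution model, a simple junction-based ratio is often used to compare circular and linear output from the same locus. The sketch below shows one possible formulation, assuming that linear splice-junction read counts at the BSJ donor and acceptor sites are available; the averaging of the two linear counts is a choice made for this example, and other definitions exist.

```python
from typing import Optional


def circular_fraction(bsj_reads: int, linear_donor_reads: int,
                      linear_acceptor_reads: int) -> Optional[float]:
    """Fraction of junction-level signal attributable to the circular form.

    Linear support is taken as the mean of the linear junction reads
    that use the same donor and acceptor sites as the BSJ.
    """
    linear = (linear_donor_reads + linear_acceptor_reads) / 2
    total = bsj_reads + linear
    if total == 0:
        return None              # no junction evidence at this locus
    return bsj_reads / total


print(circular_fraction(10, 20, 40))   # 0.25
```

Because both numerator and denominator are junction read counts from the same library, the ratio is insensitive to sequencing depth, but it remains unstable when counts are low.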

Reproducibility

Reproducibility of circRNA identification across technical and biological replicates remains a concern. Factors contributing to poor reproducibility include batch effects in library preparation, variability in sequencing depth, and stochastic detection of low-abundance circRNAs. Statistical frameworks that account for these sources of variability are essential for robust circRNA analysis.

Future Directions

Integration of Multi-Omics Data

The integration of circRNA expression data with other omics layers, including proteomics, metabolomics, and epigenomics, will provide a more comprehensive understanding of circRNA function. Computational frameworks that support multi-omics integration are needed to fully exploit the regulatory potential of circRNAs.

Machine Learning for Functional Prediction

Advanced machine learning approaches, including graph neural networks and transformer-based models, hold promise for predicting circRNA functions directly from sequence. These models can learn complex sequence-structure-function relationships without explicit feature engineering.

Single-Cell CircRNA Analysis

The application of single-cell RNA-seq technologies to circRNA analysis is an emerging frontier. Computational methods must be adapted to handle the increased sparsity and technical noise inherent in single-cell data while maintaining sensitivity for circRNA detection.

Veterinary Diagnostic Applications

The development of circRNA-based diagnostic assays for veterinary applications requires robust computational pipelines that can be deployed in clinical settings. Point-of-care molecular diagnostics for pathogens such as feline leukemia virus and canine parvovirus could potentially incorporate circRNA biomarkers for improved sensitivity and specificity.

Conclusion

Computational identification and analysis of circular RNAs represent a rapidly evolving field at the intersection of transcriptomics, bioinformatics, and molecular biology. The unique structural features of circRNAs necessitate specialized algorithmic approaches that differ fundamentally from those used for linear RNA analysis. Advances in sequencing technologies, particularly long-read platforms, are enabling more comprehensive characterization of circRNA diversity. Deep learning methods are improving the accuracy of circRNA detection and functional prediction. As the field matures, the integration of circRNA analysis into routine veterinary molecular diagnostics workflows holds promise for improved disease surveillance, biomarker discovery, and therapeutic development.

References

[1] Zhao J, Li Q, Hu J, et al. Circular RNA landscape in extracellular vesicles from human biofluids. Genome Med. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39482783/

[2] Chasseur AS, Bellefroid M, Galais M, et al. Unbiased and comprehensive identification of virus-derived circular RNAs in a large range of viral species and families. PLoS Pathog. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40934191/

[3] Bessière C, Meggetto F, Gaspin C, et al. Identification of Circular RNA Variants by Oxford Nanopore Long-Read Sequencing. Methods Mol Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/39900754/

[4] Lu W, Yu K, Li X, et al. Identification of full-length circular nucleic acids using long-read sequencing technologies. Analyst. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34549740/

[5] Hossain MT, Zhang J, Reza MS, et al. Reconstruction of Full-Length circRNA Sequences Using Chimeric Alignment Information. Int J Mol Sci. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35743218/

[6] Gaffo E, Buratin A, Dal Molin A, et al. Sensitive, reliable and robust circRNA detection from RNA-seq with CirComPara2. Brief Bioinform. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/34698333/

[7] Salinas EA, Edwards YJK. Circular RNA Identification and Characterization with CircRNAFlow: A Bioinformatics Approach. Adv Exp Med Biol. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40886271/

[8] Yu H, Yu Y, Xia Y. circ2LO: Identification of CircRNA Based on the LucaOne Large Model. Genes (Basel). 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40282373/

[9] Wang Z, Liu L, Lei X. circIRES-DAF: A dual-attenuation fusion framework for identification of internal ribosome entry sites in circular RNAs. Int J Biol Macromol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41651269/

[10] Cao C, Yang S, Li M, et al. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization. BMC Bioinformatics. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37254080/

[11] Li H, Deng Z, Yang H, et al. circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier. Brief Bioinform. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/34571539/

[12] Liu Y, Fu Q, Peng X, et al. Attention-Based Deep Multiple-Instance Learning for Classifying Circular RNA and Other Long Non-Coding RNA. Genes (Basel). 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34946967/

[13] Niu M, Wang C, Chen Y, et al. CircRNA identification and feature interpretability analysis. BMC Biol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38408987/

[14] Buratin A, Gaffo E, Dal Molin A, et al. CircIMPACT: An R Package to Explore Circular RNA Impact on Gene Expression and Pathways. Genes (Basel). 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34356060/

[15] Sharko F, Rbbani G, Siriyappagouder P, et al. CircPrime: a web-based platform for design of specific circular RNA primers. BMC Bioinformatics. 2023. URL: https://pubmed.ncbi.nlm.nih.gov/37208611/

[16] Ealam Selvan M, Lim KS, Teo CH, et al. In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions. J Vis Exp. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36342167/

[17] Digby B, Finn S, Ó Broin P. Computational approaches and challenges in the analysis of circRNA data. BMC Genomics. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38807085/

[18] Bauer-Negrini G, Cordenonsi da Fonseca G, Gottfried C, et al. Usability evaluation of circRNA identification tools: Development of a heuristic-based framework and analysis. Comput Biol Med. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35780604/

[19] Liu N, Zhang Y. A deep learning approach based on molecular graph features and residual blocks to predict interaction sites between CircRNA and RBP. Biochem Biophys Res Commun. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40997583/

[20] Shao M, Hao S, Jiang L, et al. CRIT: Identifying RNA-binding protein regulator in circRNA life cycle via non-negative matrix factorization. Mol Ther Nucleic Acids. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/36420213/

[21] Cochran KR, Gorospe M, De S. Bioinformatic Analysis of CircRNA from RNA-seq Datasets. Methods Mol Biol. 2022. URL: https://pubmed.ncbi.nlm.nih.gov/35604551/

[22] Ye Y, Wang Z, Yang Y. Comprehensive Identification of Translatable Circular RNAs Using Polysome Profiling. Bio Protoc. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34692916/

[23] Terrón-Camero LC, Andrés-León E. NGS Methodologies and Computational Algorithms for the Prediction and Analysis of Plant Circular RNAs. Methods Mol Biol. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34195961/

[24] Gaffo E, Buratin A, Dal Molin A, et al. Bioinformatic Analysis of Circular RNA Expression. Methods Mol Biol. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34160817/

[25] Weinberg CE, Olzog VJ, Eckert I, et al. Identification of over 200-fold more hairpin ribozymes than previously known in diverse circular RNAs. Nucleic Acids Res. 2021. URL: https://pubmed.ncbi.nlm.nih.gov/34096583/

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.