What is Dr. Zubair Khalid's research focus?

Dr. Zubair Khalid specializes in molecular virology, mRNA vaccine development, and computational biology, with a focus on avian pathogens such as infectious bursal disease virus (IBDV) and avian reovirus.

Where is Dr. Zubair Khalid currently working?

Dr. Zubair Khalid is a Postdoctoral Research Associate at the University of Maryland (UMD), specifically within the Department of Animal and Avian Sciences.

Long-Read Sequencing Technologies: PacBio and Oxford Nanopore


Introduction

The evolution of nucleic acid sequencing has been marked by a transition from short-read platforms, which produce reads of 100-300 base pairs, to long-read technologies capable of generating contiguous sequences that routinely exceed 10 kilobases. These advances have transformed the resolution of genomic analyses in veterinary science, particularly for complex genomes, repetitive regions, and haplotype phasing. Two platforms dominate: single-molecule real-time (SMRT) sequencing, commercialized by Pacific Biosciences (PacBio), and nanopore sequencing, developed by Oxford Nanopore Technologies (ONT). Both support PCR-free library preparation, which avoids amplification biases and preserves native base modifications for direct detection. This article reviews the biophysical mechanisms, computational workflows, and veterinary applications of both platforms, with emphasis on pathogen genomics, structural variant detection, and metagenomic profiling.

Biophysical Principles

Single-Molecule Real-Time (SMRT) Sequencing

SMRT sequencing immobilizes a single DNA polymerase molecule at the bottom of a zero-mode waveguide (ZMW), a nanophotonic confinement structure approximately 70 nanometers in diameter [1]. The polymerase incorporates phospholinked nucleotides, each carrying a distinct fluorophore attached to the terminal phosphate. While a labeled nucleotide is held in the active site during incorporation, its fluorophore sits in the illuminated volume and produces a pulse of fluorescence; when the phosphodiester bond forms, the fluorophore is cleaved off with the pyrophosphate and diffuses away, ending the pulse and leaving a natural, unlabeled DNA strand. Because the ZMW confines the excitation volume to the zeptoliter scale, background from freely diffusing labeled nucleotides is low enough to detect single fluorophores. The resulting signal is a time-resolved trace of incorporation events, with pulse color identifying the base.
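As a rough illustration of how a time-resolved trace becomes a sequence, the toy pulse caller below treats each contiguous run of frames in which one dye channel is brightest and above a threshold as a single incorporation pulse. It is a sketch under stated assumptions, not PacBio's pulse-calling software: the channel-to-base mapping (`BASES`), the intensity `threshold`, and the minimum pulse width `min_width` are hypothetical values chosen for the example.

```python
from typing import List, Optional, Sequence

# Hypothetical mapping for this sketch: dye channel i labels nucleotide BASES[i].
BASES = "ACGT"

def call_pulses(trace: Sequence[Sequence[float]],
                threshold: float = 0.5,
                min_width: int = 2) -> str:
    """Toy pulse caller for a four-channel fluorescence trace.

    trace[t][c] is the intensity of dye channel c at frame t. A pulse is a
    contiguous run of frames in which the same channel is the brightest and
    exceeds `threshold`; runs shorter than `min_width` frames are treated as
    noise and dropped.
    """
    calls: List[str] = []
    current: Optional[int] = None  # channel of the pulse in progress, or None between pulses
    width = 0                      # number of frames in the pulse in progress

    # A trailing all-zero sentinel frame closes any pulse still open at the end.
    for frame in list(trace) + [[0.0] * 4]:
        brightest = max(range(4), key=lambda c: frame[c])
        channel = brightest if frame[brightest] > threshold else None
        if channel == current:
            if channel is not None:
                width += 1  # the same pulse continues
            continue
        # The pulse in progress (if any) has ended: keep it only if wide enough.
        if current is not None and width >= min_width:
            calls.append(BASES[current])
        current, width = channel, (1 if channel is not None else 0)
    return "".join(calls)
```

Runs shorter than `min_width` are discarded, which mimics rejecting brief noise spikes; setting `min_width=1` would accept them as calls.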

The hallmark of SMRT sequencing is circular consensus sequencing (CCS). Hairpin adapters are ligated to both ends of a double-stranded template, forming a closed circular molecule (a SMRTbell). The polymerase reads around this circle multiple times, producing subreads that are aligned to one another to generate a consensus sequence. Because raw subread errors (approximately 10-15%) are largely random, consensus reduces the error rate to 1% or less (Q20 or higher), and frequently below 0.1% (Q30); reads meeting at least Q20 are termed High-Fidelity (HiFi) reads [2]. HiFi reads typically range from 10 to 25 kilobases. Individual polymerase reads can exceed 100 kilobases under optimized conditions, but such lengths are reached in continuous long-read (CLR) mode, where too few passes are available to build a HiFi consensus.
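The Q values above follow the Phred scale, Q = -10 log10(P_error), so Q20 corresponds to a 1% error probability and Q30 to 0.1%. The sketch below pairs that conversion with a deliberately simplified model of why repeated passes help: it assumes each pass errs independently at a given base and counts the consensus as wrong only when more than half of the passes err. The actual CCS algorithm aligns subreads and models indels, so these numbers illustrate the trend rather than predict HiFi accuracy.

```python
import math

def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P_error)."""
    return -10.0 * math.log10(p)

def error_from_phred(q: float) -> float:
    """Inverse of the Phred transform: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)

def majority_error(p: float, passes: int) -> float:
    """Probability that a simple majority vote over `passes` reads is wrong.

    Simplifying assumptions (not the real CCS algorithm): each pass errs
    independently at this base with probability p, and all errors agree on
    the same wrong call, so consensus fails when more than half the passes err.
    """
    if passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    need = passes // 2 + 1  # fewest erroneous passes that outvote the truth
    # Binomial tail: P(at least `need` of `passes` independent passes are wrong).
    return sum(math.comb(passes, k) * p**k * (1 - p) ** (passes - k)
               for k in range(need, passes + 1))
```

Under these assumptions, a 12% per-pass error rate falls to about 1.4% (roughly Q18) with five passes and keeps falling as passes are added, which is why consensus accuracy depends on how many times the polymerase can circle the insert.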

Nanopore-Based Sequencing

Nanopore sequencing uses a protein nanopore embedded in an electrically resistant synthetic membrane. A voltage applied across the membrane drives an ionic current through the pore. A motor protein (e.g., a helicase or polymerase) ratchets a single strand of DNA or RNA through the pore at a controlled speed. The nucleotides occupying the narrowest part of the pore at any moment modulate the ionic current in a sequence-dependent manner, so each current level reflects a short k-mer rather than a single base. The raw current traces ("squiggles") are then converted into base calls by neural network basecallers [3].
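Early nanopore basecallers first segmented the raw current into discrete "events", each approximating one k-mer occupying the pore, before decoding; modern neural basecallers work on the raw signal directly. The sketch below shows only the segmentation idea, splitting a trace wherever consecutive samples jump by more than a hypothetical threshold (`jump`, in pA) and summarizing each event by its mean current. Real signals are far noisier, and practical event detectors used statistical tests over sliding windows rather than a single fixed jump.

```python
from statistics import mean
from typing import List

def segment_events(signal: List[float], jump: float = 5.0) -> List[float]:
    """Split a raw current trace (pA) into events at abrupt level changes.

    A new event starts whenever consecutive samples differ by more than
    `jump` pA (a hypothetical, hand-picked threshold); each event is
    summarized by its mean current.
    """
    if not signal:
        return []
    events: List[List[float]] = [[signal[0]]]
    for prev, cur in zip(signal, signal[1:]):
        if abs(cur - prev) > jump:
            events.append([cur])    # level change: open a new event
        else:
            events[-1].append(cur)  # same level: extend the current event
    return [mean(e) for e in events]
```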

Key biophysical parameters include the pore constriction diameter (approximately 1 nm for CsgG-derived pores), the applied voltage (typically 140-180 mV), and the translocation speed (roughly 400-450 bases per second, depending on pore version and chemistry). R10-series pores (e.g., R10.4) have a dual-reader constriction that improves the resolution of homopolymer stretches. Unlike SMRT sequencing, nanopore sequencing does not rely on polymerase incorporation to generate the signal; it senses the nucleic acid strand itself, which enables direct RNA sequencing without reverse transcription. Read length is limited primarily by the integrity of the input DNA or RNA: ultra-long protocols routinely yield reads over 100 kb, and record reads exceed 2 megabases.
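The translocation speed sets a hard ceiling on how fast a single pore can read. The arithmetic below assumes a constant speed and a pore that is never idle, both simplifications; the default of 450 bases per second follows the figure quoted above.

```python
def read_time_seconds(read_length_bases: int, speed_bps: float = 450.0) -> float:
    """Time for one molecule to pass through a pore at constant speed."""
    return read_length_bases / speed_bps

def max_bases_per_pore(hours: float, speed_bps: float = 450.0) -> float:
    """Upper bound on bases one pore can read, assuming it is never idle."""
    return hours * 3600 * speed_bps
```

At 400 bases per second, a 100 kb molecule occupies a pore for 250 s and a 2 Mb read for roughly 83 minutes; real flow-cell yields fall below the per-pore ceiling because pores spend time unoccupied, blocked, or inactive.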

Comparison of Key Performance Metrics

The table below summarizes critical attributes of SMRT and nanopore sequencing from the perspective of veterinary molecular diagnostics.

Attribute	SMRT sequencing (HiFi mode)	Nanopore-based sequencing
Read length	10-25 kb (polymerase reads 100+ kb in CLR mode)	Median 10-50 kb; ultra-long >1 Mb
Per-read accuracy	≥99% (Q20) by definition; often >99.9%	92-97% (higher with R10.4)
Consensus accuracy	Q40+ for specific applications	Q20; Q30 with polishing
Throughput per flow cell or SMRT Cell	10-30 Gb (current instruments)	10-50 Gb (PromethION flow cell)
Error profile	Random indels in raw subreads, largely removed by consensus	Systematic indels, especially in homopolymers
Base modification detection	Kinetic signatures (altered polymerase kinetics)	Direct modulation of the ionic current
Real-time data access	No	Yes (data available during run)
Typical library preparation time	4-6 hours	10-90 minutes (rapid kits)

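Error profiles differ fundamentally. Raw SMRT subread errors, predominantly insertions and deletions, occur at random positions, so independent passes over the same molecule rarely repeat the same mistake and majority consensus removes them. Nanopore errors are more systematic: they are also dominated by insertions and deletions, but they concentrate in homopolymers and other difficult sequence contexts, and the same error can recur across reads of the same locus. Such errors are not fully corrected by additional coverage and instead require improved basecalling models and polishing algorithms.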
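The contrast between random and systematic errors can be seen with a column-wise majority vote. This sketch assumes reads that are already aligned to equal length and uses substitutions as a stand-in for errors in general (real homopolymer indels would require alignment first); the read sets are invented for illustration.

```python
from collections import Counter
from typing import List

def column_consensus(reads: List[str]) -> str:
    """Majority-vote consensus of equal-length, pre-aligned reads.

    Simplification: reads are assumed aligned with no indels, so each
    column can be voted on independently.
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need at least one read, all of equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

truth = "ACGTTTTGCA"
# Random errors: each read is wrong at a different position.
random_err = ["ACGTTTTGCT", "AGGTTTTGCA", "ACGTATTGCA"]
# Systematic errors: every read makes the same mistake inside the T homopolymer.
systematic = ["ACGTTTGGCA", "ACGTTTGGCA", "ACGTTTGGCA"]
```

When each read errs at a different position, every column still has a correct majority and the consensus recovers `truth`; when all reads repeat the same error, the vote simply confirms it.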

Computational Bioinformatics Workflow

The bioinformatics pipeline for long-read sequencing includes three core stages: basecalling, read quality control, and downstream analysis. The following Mermaid diagram illustrates the general workflow for a de novo assembly project using long reads, applicable to viral, bacterial, or host genomes.

flowchart TD
 A[DNA extraction from veterinary sample] --> B[Library preparation]
 B --> C[SMRT or nanopore sequencing]
 C --> D[Basecalling, on-instrument for SMRT or neural network for nanopore]
 D --> E[Raw read QC & adapter trimming]
 E --> F{Assembly strategy}
 F -->|Long-read-only| G[Overlap-layout-consensus assembler]
 F -->|Hybrid assembly| H[Short-read correction & scaffolding]
 G --> I[Polishing, e.g. Racon/Medaka, Pilon or Arrow]
 H --> I
 I --> J[Polished contigs]
 J --> K[Functional annotation & comparative genomics]
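The read QC step in the workflow can be sketched as a length and quality filter followed by an N50 summary. The sketch assumes reads have already been parsed from FASTQ into `(name, sequence, quality)` tuples with Phred+33 quality encoding; `min_length` and `min_quality` are hypothetical cut-offs. Mean read quality is computed by averaging per-base error probabilities rather than Phred scores, because averaging on the logarithmic scale overstates quality.

```python
import math
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)

def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))

def filter_reads(reads: Iterable[Read],
                 min_length: int = 1000,
                 min_quality: float = 10.0) -> List[Read]:
    """Keep reads that pass both the length and the mean-quality cut-off."""
    return [r for r in reads
            if len(r[1]) >= min_length and mean_read_quality(r[2]) >= min_quality]

def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that reads of length >= L hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half, running = sum(ordered) / 2, 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For read lengths 2 through 10 the N50 is 8, since 10 + 9 + 8 = 27 bases already make up half of the 54-base total.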

Basecalling

For SMRT sequencing, basecalling is performed on the instrument in real time by the vendor's software, and the output is already in base space with per-base quality values. For nanopore data, basecalling is performed either during the run (live basecalling) or after it, using software such as Guppy (since superseded by Dorado) or the research basecaller Bonito. These programs use neural networks that combine convolutional, recurrent, and, in some recent models, transformer layers, trained on raw signal from reads with known reference sequences. The choice of basecalling model (fast, high accuracy, or super accuracy) trades throughput against accuracy: faster models need less compute but produce less accurate calls. Alternative basecallers and dedicated models have also been developed, including deep learning models that detect base modifications such as 5-methylcytosine simultaneously with the canonical bases.
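A neural basecaller typically outputs, for each time step of signal, a probability for every symbol. Some neural basecallers have been trained with a connectionist temporal classification (CTC) objective, and the simplest way to decode CTC output is greedy best-path decoding, sketched below. The five-symbol alphabet with a blank at index 0 is an assumption of this example; many current models use CRF-style outputs and beam search instead.

```python
from typing import List, Optional, Sequence

ALPHABET = ["-", "A", "C", "G", "T"]  # index 0 is the CTC blank symbol (assumed layout)

def ctc_greedy_decode(probs: Sequence[Sequence[float]]) -> str:
    """Greedy (best-path) CTC decoding of per-frame class probabilities.

    probs[t][k] is the network's probability of ALPHABET[k] at frame t.
    Take the most likely symbol per frame, merge consecutive repeats, then
    drop blanks; a blank between two identical symbols keeps them distinct.
    """
    best = [max(range(len(ALPHABET)), key=lambda k: frame[k]) for frame in probs]
    out: List[str] = []
    prev: Optional[int] = None
    for k in best:
        if k != prev and k != 0:  # new non-blank symbol: emit it
            out.append(ALPHABET[k])
        prev = k
    return "".join(out)
```

Because a blank separates repeated symbols, the best path A, A, blank, A, C, C, blank, G decodes to AACG, which is how a homopolymer such as "AA" is represented.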

Assembly and Polishing

Long-read assemblers are designed to exploit long overlaps while tolerating the error profile of their input. Common tools for nanopore data include Flye, Canu [4], and Shasta. For SMRT data, HiFi reads are now typically assembled with HiFi-specific assemblers such as hifiasm or HiCanu, whereas older continuous long-read data were assembled with HGAP or pb-assembly. In hybrid approaches, accurate short reads (e.g., Illumina) are used to correct residual long-read errors with tools such as Pilon or Racon. Polishing is often iterative; Medaka is optimized specifically for nanopore data, while Arrow (and its predecessor Quiver) was developed for SMRT continuous long-read data.
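To make the overlap idea concrete, the sketch below implements greedy overlap merging, a simpler relative of the overlap-layout-consensus strategy: it repeatedly joins the two sequences with the longest exact suffix-prefix overlap. It assumes error-free reads, no read contained within another, and a hypothetical minimum overlap `min_len`; real long-read assemblers tolerate sequencing errors and resolve repeats with graph-based methods rather than greedy choices.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembler: merge the pair with the longest exact overlap until none remain."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left: remaining sequences are the contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

For example, ATGCGTAC, GTACCTTA, and CTTAGGAC each overlap the next by four bases and merge into the single contig ATGCGTACCTTAGGAC.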

Veterinary Applications

De Novo Assembly of Livestock and Avian Genomes

Long-read sequencing has enabled contiguous reference genomes for numerous domestic species. For example, the chicken (Gallus gallus) reference genome has been substantially refined, resolving previously poorly assembled regions such as the major histocompatibility complex. In cattle, long reads have resolved complex structural variants associated with production traits and disease susceptibility. These resources underpin comparative genomics and the identification of markers associated with disease resistance and immune response genes.

Pathogen Genome Characterization

Whole-genome sequencing of veterinary pathogens benefits from long reads in several ways. Repetitive regions, such as those in the Mycoplasma bovis genome (see Mycoplasma bovis in Feedlot Cattle), can be fully resolved, enabling accurate typing. For RNA viruses, direct nanopore sequencing of viral RNA can profile quasispecies diversity without amplification artifacts, although the per-read error rate complicates the detection of low-frequency variants. For Highly Pathogenic Avian Influenza (H5N1) in Poultry and Wild Birds, long reads can span each genome segment end to end, allowing full-length genome assembly from clinical samples and facilitating the tracking of reassortment events. Similarly, African Swine Fever studies have employed nanopore sequencing for rapid field-level genotyping.

Antimicrobial Resistance Gene Context

Short reads often cannot resolve the genomic context of antimicrobial resistance (AMR) genes, specifically whether they are chromosomal or located on mobile genetic elements such as plasmids. Long-read sequencing provides contiguous sequences spanning entire plasmids, as demonstrated in studies of Escherichia coli in Chickens and Poultry Products and Salmonella in Chickens. This resolution is critical for understanding horizontal gene transfer and the epidemiology of AMR.

Metagenomics and Microbiome Profiling

Long reads can cover the full-length 16S ribosomal RNA gene (approximately 1,500 bp), which often improves taxonomic resolution toward the species level, compared with the genus-level identification typical of short amplicons spanning only one or a few hypervariable regions. Nanopore-based metagenomics has been applied to the swine gut microbiome and the avian cecal microbiome, including studies of Necrotic Enteritis in Broiler Chickens. Additionally, direct RNA sequencing with nanopores can profile transcriptomes and detect RNA viruses without the biases of reverse transcription.
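A simplified version of full-length 16S profiling keeps only reads whose length is consistent with the ~1,500 bp gene and tallies the taxa assigned to them. The sketch assumes per-read taxon labels from an external classifier, and the 1,300-1,700 bp window is a hypothetical filter rather than a recommended setting.

```python
from collections import Counter
from typing import Dict, List, Tuple

def relative_abundance(reads: List[Tuple[int, str]],
                       min_len: int = 1300,
                       max_len: int = 1700) -> Dict[str, float]:
    """Relative abundance of taxa among plausibly full-length 16S reads.

    Each read is (length_in_bp, taxon_label), with labels assumed to come
    from an external classifier. Reads outside [min_len, max_len] are
    dropped as likely truncated or non-target products.
    """
    counts = Counter(taxon for length, taxon in reads if min_len <= length <= max_len)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}
```

Relative abundance here is a plain read-count proportion; it does not correct for differences in 16S gene copy number between taxa.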

Epigenetic Analysis

SMRT sequencing detects base modifications (e.g., N6-methyladenine and 5-methylcytosine) through polymerase kinetics: the interpulse duration (IPD), the delay between successive incorporation pulses, is typically lengthened at and near modified bases. Nanopore sequencing detects modifications because modified nucleotides alter the ionic current, so modification calls are made from the same signal used to determine the primary sequence. In veterinary contexts, these approaches have been used to examine methylation patterns in host immune genes and to detect viral genome methylation.
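Kinetic modification detection is commonly summarized as an IPD ratio: the mean inter-pulse duration (IPD, the delay between successive incorporation pulses) observed at a position in native DNA divided by the value expected for unmodified DNA at the same position. The sketch below computes that ratio from per-position IPD lists and flags positions above a threshold. The control source (for example, amplified DNA, in which modifications are lost) and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List

def ipd_ratios(native: Dict[int, List[float]],
               control: Dict[int, List[float]]) -> Dict[int, float]:
    """Per-position IPD ratio: mean native IPD divided by mean control IPD.

    Both dicts map a reference position to the IPDs observed there; the
    control is assumed to represent unmodified DNA. Positions missing from
    the control are skipped.
    """
    return {pos: mean(native[pos]) / mean(control[pos])
            for pos in native if pos in control}

def flag_modified(ratios: Dict[int, float], threshold: float = 2.0) -> List[int]:
    """Positions whose IPD ratio exceeds a hypothetical threshold."""
    return sorted(pos for pos, r in ratios.items() if r > threshold)
```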

Limitations and Considerations

The primary limitation of nanopore sequencing is its per-read error rate, which, despite improvements, remains higher than that of SMRT HiFi reads. For applications that depend on high per-read accuracy (e.g., single-nucleotide variant detection), errors can propagate into the assembly if it is not carefully polished. SMRT sequencing, while highly accurate, has a higher per-base cost and lower throughput per SMRT Cell than the highest-output nanopore configurations. Both technologies need high molecular weight DNA input; sheared or degraded DNA severely limits read length and assembly contiguity.
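The throughput trade-off can be turned into a quick planning estimate: the bases required are genome size times target depth, divided by the expected yield per flow cell or SMRT Cell. The sketch assumes a uniform yield per cell and uses a hypothetical `usable_fraction` for data lost to filtering; actual yields vary with sample quality and DNA integrity.

```python
import math

def flow_cells_needed(genome_size_bp: float, target_depth: float,
                      yield_per_cell_bp: float, usable_fraction: float = 1.0) -> int:
    """Number of flow cells (or SMRT Cells) needed to reach a target mean depth.

    Required bases = genome size x depth; `usable_fraction` is a hypothetical
    allowance for reads lost to QC filtering.
    """
    required = genome_size_bp * target_depth
    return math.ceil(required / (yield_per_cell_bp * usable_fraction))
```

For a hypothetical 1 Gb genome at 30x depth and 20 Gb per cell, two cells are needed (1.5 rounded up); with 25% of the data lost to filtering, the estimate is still exactly two.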

Library preparation for SMRT sequencing is more labor-intensive and requires specific protocols for damage repair and size selection. Nanopore library preparation is faster and can be performed with minimal equipment, making it suitable for field-based surveillance. However, the reliability of nanopore basecalling models, particularly for non-model organisms and extreme GC content, requires validation.
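Where basecalling accuracy is suspected to depend on GC content, a first step is to identify reads or contigs at the extremes for targeted validation. The sketch computes GC fraction over unambiguous bases and flags sequences outside a hypothetical 30-70% window.

```python
from typing import List

def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def flag_extreme_gc(seqs: List[str], low: float = 0.3, high: float = 0.7) -> List[int]:
    """Indices of sequences whose GC fraction falls outside [low, high]."""
    return [i for i, s in enumerate(seqs) if not low <= gc_content(s) <= high]
```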

Cross-Linking to Relevant Site Articles

Long-read resolution of complex genomes also supports research covered in Flux Balance Analysis in Metabolic Networks, which depends on complete genome annotations, and in Epigenetics and Computational DNA Methylation Analysis. In pathogen detection, long-read approaches complement Point-of-Care Molecular Diagnostics for Feline Upper Respiratory Pathogens. For parasitic systems, long reads have been used to assemble the genomes of Eimeria spp. and to characterize anthelmintic resistance loci in Haemonchus placei.

Conclusions

Long-read sequencing technologies have fundamentally altered the landscape of veterinary genomics. SMRT sequencing offers high accuracy suitable for reference-grade assemblies and variant detection, while nanopore sequencing provides portability, ultra-long reads, and real-time data access. The choice between platforms depends on the specific research question: for high-confidence small-variant detection, SMRT HiFi sequencing is often preferred; for rapid metagenomic surveys and structural variant discovery, nanopore sequencing offers distinct advantages; and both platforms support base modification analysis. Computational methods for basecalling, assembly, and polishing continue to mature, driven by advances in neural networks and graph-based algorithms. The integration of long-read sequencing into routine veterinary diagnostics and surveillance is increasing, particularly for emerging pathogens and antimicrobial resistance profiling.

References

[1] Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., ... & Turner, S. (2009). Real-time DNA sequencing from single polymerase molecules. Science, 323(5910), 133-138.

[2] Wenger, A. M., Peluso, P., Rowell, W. J., Chang, P. C., Hall, R. J., Concepcion, G. T., ... & Hunkapiller, M. W. (2019). Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology, 37(10), 1155-1162.

[3] Deamer, D., Akeson, M., & Branton, D. (2016). Three decades of nanopore sequencing. Nature Biotechnology, 34(5), 518-524.

[4] Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H., & Phillippy, A. M. (2017). Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 27(5), 722-736.

[5] Jain, M., Olsen, H. E., Paten, B., & Akeson, M. (2016). The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biology, 17(1), 239.

[6] van Dijk, E. L., Jaszczyszyn, Y., Naquin, D., & Thermes, C. (2018). The third revolution in sequencing technology. Trends in Genetics, 34(9), 666-681.

Disclaimer: This article is for educational and informational purposes only. It is not intended to substitute for professional veterinary advice, diagnosis, treatment, or regulatory guidance. Always consult a licensed veterinarian or qualified specialist regarding animal health, disease diagnosis, and therapeutic decisions.