Variant Calling Pipelines: GATK Best Practices, FreeBayes, and DeepVariant Comparison
Variant calling is a core computational step in genomic analysis. It transforms sequencing data, typically aligned reads stored in BAM (Binary Alignment Map) format, into a list of genetic differences relative to a reference genome, usually recorded in VCF (Variant Call Format) files [1]. The accuracy of this process directly affects downstream applications such as genome-wide association studies (GWAS), population genetics, and pathogen surveillance in veterinary medicine. This article compares three widely used variant calling methodologies: the Genome Analysis Toolkit (GATK) Best Practices pipeline, the Bayesian haplotype-based caller FreeBayes, and the deep learning-based DeepVariant. Understanding the algorithmic foundations and performance characteristics of each approach is essential for choosing an appropriate pipeline for specific research questions in veterinary diagnostics and computational biology.
Algorithmic Foundations of Variant Detection
GATK Best Practices
The GATK Best Practices pipeline, particularly for germline variant discovery, is built on a probabilistic framework that explicitly models base quality scores, read mapping quality, and allele frequency priors [2]. Its core engine for single nucleotide variant (SNV) and indel discovery is HaplotypeCaller. HaplotypeCaller first identifies active regions, that is, regions of the genome that show evidence of variation. Within each active region, it performs a local de novo assembly of the reads using a De Bruijn graph. Paths through this graph define candidate haplotypes, which are aligned back to the reference genome with the Smith-Waterman algorithm. Each read is then aligned to each candidate haplotype with a pair hidden Markov model (PairHMM) to obtain the likelihood of the read given that haplotype, and these read likelihoods are combined in a Bayesian model to compute genotype likelihoods and posterior probabilities.
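The local-assembly idea can be illustrated with a toy De Bruijn graph. The sketch below is a minimal illustration under stated assumptions (error-free toy reads, a tiny k, no graph pruning or error correction); it is not GATK's assembler, and the function names `build_de_bruijn` and `enumerate_haplotypes` are hypothetical. Nodes are (k-1)-mers, edges are k-mers, and each acyclic path from the first to the last reference (k-1)-mer spells one candidate haplotype.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges are k-mers weighted by count."""
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1  # edge from k-mer prefix to k-mer suffix
    return graph


def enumerate_haplotypes(graph, source, sink, max_paths=10):
    """Depth-first search for acyclic source-to-sink paths; each path spells one haplotype."""
    haplotypes = []

    def dfs(node, path):
        if len(haplotypes) >= max_paths:
            return
        if node == sink:
            # The first node contributes k-1 bases; every later node adds its last base.
            haplotypes.append(path[0] + "".join(n[-1] for n in path[1:]))
            return
        for nxt in graph.get(node, {}):
            if nxt not in path:  # skip cycles
                dfs(nxt, path + [nxt])

    dfs(source, [source])
    return haplotypes


reference = "GATTACAGC"
reads = ["ATTGCAG", "TTGCAGC", "GATTGCA"]  # toy reads carrying an A>G change
k = 4
graph = build_de_bruijn([reference] + reads, k)
print(enumerate_haplotypes(graph, reference[:k - 1], reference[-(k - 1):]))
# ['GATTACAGC', 'GATTGCAGC']
```

Threading the reference through the same graph guarantees that the reference haplotype is always among the candidates, so alternative paths can be compared against it in the likelihood step.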
The GATK workflow incorporates a critical step known as joint genotyping [2]. Each sample is first processed into a GVCF (Genomic VCF) file, which records genotype likelihoods at both variant and non-variant sites; the per-sample GVCFs are then consolidated (for example, with GenomicsDBImport or CombineGVCFs) and genotyped together as a cohort with GenotypeGVCFs. Joint analysis aggregates evidence across samples, which improves sensitivity for low-frequency variants and yields cohort-level allele frequencies. This is particularly valuable in veterinary population studies in which many animals are sequenced to identify breed-specific or disease-associated variants. After joint genotyping, the pipeline applies filtering, including Variant Quality Score Recalibration (VQSR). VQSR uses a training set of known true variants (e.g., from databases of known polymorphisms) to fit a Gaussian mixture model over annotation metrics such as QualByDepth, FisherStrand, and MappingQualityRankSumTest, and uses that model to distinguish true variants from sequencing artifacts [2].
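Why pooling samples helps can be seen by estimating a cohort allele frequency from per-sample genotype likelihoods. The sketch below uses expectation-maximization (EM) under Hardy-Weinberg priors; this is a textbook simplification chosen for illustration, not the allele frequency model that GenotypeGVCFs actually implements, and `estimate_alt_frequency` is a hypothetical name. Each tuple holds P(reads | genotype) for hom-ref, het, and hom-alt.

```python
def estimate_alt_frequency(genotype_likelihoods, iterations=50, start=0.5):
    """EM estimate of a cohort alt-allele frequency from diploid genotype likelihoods.

    genotype_likelihoods: one (P(reads|RR), P(reads|RA), P(reads|AA)) tuple per sample.
    Assumes Hardy-Weinberg genotype priors, a simplification chosen for this sketch.
    """
    f = start
    n = len(genotype_likelihoods)
    for _ in range(iterations):
        priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_alt = 0.0
        for lik in genotype_likelihoods:
            # E-step: unnormalized genotype posteriors for this sample under the current f
            joint = [p * l for p, l in zip(priors, lik)]
            total = sum(joint)
            expected_alt += (joint[1] + 2 * joint[2]) / total  # expected alt copies (0 to 2)
        # M-step: allele frequency = expected alt copies / total allele copies in the cohort
        f = expected_alt / (2 * n)
    return f


cohort = [
    (1e-6, 1.0, 1e-6),  # sample 1: strong heterozygous evidence
    (1.0, 1e-6, 1e-8),  # sample 2: strong hom-ref evidence
    (1.0, 1e-6, 1e-8),  # sample 3: strong hom-ref evidence
]
print(round(estimate_alt_frequency(cohort), 3))  # 0.167
```

Once a cohort frequency is available, it can serve as a prior when genotyping a sample whose own reads are ambiguous, which is the intuition behind the sensitivity gain from joint analysis.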
FreeBayes: A Bayesian Haplotype Caller
FreeBayes employs a fundamentally different approach to variant calling. It is a haplotype-based variant detector that models the probability of each possible genotype at every position in the genome [3, 1]. Unlike GATK, FreeBayes does not perform a local de novo assembly of the read data. Instead, it directly evaluates read alignment information. For each position, FreeBayes considers the set of reads that span that position and uses a Bayesian model to calculate the posterior probability of each potential genotype.
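The per-position Bayesian calculation can be sketched with a simple base-level error model. This is a generic illustration under stated assumptions (independent reads, errors spread evenly over the three other bases, uniform genotype priors by default); FreeBayes itself works with haplotype observations and derives its priors from a population-genetic model controlled by options such as `--theta`. The function name `genotype_posteriors` is hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_posteriors(observations, alleles=("A", "G"), ploidy=2, priors=None):
    """Posterior probability of each genotype at one site.

    observations: list of (base, phred_quality) from reads covering the site.
    priors: dict mapping genotype tuple -> prior probability (uniform if None).
    """
    genotypes = list(combinations_with_replacement(alleles, ploidy))
    if priors is None:
        priors = {g: 1 / len(genotypes) for g in genotypes}
    log_post = {}
    for g in genotypes:
        log_lik = 0.0
        for base, qual in observations:
            err = 10 ** (-qual / 10)  # Phred quality -> error probability
            # Each chromosome copy is equally likely to be sampled by a read.
            p = sum((1 - err) if base == a else err / 3 for a in g) / ploidy
            log_lik += math.log(p)
        log_post[g] = log_lik + math.log(priors[g])
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / total for g, v in log_post.items()}


pileup = [("A", 30)] * 6 + [("G", 30)] * 5
post = genotype_posteriors(pileup)
best = max(post, key=post.get)
print(best)  # ('A', 'G')
```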
A key feature of FreeBayes is its handling of complex variation, particularly multi-allelic sites and indels [3]. The algorithm uses a "haplotype window" approach in which it examines reads that span multiple nearby variant positions. This allows FreeBayes to phase nearby variants and to detect compound mutations that simpler per-position models might miss. Its sensitivity can be tuned by adjusting the prior probability of observing a variant, and it can be run in a mode that ignores quality scores (for example, base qualities) to detect variants in low-complexity regions where mapping is difficult [4]. FreeBayes's sensitivity also depends on the upstream read aligner: in one study it was more sensitive on some datasets when paired with Bowtie2 than with BWA-MEM [4]. This sensitivity may come at the cost of reduced specificity for certain variant types; in a comparison of somatic mutation callers, FreeBayes showed lower specificity than Mutect2 and Strelka2 [1].
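The core of the haplotype-window idea is that a read spanning several nearby candidate positions reports which alleles occur together on the same molecule. The sketch below counts such multi-position haplotypes; it assumes ungapped alignments and 0-based reference coordinates, and `haplotype_counts` is a hypothetical name. In FreeBayes the window size is governed by options such as `--haplotype-length`.

```python
from collections import Counter


def haplotype_counts(reads, positions):
    """Count multi-position haplotypes observed in reads that span every position.

    reads: list of (start, sequence) pairs; start is a 0-based reference coordinate
    and the alignment is assumed to have no indels.
    positions: reference coordinates of nearby candidate variants.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if all(start <= p < end for p in positions):
            counts["".join(seq[p - start] for p in positions)] += 1
    return counts


reads = [
    (8, "TTATTTGT"),   # A at 10, G at 14
    (9, "TATTTGTT"),   # A at 10, G at 14
    (8, "TTCTTTTT"),   # C at 10, T at 14
    (11, "TTTTTTTT"),  # starts after position 10, so it cannot phase the pair
]
print(haplotype_counts(reads, [10, 14]))  # Counter({'AG': 2, 'CT': 1})
```

Here the two candidate alleles always co-occur ("AG" versus "CT"), which is the evidence a haplotype-aware caller uses to phase them or to report them as a single compound event.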
DeepVariant: A Deep Learning Approach
DeepVariant represents a paradigm shift from statistical models to deep convolutional neural networks (CNNs). Instead of explicitly modeling sequencing error rates or allele frequencies, DeepVariant recasts variant calling as an image classification task. For every candidate position, it constructs a multi-channel "pileup image" from the aligned reads. The reference sequence and the reads occupy the rows of the image, and separate channels encode the read bases, base quality scores, mapping quality, strand, and whether each base supports the candidate allele or differs from the reference.
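The pileup-image representation can be mimicked with a small nested-list tensor. This is a toy sketch only: the channel set is reduced to five, the numeric base encoding in `BASE_VALUE` and the quality caps are hypothetical choices, alignments are assumed ungapped, and the reference row fills only the base channel. DeepVariant's real images differ in size, channel definitions, and encoding values.

```python
BASE_VALUE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}  # hypothetical encoding; 0.0 = no read here
N_CHANNELS = 5  # base, base quality, mapping quality, strand, differs-from-reference


def pileup_image(ref, reads):
    """Toy pileup tensor indexed as image[row][column][channel]; row 0 is the reference.

    reads: (start, sequence, base_qualities, mapping_quality, is_reverse) tuples,
    with ungapped alignments and start given as a 0-based offset into ref.
    """
    width = len(ref)
    # Reference row: only the base channel is filled in this toy version.
    image = [[[BASE_VALUE[b]] + [0.0] * (N_CHANNELS - 1) for b in ref]]
    for start, seq, quals, mapq, is_reverse in reads:
        row = [[0.0] * N_CHANNELS for _ in range(width)]
        for offset, (base, q) in enumerate(zip(seq, quals)):
            col = start + offset
            if 0 <= col < width:
                row[col] = [
                    BASE_VALUE[base],
                    min(q, 40) / 40,                   # base quality, capped at 40
                    min(mapq, 60) / 60,                # mapping quality, capped at 60
                    1.0 if is_reverse else 0.0,        # strand
                    1.0 if base != ref[col] else 0.0,  # mismatch against the reference
                ]
        image.append(row)
    return image


ref = "ACGTA"
reads = [
    (0, "ACGTA", [30] * 5, 60, False),
    (1, "CGAA", [20, 20, 35, 20], 50, True),  # carries a T>A mismatch at column 3
]
image = pileup_image(ref, reads)
print(len(image), len(image[0]), len(image[0][0]))  # 3 5 5
print(image[2][3])  # [0.25, 0.875, 0.8333333333333334, 1.0, 1.0]
```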
These pileup images are fed into a CNN, a modified Inception-v3 architecture, trained on large sets of human genomes with known truth variants. The network learns to recognize patterns that correspond to true variants (e.g., a clear cluster of reads supporting an alternate allele) and patterns that correspond to sequencing artifacts (e.g., systematic errors at specific sequence motifs or strand-biased signals). For each candidate site, DeepVariant outputs three probabilities: homozygous reference, heterozygous, and homozygous alternate. This approach implicitly captures complex interactions between read features that are difficult to model explicitly with traditional Bayesian methods. DeepVariant is highly accurate but computationally demanding: it can run on CPUs, but GPU acceleration substantially shortens inference time. Like any caller, it also depends heavily on the quality of the upstream basecalling and alignment steps.
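The three output probabilities are typically turned into a genotype call plus a Phred-scaled genotype quality (GQ). The sketch below assumes the network emits three raw scores (logits) that a softmax converts to probabilities, and computes GQ as -10·log10(1 - P(called genotype)); DeepVariant's exact post-processing may differ in detail, and `call_genotype` is a hypothetical name.

```python
import math

GENOTYPES = ("hom_ref", "het", "hom_alt")


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def call_genotype(logits):
    """Turn three network outputs into a genotype call and a Phred-scaled quality."""
    probs = softmax(logits)
    best = max(range(3), key=lambda i: probs[i])
    p_error = max(1.0 - probs[best], 1e-10)  # floor avoids log10(0) for near-certain calls
    gq = -10 * math.log10(p_error)
    return GENOTYPES[best], probs, gq


genotype, probs, gq = call_genotype([0.1, 4.0, 0.5])
print(genotype, round(gq, 1))  # het 13.2
```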
Comparative Performance and Trade-offs
The selection of a variant calling pipeline depends on the specific requirements of the study, including the type of variants of interest, the sequencing depth, the ploidy of the organism, and available computational resources.
Sensitivity and Specificity
Comparative studies have revealed distinct performance profiles for these pipelines. For SNV detection, both GATK HaplotypeCaller and DeepVariant generally achieve high sensitivity and specificity, and DeepVariant is often reported to outperform GATK in overall accuracy because it handles difficult genomic regions (e.g., homopolymer runs and repetitive elements) more robustly. FreeBayes is highly sensitive but can show reduced specificity, particularly for indels, because it may produce more false-positive calls in error-prone regions [1].
A study of SARS-CoV-2 sequencing data found that the performance of FreeBayes varied significantly with read depth and read length [4]. Accuracy increased with the number of reads, whereas longer reads produced larger differences in accuracy and sensitivity across aligner and caller combinations [4]. For somatic variant detection in human whole-exome sequencing data, FreeBayes showed the highest sensitivity for true positives, but at the cost of a significantly higher false-positive rate than Mutect2 (part of GATK) and Strelka2 [1]. This finding is directly relevant to veterinary oncology, where minimizing false positives is critical to avoid erroneous clinical decisions.
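Sensitivity and specificity figures like these come from comparing a call set against a truth set. The sketch below matches variants exactly on (chrom, pos, ref, alt) keys, which is an assumption: dedicated benchmarking tools such as hap.py or RTG vcfeval also reconcile different representations of the same haplotype. Because true negatives are ill-defined across a genome, precision is usually reported in place of specificity. The function name `benchmark` and the example variants are hypothetical.

```python
def benchmark(calls, truth):
    """Compare variant calls with a truth set; variants keyed by (chrom, pos, ref, alt)."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    recall = tp / (tp + fn) if tp + fn else 0.0      # sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "recall": recall, "precision": precision, "F1": f1}


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 250, "T", "C")}
result = benchmark(calls, truth)
print(result["TP"], result["FP"], result["FN"])  # 2 1 1
```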
Indel Detection
Accurate detection of insertions and deletions remains a significant challenge in variant calling. GATK's local assembly approach is effective for identifying indels but can be computationally expensive. FreeBayes has been evaluated specifically for complex indel detection using a simulation-based framework [3]. DeepVariant's CNN model, when trained on data containing a diverse set of indels, can learn the complex read patterns associated with these events. However, all three methods struggle with large structural variants, which often require specialized tools (see Deep Learning for Annotating Structural Variants in Viral Genomes).
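Part of the difficulty with indels is representational: inside a repeat, the same event can be written at several positions, so callers and truth sets must agree on a canonical form. The sketch below implements the standard trim-and-left-align normalization, similar in spirit to what `bcftools norm` does when given a reference FASTA. It assumes a 0-based `pos` into an in-memory reference string and does not handle variants at the very start of a sequence; `left_align` is a hypothetical name.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim one variant; pos is a 0-based offset into ref_seq."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref = ref_seq[pos] + ref
            alt = ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases while keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting one A from the homopolymer in GCAAAAT, first written at the right end of the run:
print(left_align("GCAAAAT", 4, "AA", "A"))  # (1, 'CA', 'C')
```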
Computational Requirements
The computational cost of each pipeline is a critical practical consideration. GATK's Best Practices pipeline, especially when performing joint genotyping across many samples, involves a multi-step workflow with substantial RAM and disk I/O requirements [2]. DeepVariant is the most computationally intensive of the three; it can run on CPUs, but a capable GPU is typically needed to complete inference in a reasonable timeframe. FreeBayes is the least resource-intensive, making it well suited for rapid analysis or for resource-constrained settings.
Performance Summary Table
| Feature | GATK HaplotypeCaller | FreeBayes | DeepVariant |
|---|---|---|---|
| Core Method | De Bruijn assembly + Bayesian model | Bayesian haplotype window | Convolutional Neural Network (Inception-v3) |
| Variant Types | SNVs, small indels | SNVs, indels, complex multi-allelic sites | SNVs, small indels |
| Computational Load | High (RAM intensive) | Low to Moderate | Very High (GPU recommended) |
| Sensitivity (SNVs) | High | Very High | Very High |
| Specificity (SNVs) | High | Moderate to High | Very High |
| Indel Sensitivity | Moderate to High | High | High |
| Phasing | Local phasing (haplotypes) | Local phasing (haplotype window) | None (per-site) |
Algorithmic Decision Workflow
The following Mermaid flowchart illustrates a decision process for selecting an appropriate variant calling pipeline in a veterinary genomics context.
```mermaid
graph TD
    A["Raw Sequencing Reads (FASTQ)"] --> B[Read Alignment to Reference Genome]
    B --> C{Align to Metagenomic<br>Reference?}
    C -->|Yes| D[Use Viral/Bacterial<br>Specific Workflows]
    C -->|No| E["Standard Alignment<br>(BWA-MEM, Bowtie2)"]
    E --> F{Study Type?}
    F -->|"Germline Variant<br>Discovery"| G{Single Sample?}
    G -->|Yes| H[GATK HaplotypeCaller<br>or DeepVariant]
    G -->|No| I[GATK Joint Genotyping<br>Pipeline]
    F -->|"Somatic Variant<br>(e.g., Cancer)"| J{High Accuracy Needed?}
    J -->|Yes| K[GATK Mutect2 or Strelka2]
    J -->|No| L["FreeBayes<br>(High Sensitivity)"]
    F -->|"Rapid Screening /<br>Resource Limited"| M[FreeBayes]
    H --> N[VCF File]
    I --> N
    K --> N
    L --> N
    M --> N
    N --> O[Variant Filtering and<br>Quality Control]
    O --> P["Downstream Analysis<br>(GWAS, Phylogenetics)"]
```
Recommendations for Veterinary Applications
Purity and Ploidy Considerations
Many veterinary applications involve non-model organisms or mixed samples. For diploid organisms (e.g., dogs, cats, horses), both GATK and DeepVariant perform well with standard ploidy settings. For haploid genomes (e.g., viral, bacterial pathogens), it is critical to set the ploidy parameter correctly. Default settings are typically designed for diploid human genomes; using these for haploid data can produce false heterozygous calls. FreeBayes offers flexible ploidy specification, making it suitable for polyploid plant or fungal genomes that are sometimes encountered in veterinary environmental samples.
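The reason ploidy matters is that it fixes which genotypes the model can even consider. The sketch below enumerates unordered genotypes for a given allele set and ploidy (there are C(n + p - 1, p) of them for n alleles and ploidy p): a haploid model has no heterozygous states, whereas a diploid model applied to haploid data can assign reads from a mixed or erroneous pileup to a heterozygous genotype. The function names are hypothetical; in the tools themselves, ploidy is set with options such as GATK HaplotypeCaller's `--sample-ploidy` and FreeBayes's `--ploidy` (`-p`).

```python
from itertools import combinations_with_replacement
from math import comb


def genotypes(alleles, ploidy):
    """All unordered genotypes for the given alleles and ploidy, written like VCF GT strings."""
    return ["/".join(g) for g in combinations_with_replacement(alleles, ploidy)]


def is_heterozygous(genotype):
    return len(set(genotype.split("/"))) > 1


print(genotypes(["A", "G"], 1))  # ['A', 'G']
print(genotypes(["A", "G"], 2))  # ['A/A', 'A/G', 'G/G']
print(len(genotypes(["A", "G"], 4)), comb(2 + 4 - 1, 4))  # 5 5
```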
Signal-to-Noise Ratio
In clinical veterinary diagnostics, the signal-to-noise ratio of sequencing data can be low because of poor sample quality, low viral load, or host contamination. DeepVariant's training on large, diverse datasets may help it generalize to such noisy data, but its models were trained primarily on human genomes, so its calls still require careful filtering. GATK's VQSR step relies on a set of known variants and may therefore be less applicable to non-model organisms for which such databases are unavailable. In these cases, hard filtering based on quality scores and read depth is often used instead.
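Hard filtering applies fixed thresholds to per-site annotations. The sketch below uses the generic starting thresholds that GATK documents for SNPs (QUAL < 30, QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0) plus a hypothetical depth cutoff. These annotations are GATK-specific, so FreeBayes or DeepVariant output would need different INFO keys; the thresholds are starting points to tune per dataset, and the function names are hypothetical.

```python
# GATK's generic starting thresholds for SNPs (a site fails when the test returns True).
SNP_INFO_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
MIN_QUAL = 30.0
MIN_DEPTH = 10  # hypothetical cutoff; set it from the study's expected coverage


def parse_info(info):
    """Parse a VCF INFO string into a dict; flag entries get the value True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def hard_filter(vcf_line):
    """Return 'PASS' or a ';'-joined list of failed filters for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    failed = []
    if cols[5] != "." and float(cols[5]) < MIN_QUAL:
        failed.append("LowQual")
    for key, fails in SNP_INFO_FILTERS.items():
        if key in info and info[key] is not True and fails(float(info[key])):
            failed.append(key)
    if "DP" in info and info["DP"] is not True and int(info["DP"]) < MIN_DEPTH:
        failed.append("LowDP")
    return ";".join(failed) if failed else "PASS"


good = "chr1\t100\t.\tA\tG\t512.3\t.\tDP=25;QD=20.5;FS=1.2;MQ=60.0;SOR=0.7"
bad = "chr1\t200\t.\tC\tT\t45.0\t.\tDP=6;QD=1.5;FS=75.3;MQ=59.0;SOR=0.9"
print(hard_filter(good))  # PASS
print(hard_filter(bad))   # QD;FS;LowDP
```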
Workflow Integration and Data Formats
All three pipelines take as input a BAM file that has been sorted, indexed, and processed with duplicate marking [2]. Their output is standard VCF, whose anatomy is described in detail in the article Variant Call Format (VCF): Anatomy of Genomic Variant Representations and BCFtools Processing. Post-calling processing typically involves hard filtering of low-quality calls, normalization of variant representation (e.g., splitting multi-allelic sites and left-aligning indels with bcftools norm), and calculation of quality control metrics. The tools are commonly run inside Docker or Singularity containers to ensure reproducibility, a topic covered in Docker and Containerization in Reproducible Research.
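Splitting a multi-allelic record into biallelic records can be sketched on a sites-only VCF line. This is a minimal illustration, not a replacement for `bcftools norm -m`: it keeps only the first eight VCF columns, re-indexes only the INFO keys passed in `per_alt_keys` (assumed to hold one value per ALT allele, as AC and AF normally do), and leaves allele trimming and left-alignment to a separate step. The function name is hypothetical.

```python
def split_multiallelic(vcf_line, per_alt_keys=("AC", "AF")):
    """Split one sites-only VCF data line with several ALT alleles into biallelic lines."""
    cols = vcf_line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    if len(alts) == 1:
        return ["\t".join(cols[:8])]
    info_items = [] if cols[7] == "." else cols[7].split(";")
    records = []
    for i, alt in enumerate(alts):
        new_info = []
        for item in info_items:
            key, sep, value = item.partition("=")
            if sep and key in per_alt_keys:
                value = value.split(",")[i]  # keep only this allele's value
            new_info.append(key + sep + value)
        fields = cols[:8]
        fields[4] = alt
        fields[7] = ";".join(new_info) if new_info else "."
        records.append("\t".join(fields))
    return records


line = "chr1\t100\t.\tA\tG,T\t60\tPASS\tDP=30;AC=1,2;AF=0.25,0.5"
for record in split_multiallelic(line):
    print(record)
```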
Conclusion
The choice between GATK, FreeBayes, and DeepVariant is not a matter of one tool being universally superior. GATK's Best Practices pipeline offers a mature, well-supported workflow with robust joint genotyping capabilities. FreeBayes provides a sensitive, computationally efficient alternative that is particularly well suited to complex variation and non-standard ploidy. DeepVariant often delivers the highest accuracy for SNVs and small indels by leveraging deep learning, but at a substantial computational cost. For veterinary genomics, the optimal pipeline is dictated by the specific research question, sample type, and available computational infrastructure. A thorough understanding of the algorithmic strengths and limitations of each tool is essential for generating accurate and reproducible variant calls in animal health and disease research.
References
[1] López-Cade I, Gómez-Sanz A, Sanvicente A et al. Comparative Evaluation of Mutect2, Strelka2, and FreeBayes for Somatic SNV Detection in Synthetic and Clinical Whole-Exome Sequencing Data. Biomolecules. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41301450/
[2] Brouard JS, Bissonnette N. Variant Calling from RNA-seq Data Using the GATK Joint Genotyping Workflow. Methods in Molecular Biology. 2022. URL: https://www.semanticscholar.org/paper/e875d7fd4fb3787643e8d4bb39901323ccd571ee
[3] Loh YHE, Lieber MR, Hsieh CL et al. Complex Indel Detection: A Simulation-Based Framework and Parsing with FreeBayes. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42339259/
[4] Krishna A, Choi JS. Investigating Sensitivity, Specificity and Accuracy of Variant Calling Pipelines for Analyzing SARS-CoV-2 Data. bioRxiv. 2024. URL: https://www.semanticscholar.org/paper/25c71aef8b25552e864b1c1bdb74a43602fad198