Section: Foundations & History

Variant Calling in Whole Exome Sequencing (WES)

Technological Advances and Methodologies in Whole Exome Sequencing

Whole Exome Sequencing (WES) is a central technology in genomics because it targets the coding regions of the genome, where many disease-relevant variants are found. This section reviews the technological advances and methodologies that have shaped WES, covering its biological basis, methodological innovations, and applications.

Technological Innovations in Whole Exome Sequencing

The advent of Next-Generation Sequencing (NGS) technologies has revolutionized the landscape of genomic research, making WES a feasible and cost-effective approach for large-scale genomic studies. The exponential reduction in sequencing costs has shifted the focus from data generation to data analysis and interpretation, as highlighted in the discussions at the Lake Louise Mutation Detection Meeting [1]. This paradigm shift underscores the importance of optimizing workflows and developing robust methodologies for variant calling and interpretation.

WES targets the exome, which comprises approximately 1-2% of the human genome but is estimated to harbor around 85% of known disease-related variants. High-throughput, high-accuracy sequencing platforms make it possible to capture these coding regions precisely. Short-read NGS platforms enable efficient capture and sequencing of exonic regions, although accurately mapping reads remains challenging, particularly in regions with high GC content or repetitive sequences.

Methodological Developments in Variant Calling

Variant calling in WES involves identifying genetic variants from sequencing data, a process that is fraught with challenges due to the complexity of the human genome. The methodologies employed in variant calling have evolved significantly, incorporating sophisticated algorithms and statistical models to enhance accuracy and reduce false-positive rates.

One of the key advancements in this domain is the use of variant annotations and external controls to improve the statistical power of downstream rare variant association analysis [2]. These approaches leverage prior knowledge of variant pathogenicity and population-specific allele frequencies to prioritize called variants. Additionally, the incorporation of machine learning techniques has further enhanced the ability to distinguish true variants from sequencing artifacts, thereby improving the reliability of WES data.

Biological Mechanisms and Contextual Applications

The biological rationale for WES is that exons are the protein-coding portions of genes, so variants within them are more likely to alter protein function. This focus on the exome is particularly relevant for identifying causal variants in rare genetic diseases and understanding the genetic basis of complex traits. The large effect sizes associated with rare variants, as noted in biobank-scale studies, highlight the potential of WES to uncover novel genetic associations that may be missed by genome-wide association studies (GWAS) focusing on common variants [2].

In clinical settings, WES has been instrumental in diagnosing monogenic disorders and guiding personalized medicine approaches. The diagnostic utility of WES is exemplified in its application to conditions such as autosomal-dominant polycystic kidney disease (ADPKD), where genomic tests provide critical diagnostic and prognostic information. Despite these advances, challenges persist in achieving high sensitivity and specificity, particularly in genetically heterogeneous populations and diseases with atypical phenotypes.

Challenges and Future Directions

While WES has made significant strides in genomics, several challenges remain that necessitate further methodological advancements. One such challenge is the accurate interpretation of variants of uncertain significance (VUS), which require comprehensive databases and functional studies to elucidate their pathogenicity. The integration of proteomics, as advocated in recent literature, could provide additional layers of information to enhance the interpretation of WES data [3].

Another critical challenge is the management and analysis of the vast amounts of data generated by WES. The need for standardized workflows and unified taxonomies for genomic tests is paramount to facilitate data sharing and collaborative research efforts, as emphasized in discussions on the clinical translation of NGS technologies [1]. Furthermore, addressing statistical challenges in sequence-based association studies, particularly in diverse population and family-based designs, is crucial for maximizing the utility of WES in genetic research.

In conclusion, the technological advances and methodologies in WES have significantly expanded our understanding of the genetic basis of diseases, offering unprecedented opportunities for research and clinical applications. As we continue to refine these technologies and address existing challenges, WES is poised to play an increasingly central role in the era of precision medicine, guided by the collaborative efforts of researchers, clinicians, and organizations such as the WHO and NCBI. The future of WES lies in its ability to integrate multi-omics data, enhance variant interpretation, and ultimately translate genomic insights into tangible health outcomes.

Data Processing and Quality Control in Variant Calling

Whole Exome Sequencing (WES) has revolutionized the field of genomics by enabling the identification of genetic variants that contribute to various diseases, including complex disorders like Tetralogy of Fallot, schizophrenia, and endometriosis [4, 5]. The process of variant calling in WES involves several critical steps, each requiring meticulous attention to data processing and quality control to ensure the accuracy and reliability of the results. This section delves into the methodologies, biological mechanisms, and contextual considerations involved in data processing and quality control in variant calling, drawing insights from recent studies and authoritative guidelines.

Methodologies in Data Processing

The initial step in variant calling involves the processing of raw sequencing data, which includes quality assessment and filtering of reads. This is crucial to remove low-quality data that could lead to erroneous variant calls. The quality of sequencing data is typically assessed using metrics such as base quality score, read length, and sequencing depth. Tools like FastQC are commonly employed to perform this initial quality check.
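The read-level quality check described above can be sketched in a few lines. This is a minimal illustration, not what FastQC does internally: it assumes Phred+33 encoded quality strings, and the function names and both thresholds (`min_mean_q`, `min_length`) are hypothetical choices made for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_error_probability(quality_string, offset=33):
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    scores = phred_scores(quality_string, offset)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


def passes_read_qc(quality_string, min_mean_q=20, min_length=50):
    """Keep a read only if it is long enough and its mean Phred score is high enough.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    scores = phred_scores(quality_string)
    if len(scores) < min_length:
        return False
    return sum(scores) / len(scores) >= min_mean_q
```

A Phred score of 20 corresponds to an error probability of 1 in 100, and a score of 30 to 1 in 1000, which is why mean-quality cutoffs are usually set somewhere in that range.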

Following quality assessment, the next step is read alignment, where sequencing reads are mapped to a reference genome. This step is pivotal as accurate alignment ensures that subsequent variant calling is based on the correct genomic context. The Burrows-Wheeler Aligner (BWA) is widely used for read alignment, and the Genome Analysis Toolkit (GATK) for post-alignment processing. GATK provides a robust framework for duplicate marking, base quality score recalibration, and, in earlier versions of its workflows, realignment around indels, all of which aim to reduce false-positive variant calls [5].

Quality Control in Variant Calling

Quality control (QC) is a continuous process that spans the entire workflow of variant calling. It begins with the assessment of raw data and continues through alignment, variant calling, and annotation. The importance of QC is underscored by the need to ensure that the variants identified are not artifacts of sequencing or analysis errors [6, 7].

One of the critical QC steps in variant calling is the evaluation of coverage metrics. Coverage (depth) refers to the number of sequencing reads that overlap a particular position or region of the genome. Adequate coverage is essential for reliable variant detection, particularly in regions with high GC content or repetitive sequences, which are prone to sequencing biases. In addition, tools like CNVkit and BBSplit are used to assess copy number variation and sample purity, helping to ensure that the data reflect true biological signals rather than technical noise [6].
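As a rough illustration of coverage metrics, the sketch below summarizes a list of per-base depths over the target regions. It assumes the depths have already been extracted by an upstream tool; the function name and the 20x default threshold are hypothetical and chosen only for the example.

```python
def coverage_summary(depths, min_depth=20):
    """Summarize per-base depths over the capture targets.

    Returns the mean depth and the fraction of target bases covered
    at or above min_depth (an illustrative threshold).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = sum(1 for d in depths if d >= min_depth) / n
    return {"mean_depth": mean_depth, "breadth_at_min_depth": breadth}
```

Reporting breadth alongside mean depth matters because a high mean can hide poorly covered exons, which is where false negatives tend to arise.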

Another crucial aspect of QC is the filtering of variants based on quality scores. Variants are typically filtered using thresholds for read depth, genotype quality, and mapping quality. For instance, a study on endometriosis employed stringent criteria, retaining only variants with a read depth greater than 10, genotype quality of at least 30, and mapping quality of 40 or higher. Such rigorous filtering is vital for minimizing false positives and ensuring that the variants called are biologically relevant.
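The thresholds quoted above translate directly into a hard filter. The sketch below assumes each variant has already been parsed into a dictionary with `DP` (read depth), `GQ` (genotype quality), and `MQ` (mapping quality) keys; that record layout and the function names are hypothetical, while the cutoffs are the ones stated in the text.

```python
def passes_filters(variant, min_dp=10, min_gq=30, min_mq=40):
    """Apply the hard filters from the text: DP > 10, GQ >= 30, MQ >= 40."""
    return (
        variant["DP"] > min_dp
        and variant["GQ"] >= min_gq
        and variant["MQ"] >= min_mq
    )


def filter_variants(variants, **thresholds):
    """Return only the variants that pass every threshold."""
    return [v for v in variants if passes_filters(v, **thresholds)]
```

Note that the depth cutoff is strict (greater than 10) while the two quality cutoffs are inclusive, matching the wording of the criteria.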

Biological Mechanisms and Context

The biological context of variant calling in WES is integral to understanding the implications of identified variants. Variants can be classified into different types, including single nucleotide polymorphisms (SNPs), insertions, deletions, and copy number variations (CNVs). Each type of variant can have different biological consequences, ranging from benign polymorphisms to pathogenic mutations that disrupt gene function [8].

In the context of diseases like Tetralogy of Fallot and schizophrenia, variant calling has revealed insights into the genetic architecture of these conditions. For instance, in Tetralogy of Fallot, WES has identified novel polymorphisms and variants in genes such as MUTYH, RARB, and GFM1, which are implicated in the pathogenesis of the disease [4]. Similarly, in schizophrenia, meta-analyses of exome data have highlighted the enrichment of ultra-rare damaging variants in genes associated with neurodevelopmental disorders, underscoring the complex genetic underpinnings of the disease.

The integration of bioinformatics tools for variant annotation further enhances the biological interpretation of variants. Tools like SIFT, PolyPhen2, and ClinPred are employed to predict the functional impact of nonsynonymous mutations, providing insights into how these variants may affect protein function and contribute to disease [4]. Additionally, databases such as gnomAD and OncoKB are invaluable resources for annotating variants with allele frequencies, pathogenicity scores, and clinical significance, facilitating the identification of variants with potential clinical relevance [6, 9].

Challenges and Future Directions

Despite the advancements in data processing and QC in variant calling, several challenges remain. One of the primary challenges is the accurate detection of variants in regions with low complexity or high homology, which can lead to misalignments and false variant calls. Moreover, the interpretation of variants of uncertain significance (VUS) poses a significant challenge, as these variants require further functional validation to ascertain their clinical relevance.

To address these challenges, ongoing efforts are focused on developing more sophisticated algorithms and pipelines that incorporate machine learning techniques for variant filtering and prioritization. For example, the GotCloud pipeline employs machine learning to filter likely artifacts and refine genotype calls, enhancing the accuracy of variant detection [7].

Furthermore, the integration of multi-omics data, such as transcriptomics and proteomics, with WES data holds promise for providing a more comprehensive understanding of the functional consequences of genetic variants. Such integrative approaches could pave the way for personalized medicine, where genetic information is used to tailor treatment strategies to individual patients [10].

In conclusion, data processing and quality control are foundational to the success of variant calling in WES. Through meticulous QC and the application of advanced bioinformatics tools, researchers can uncover the genetic variants that drive disease, providing insights into pathogenesis and informing clinical decision-making. As sequencing technologies continue to evolve, the integration of robust QC measures and innovative analytical approaches will be crucial for harnessing the full potential of WES in genomics research and clinical practice.

Algorithms and Tools for Variant Detection in Exome Data

Whole Exome Sequencing (WES) is widely used to uncover genetic variants located within the coding regions of the human genome. It is especially significant in cancer genomics and rare genetic disorders, where identifying variants can support targeted therapies and improve diagnostic accuracy. Variant detection in WES data involves several computational steps, each crucial for accurately identifying single nucleotide variants (SNVs), insertions and deletions (INDELs), and copy number variants (CNVs). This section reviews the algorithms and tools used for variant detection in exome data, along with their methodologies, biological implications, and applications.

Single Nucleotide Variant (SNV) Detection

SNVs are the most common type of genetic variation and are critical in understanding the genetic basis of diseases. The detection of SNVs from WES data involves aligning sequencing reads to a reference genome and identifying nucleotide differences. Tools such as Mutect2, Strelka2, and FreeBayes are widely used for SNV detection, each employing distinct algorithms and filtering strategies. Mutect2, for instance, utilizes a probabilistic model to differentiate between somatic and germline variants, achieving high recall rates due to its sensitivity to low allele frequency variants. In contrast, Strelka2 employs a Bayesian inference method, which is particularly effective in detecting SNVs in tumor-normal paired samples by modeling the joint distribution of allelic counts. FreeBayes, on the other hand, uses a haplotype-based approach, allowing it to consider the context of nearby variants, which can be advantageous in regions with complex variant patterns.

The choice of SNV detection tool can significantly impact the results, as evidenced by the variability in SNV calls across different tools. In a comparative study, FreeBayes detected the most variants in real tumor samples, yet only a small fraction of SNVs were consistently identified across all tools. This underscores the importance of selecting tools based on the specific requirements of the study, such as the need for high sensitivity or specificity. Moreover, ensemble approaches, which combine calls from multiple tools, have been shown to enhance the robustness of SNV detection by leveraging the strengths of different algorithms.
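One simple form of ensemble calling is majority voting across callers. The sketch below is an assumption-laden illustration rather than any published ensemble method: each call set is taken to be a collection of `(chrom, pos, ref, alt)` tuples, and the function name and `min_callers` parameter are hypothetical.

```python
from collections import Counter


def consensus_calls(callsets, min_callers=2):
    """Keep variants reported by at least min_callers of the supplied callers.

    callsets maps a caller name to an iterable of (chrom, pos, ref, alt) tuples.
    """
    counts = Counter()
    for calls in callsets.values():
        # Use a set so that a caller reporting a variant twice is counted once.
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_callers}
```

Raising `min_callers` trades sensitivity for specificity: requiring agreement from every caller yields the small, high-confidence intersection, while a threshold of one yields the union.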

Insertion and Deletion (INDEL) Detection

INDELs, which include insertions and deletions of bases in the genome, are the second most common type of genetic variant and pose unique challenges in detection due to their structural complexity. Tools such as GATK, SAMtools, Dindel, and FreeBayes have been developed to address these challenges, each with varying degrees of sensitivity and positive predictive value (PPV) [11]. GATK's HaplotypeCaller, for instance, is known for its high sensitivity in detecting short INDELs, making it suitable for datasets with high read depth [12]. SAMtools, while less sensitive, offers high PPV, indicating its reliability in confirming true INDELs [11]. Dindel, which focuses on realignment around potential INDEL sites, provides a balance between sensitivity and specificity, making it a versatile tool for various datasets [11].
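Sensitivity and PPV, the two metrics used above to compare callers, can be computed by comparing a call set against a truth set. The sketch below assumes variants are represented as hashable keys that match exactly between the two sets; the function name is hypothetical, and real benchmarks typically need variant normalization before comparison, which is omitted here.

```python
def sensitivity_and_ppv(called, truth):
    """Compare a call set against a truth set.

    sensitivity = TP / (TP + FN): fraction of true variants that were called.
    PPV         = TP / (TP + FP): fraction of calls that are true variants.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_neg = len(truth - called)
    false_pos = len(called - truth)
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    ppv = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, ppv
```

A caller with high sensitivity but modest PPV finds most true INDELs at the cost of more false positives, whereas the reverse profile misses more variants but produces calls that are more likely to be real.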

Repetitive sequences and heterozygous INDELs are significant sources of error in INDEL detection, necessitating the use of algorithms that can accurately distinguish true variants from sequencing errors [11]. Studies have shown that combining results from multiple algorithms can improve the accuracy of INDEL detection, as each tool may capture different aspects of the variant landscape [12]. For instance, Pindel is particularly adept at identifying large deletions, which are often missed by other tools, highlighting the need for a tailored approach based on the specific characteristics of the dataset [12].

Copy Number Variant (CNV) Detection

CNVs represent a substantial portion of genomic variation and are implicated in numerous genetic disorders. Detecting CNVs from WES data is inherently challenging due to the short read lengths and the complexity of the genomic regions involved. Tools such as ExomeDepth, EXCAVATOR, and ClinCNV have been developed to address these challenges by leveraging read depth information and sophisticated statistical models [13, 14]. ExomeDepth, for example, uses a Bayesian framework to model read depth variations and identify CNVs with high sensitivity, significantly enhancing the diagnostic yield of WES in rare genetic diseases. EXCAVATOR employs a heterogeneous hidden Markov model to classify genomic regions into distinct copy number states, offering a robust solution for large-scale CNV detection projects [15].

The accuracy of CNV detection is heavily influenced by the normalization and segmentation steps, which aim to mitigate biases introduced during library preparation and sequencing. Tools like ClinCNV implement multi-sample normalization techniques and novel algorithms that combine circular binary segmentation with hidden Markov models, providing a comprehensive approach to CNV calling in clinical and research settings. The choice of reference sample set is also critical, as it directly impacts the signal-to-noise ratio and the subsequent detection accuracy. Methods such as k-means clustering for reference sample selection have been shown to enhance the precision of CNV detection without compromising sensitivity.
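The core read-depth signal behind these tools can be illustrated with a per-target log2 ratio between a test sample and a reference. This sketch is not the algorithm used by ExomeDepth, EXCAVATOR, or ClinCNV: it assumes per-target read counts are already available, uses simple median normalization in place of the multi-step normalization those tools apply, and omits segmentation entirely. The function name is hypothetical.

```python
import math
from statistics import median


def log2_depth_ratios(test_counts, reference_counts):
    """Per-target log2 ratio of median-normalized read counts (test vs reference).

    Targets with zero reference counts are uninformative and yield None;
    targets with zero test counts yield negative infinity.
    """
    if len(test_counts) != len(reference_counts):
        raise ValueError("count lists must have the same length")
    test_median = median(test_counts)
    ref_median = median(reference_counts)
    if test_median == 0 or ref_median == 0:
        raise ValueError("median count is zero; cannot normalize")
    ratios = []
    for t, r in zip(test_counts, reference_counts):
        if r == 0:
            ratios.append(None)
        elif t == 0:
            ratios.append(float("-inf"))
        else:
            ratios.append(math.log2((t / test_median) / (r / ref_median)))
    return ratios
```

In a diploid region, a heterozygous deletion halves the expected depth (log2 ratio near -1) and a single-copy gain raises it by half (log2 ratio near 0.58), which is why the choice of reference set and the noise it introduces have such a direct effect on detection accuracy.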

Integration and Clinical Applications

The integration of SNV, INDEL, and CNV detection tools is essential for a comprehensive analysis of WES data. The combined use of these tools allows for the identification of complex genotypes, including compound heterozygosity and true homozygosity, which are crucial for accurate genetic diagnosis and the development of personalized treatment strategies. In clinical oncology, the identification of pharmacologically targetable mutations through WES can inform treatment decisions, such as the use of PARP inhibitors in homologous recombination-deficient tumors or immunotherapy in tumors with high mutation burden [16].
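A first-pass screen for possible compound heterozygosity can be sketched as grouping heterozygous calls by gene. The record layout (`id`, `gene`, `genotype` keys) and the function names are hypothetical, and the result is only a list of candidates: confirming that two variants sit on different parental copies requires phasing information, such as parental genotypes, which this sketch does not use.

```python
from collections import defaultdict


def is_heterozygous(genotype):
    """True for a biallelic heterozygous genotype such as 0/1, 1/0, 0|1 or 1|0."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and set(alleles) == {"0", "1"}


def candidate_compound_hets(variants):
    """Return genes carrying two or more heterozygous variants.

    The output maps each gene name to the list of variant ids found in it.
    These are candidates only; phase is not checked.
    """
    by_gene = defaultdict(list)
    for v in variants:
        if is_heterozygous(v["genotype"]):
            by_gene[v["gene"]].append(v["id"])
    return {gene: ids for gene, ids in by_gene.items() if len(ids) >= 2}
```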

Despite the advancements in variant detection algorithms, challenges remain, particularly in achieving consensus on reference standards and minimal application requirements for clinical use [10]. The maturation of next-generation sequencing technologies, coupled with the development of FDA-approved methods for cancer screening and diagnostics, is paving the way for the routine clinical application of WES [10]. However, the continued evolution of bioinformatics tools and the refinement of variant detection methodologies are essential to fully realize the potential of WES in personalized medicine and genetic research.

Challenges and Solutions in Variant Interpretation and Annotation

Whole Exome Sequencing (WES) has emerged as a pivotal tool in genomics, enabling the detailed examination of the protein-coding regions of the genome. Despite its potential, the interpretation and annotation of variants identified through WES present significant challenges. These challenges are rooted in the complexity of genomic data, the limitations of current computational tools, and the need for accurate clinical translation. This section delves into the intricacies of these challenges and explores potential solutions, drawing on recent advancements and methodologies.

Complexity of Genomic Variants

The human genome is a mosaic of genetic variations, ranging from single nucleotide polymorphisms (SNPs) to complex structural variants (SVs). Each type of variant poses unique challenges in interpretation and annotation. SNPs and small insertions or deletions (indels) are relatively straightforward to identify and annotate, thanks to well-established databases and tools. However, structural variants, which include large deletions, duplications, inversions, and translocations, are notoriously difficult to characterize due to their size and complexity [17].

The challenge lies in the fact that existing DNA re-sequencing tools exhibit a bias against the recovery of these complex variants. This bias stems from the limitations of short-read sequencing technologies, which struggle to span large or repetitive regions of the genome. Consequently, many structural variants remain undetected or poorly characterized, hindering our understanding of their role in disease [17].

Methodological Limitations

The interpretation of genomic variants requires sophisticated computational tools capable of integrating vast amounts of data. Traditional gene-wise selection methods, which rely on univariate analyses, fall short in this regard. They often fail to account for the correlational, structural, or functional relationships among genes, leading to incomplete or inaccurate interpretations.

To address these limitations, novel approaches such as the integrative Bayesian variable selection (iBVS) framework have been developed. This framework allows for the simultaneous identification of causal genes and regulatory pathways by incorporating prior knowledge of gene-gene interactions and functional relationships. By targeting the joint effects of multiple genes and pathways, iBVS provides a more holistic view of the genomic landscape, facilitating more accurate predictions of disease status or phenotype.

Biological Mechanisms and Context

Understanding the biological significance of a variant is crucial for its clinical interpretation. This requires not only the identification of the variant itself but also an understanding of its impact on gene function and its potential role in disease. The annotation of variants with information such as ancestral state, formation mechanism, and functional impact is essential for this purpose [17].

One promising approach to achieving this is the use of de novo genome assemblies, which provide a more comprehensive view of the genome structure. By capturing a wide spectrum of structural variants and novel sequences, de novo assemblies facilitate the construction of population-scale pan-genomes. This, in turn, enhances our ability to annotate variants with biologically relevant information, improving our understanding of their role in health and disease [17].

Clinical Translation and Decision-Making

The ultimate goal of variant interpretation and annotation is to translate genomic data into actionable clinical insights. However, this process is fraught with challenges, particularly in the context of precision oncology. The sheer volume of genomic data, coupled with the complexity of cancer biology, makes it difficult for clinicians to make informed decisions based on genomic findings.

To address this challenge, support systems such as the Cancer Core Europe Molecular Tumor Board Portal have been developed. These systems provide a platform for the integration and interpretation of genomic data, facilitating clinical decision-making in precision oncology. By leveraging computational tools and expert knowledge, these platforms help clinicians navigate the complexities of genomic data, enabling more personalized and effective treatment strategies.

Integrating Knowledge-Based Priors

Another critical aspect of variant interpretation is the integration of knowledge-based priors. These priors, which are derived from existing biological knowledge, provide a valuable context for interpreting genomic data. The iBVS framework, for example, incorporates a novel partial least squares (PLS) g-prior, which allows for the inclusion of prior knowledge on gene-gene interactions and functional relationships.

By integrating these priors, researchers can enhance the accuracy and reliability of their interpretations, providing a more nuanced understanding of the genomic landscape. This approach not only improves the identification of molecular biomarkers but also facilitates the development of more targeted and effective therapeutic strategies.

Conclusion

The interpretation and annotation of variants in whole exome sequencing are complex processes that require a multifaceted approach. By addressing the challenges posed by genomic complexity, methodological limitations, and clinical translation, researchers can unlock the full potential of WES. Advances in computational tools, such as de novo genome assemblies and integrative Bayesian frameworks, offer promising solutions, enhancing our ability to interpret and annotate genomic variants accurately. As we continue to refine these methodologies and integrate knowledge-based priors, we move closer to realizing the promise of personalized medicine, where genomic data informs every aspect of patient care.

Clinical and Research Applications of Variant Calling in Whole Exome Sequencing

Whole Exome Sequencing (WES) is a transformative technology that has significantly advanced our understanding of the human genome, particularly in the context of clinical and research applications. The ability to sequence all protein-coding regions of the genome, which constitute approximately 1-2% of the entire genome but harbor the majority of known disease-related variants, has opened new avenues for diagnosing genetic disorders, understanding disease mechanisms, and developing targeted therapies. Variant calling, the process of identifying variants from sequence data, is a critical step in WES and has diverse applications in both clinical and research settings.

Methodologies in Variant Calling

Variant calling in WES involves several computational steps, including read alignment, variant detection, and annotation. The choice of algorithms and tools can significantly influence the accuracy and reliability of variant calling. Commonly used tools include GATK HaplotypeCaller, FreeBayes, and SAMtools, each with unique strengths and limitations [18]. For instance, GATK's HaplotypeCaller is renowned for its ability to model haplotypes and detect variants with high sensitivity, while FreeBayes is appreciated for its ability to call variants in pooled samples or populations [18].

Recent advancements have seen the integration of machine learning techniques into variant calling pipelines. Tools like Permutect leverage deep learning to enhance the detection of technical artifacts, thereby improving the precision and recall of variant calls. These innovations address the challenges posed by the high error rates and complex noise patterns inherent in sequencing data, particularly in tumor samples where distinguishing between somatic and germline variants is crucial.

Biological Mechanisms and Context

The biological relevance of variant calling in WES is underscored by its ability to uncover genetic variants associated with a wide range of diseases. In oncology, for instance, WES enables the identification of somatic mutations that drive cancer progression and informs the development of targeted therapies. The identification of homologous recombination deficiency, which can predict responsiveness to PARP inhibitors, exemplifies the clinical utility of variant calling in cancer treatment [19, 20]. Furthermore, the determination of microsatellite instability and tumor mutation burden through WES data can guide immunotherapy decisions, highlighting the role of variant calling in precision medicine [21].
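Tumor mutation burden is commonly expressed as somatic mutations per megabase of sequence that was adequately covered. The sketch below shows only that arithmetic; the function name is hypothetical, and which mutation classes are counted and how the covered territory is defined vary between assays and are not specified in the text.

```python
def tumor_mutation_burden(n_somatic_mutations, covered_bases):
    """Somatic mutations per megabase of adequately covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_somatic_mutations / (covered_bases / 1_000_000)
```

For example, 300 qualifying mutations over 30 Mb of covered exome corresponds to 10 mutations per megabase.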

In the context of Mendelian disorders, variant calling facilitates the discovery of pathogenic variants responsible for rare genetic diseases. The application of WES in diagnosing conditions like Charcot-Marie-Tooth disease has demonstrated the power of this technology to identify novel causative variants, thereby expanding our understanding of disease etiology and inheritance patterns [22]. The ability to detect both inherited and de novo variants is particularly valuable in the diagnosis of pediatric genetic disorders, where early and accurate diagnosis can significantly impact clinical management and outcomes [23].

Clinical Applications

The clinical applications of variant calling in WES are vast and continually expanding. One of the most promising areas is preimplantation genetic testing (PGT), where WES-based approaches have shown high accuracy in detecting monogenic disorders, aneuploidy, and structural rearrangements in embryos. This comprehensive method allows for the simultaneous detection of inherited conditions and chromosomal abnormalities, providing a robust tool for assessing embryo viability and genetic health. Custom bioinformatics pipelines that follow established best practices, such as those published for the Genome Analysis Toolkit (GATK), enhance the precision of variant calling in this context, enabling the detection of subtle genomic alterations that might be missed by conventional methods.

In oncology, the clinical utility of WES is exemplified by its application in cancer diagnostics and treatment planning. The identification of actionable mutations through variant calling can inform targeted therapy decisions, while the detection of copy number variations (CNVs) and structural variants provides insights into tumor biology and potential resistance mechanisms [20]. The maturation of next-generation sequencing technologies, coupled with FDA-approved methods for cancer screening, underscores the readiness of WES for routine clinical use [21].

Research Applications

In research settings, variant calling in WES is instrumental in advancing our understanding of genetic diversity and disease mechanisms. The ability to analyze large-scale WES datasets enables researchers to identify novel genetic variants associated with complex traits and diseases. This is particularly relevant in the study of rare diseases, where WES has facilitated the discovery of new disease genes and pathogenic mechanisms [19]. The integration of variant calling with other genomic data types, such as transcriptomics and epigenomics, further enhances the ability to dissect the molecular underpinnings of disease [24].

Moreover, the development of tools like WEScover, which assesses the coverage of exons in WES datasets, addresses the challenge of false negatives due to incomplete coverage [25, 26]. This tool provides researchers with the ability to evaluate the reliability of WES data for specific genes and conditions, thereby informing study design and ensuring the robustness of research findings [27].

Challenges and Future Directions

Despite the significant advancements in variant calling for WES, several challenges remain. The variability in coverage across exonic regions can lead to false negatives, particularly for genes with clinical significance [27]. The development of hybrid approaches that integrate WES with targeted gene panel testing offers a potential solution to this issue, ensuring comprehensive coverage of clinically relevant genes [25, 26].

Additionally, the lack of consensus regarding reference standards and minimal application requirements for WES in clinical settings poses a barrier to its widespread adoption [21]. Efforts to standardize variant calling methodologies and establish robust quality control measures are essential to ensure the accuracy and reproducibility of WES data.

Looking forward, the integration of multi-omics data and the application of advanced computational techniques, such as deep learning and artificial intelligence, hold promise for further enhancing the capabilities of variant calling in WES. These innovations have the potential to improve the detection of complex variants, refine the annotation of genetic variants, and ultimately, advance our understanding of the genetic basis of disease.

In conclusion, variant calling in whole exome sequencing has diverse applications in clinical and research settings. By providing detailed insight into the genetic architecture of disease, it has improved diagnostics, informed therapeutic decisions, and opened new lines of research. As sequencing technologies and analysis methods continue to evolve, WES and variant calling are likely to remain central to precision medicine and genomic research.

References

[1] Lake Louise Mutation Detection Meeting 2013: Clinical Translation of Next‐Generation Sequencing Requires Optimization of Workflows and Interpretation of Variants. DOI: 10.1002/humu.22480

[2] Recent advances and challenges of rare variant association analysis in the biobank sequencing era. DOI: 10.3389/fgene.2022.1014947

[3] A Clarion Call for Proteomics. DOI: 10.1016/j.cels.2016.03.005

[4] Detection of Genetic Variations in Children with Tetralogy of Fallot Using Whole Exome Sequencing Technology Integrated Bioinformatics Analysis. DOI: 10.1089/gtmb.2024.0350

[5] Pathophysiologic effects of CHCHD2 variants associated with late‐onset Parkinson disease. DOI: 10.1002/humu.23264

[6] Abstract 1913: Quality control workflows developed for the NCI Patient-Derived Models Repository using low pass whole genome sequencing and whole exome sequencing. DOI: 10.1158/1538-7445.am2022-1913

[7] An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. DOI: 10.1101/gr.176552.114

[8] A beginners guide to SNP calling from high-throughput DNA-sequencing data. DOI: 10.1007/s00439-012-1213-z

[9] A community-based resource for automatic exome variant-calling and annotation in Mendelian disorders. DOI: 10.1186/1471-2164-15-S3-S5

[10] Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. DOI: 10.3390/cancers11111725

[11] Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data. DOI: 10.1371/journal.pone.0182272

[12] Comparison of insertion/deletion calling algorithms on human next-generation sequencing data. DOI: 10.1186/1756-0500-7-864

[13] An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data. DOI: 10.1007/s13258-017-0608-6

[14] Comprehensive whole genome sequencing of a three generation pedigree: genetic components of a new syndrome with Severe Developmental Delay and Dysmorphic Features. No DOI available.

[15] EXCAVATOR: detecting copy number variants from whole-exome sequencing data. DOI: 10.1186/gb-2013-14-10-r120

[16] Benchmarking germline CNV calling tools from exome sequencing data. DOI: 10.1038/s41598-021-93878-2

[17] Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale. DOI: 10.1186/s13742-015-0103-4

[18] VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering. DOI: 10.1186/s12864-015-2050-y

[19] Whole Exome Sequencing Identifies Novel Genetic Alterations in Patients with Pheochromocytoma/Paraganglioma. DOI: 10.3803/EnM.2020.756

[20] WEScover: whole exome sequencing vs. gene panel testing. DOI: 10.1101/367607

[21] Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. DOI: 10.3390/cancers11111725

[22] Application of variant‐calling algorithms for Mendelian disorders: lessons from whole‐exome sequencing in Charcot-Marie-Tooth disease. DOI: 10.1111/cge.12281

[23] T-Rex: Standardized Analysis of Germline Variants in Whole-Exome Sequencing Trios. DOI: 10.64898/2026.03.30.715083

[24] The use of exome capture RNA-seq for highly degraded RNA with application to clinical cancer sequencing. DOI: 10.1101/gr.189621.115

[25] Whole genome sequencing and its applications in medical genetics. DOI: 10.1007/s40484-016-0067-0

[26] TOSCA: an automated Tumor Only Somatic CAlling workflow for somatic mutation detection without matched normal samples. DOI: 10.1093/bioadv/vbac070

[27] WEScover: selection between clinical whole exome sequencing and gene panel testing. DOI: 10.1186/s12859-021-04178-5