Section: Clinical Methods & Interventions

The KEGG Database and Pathway Analysis: A Reference for Veterinary Systems Biology

Introduction

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is an integrated bioinformatics resource that systematically captures molecular interaction networks, metabolic pathways, and functional orthologs across diverse organisms [1]. For veterinary researchers and diagnosticians, KEGG provides a framework for interpreting high-throughput omics data, including transcriptomic, proteomic, and metabolomic data from domestic animals, poultry, fish, and wildlife. This article outlines the structure of the KEGG database, how pathways are represented, the main analytical methods, and their applications in veterinary medicine.

Database Architecture and Data Types

KEGG comprises several cross-linked sub-databases [3]. The core components are KEGG PATHWAY, KEGG GENES, KEGG ORTHOLOGY (KO), KEGG MODULE, and KEGG BRITE, each of which serves a specific role in functional annotation and pathway mapping.

Table 1. Principal KEGG sub-databases and their contents.

| Sub-database | Content | Primary use |
|---|---|---|
| KEGG PATHWAY | Manually drawn pathway maps (metabolic, signaling, disease) | Visualizing molecular interactions |
| KEGG GENES | Gene catalogs for complete genomes | Sequence-based functional assignment |
| KEGG ORTHOLOGY (KO) | Functional ortholog groups | Cross-species functional inference |
| KEGG MODULE | Functional units and complexes | Higher-order functional analysis |
| KEGG BRITE | Hierarchical functional classifications | Ontology-based browsing |

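Pathway maps are graphical representations of molecular networks. Rectangular boxes represent gene products (genes, proteins, or KO groups) and small circles represent chemical compounds such as metabolites; lines and arrows denote enzymatic reactions, protein-protein interactions, or regulatory relationships. In organism-specific maps, boxes for genes present in that organism's genome are shaded green, and user-supplied values such as expression changes can be overlaid as custom colors.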
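KEGG also distributes pathway maps in KGML (KEGG Markup Language), an XML format in which `entry` elements are the nodes and `relation` and `reaction` elements are the edges. The sketch below parses a simplified, hand-written fragment in that style into node and edge lists using only the Python standard library. The fragment is illustrative rather than copied from a real map, it omits attributes such as graphics coordinates, and the parser assumes, as is typical of KGML files, that a reaction's `id` matches the id of the entry that catalyses it.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the KGML style (illustrative only).
# Entries 1 and 4 are ortholog groups (hexokinase, glucose-6-phosphate isomerase);
# entries 2 and 3 are compounds (D-glucose, D-glucose 6-phosphate).
TOY_KGML = """
<pathway name="toy" title="Toy glycolysis fragment">
  <entry id="1" name="ko:K00844" type="ortholog"/>
  <entry id="2" name="cpd:C00031" type="compound"/>
  <entry id="3" name="cpd:C00092" type="compound"/>
  <entry id="4" name="ko:K01810" type="ortholog"/>
  <relation entry1="1" entry2="4" type="ECrel">
    <subtype name="compound" value="3"/>
  </relation>
  <reaction id="1" type="irreversible">
    <substrate id="2" name="cpd:C00031"/>
    <product id="3" name="cpd:C00092"/>
  </reaction>
</pathway>
"""


def parse_kgml(xml_text):
    """Return (nodes, edges) from a KGML-style string.

    nodes: dict entry id -> (KEGG identifier, entry type)
    edges: list of (source id, target id, edge kind, details dict)
    """
    root = ET.fromstring(xml_text.strip())
    nodes = {e.get("id"): (e.get("name"), e.get("type")) for e in root.findall("entry")}
    edges = []
    # Relations link gene products (ECrel: enzymes catalysing successive steps, PPrel: protein-protein).
    for rel in root.findall("relation"):
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        edges.append((rel.get("entry1"), rel.get("entry2"), rel.get("type"), {"subtypes": subtypes}))
    # Reactions link substrates to products; here the reaction id is taken as the catalysing entry.
    for rxn in root.findall("reaction"):
        for sub in rxn.findall("substrate"):
            for prod in rxn.findall("product"):
                edges.append((sub.get("id"), prod.get("id"), "reaction",
                              {"catalyst": rxn.get("id"), "direction": rxn.get("type")}))
    return nodes, edges


nodes, edges = parse_kgml(TOY_KGML)
for edge in edges:
    print(edge)
```

Keeping nodes keyed by entry id, rather than by KEGG identifier, mirrors KGML itself, where the same KO or compound can appear as several entries on one map.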

KEGG Orthology and Functional Annotation

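A central concept in KEGG is the KEGG Orthology (KO) system. Each KO group is a manually defined set of orthologous genes from multiple species, and each KO identifier (e.g., K00161, pyruvate dehydrogenase E1 component alpha subunit) corresponds to a specific functional role. KAAS (KEGG Automatic Annotation Server) assigns KO identifiers to user-submitted sequences, typically protein sequences in FASTA format, by similarity searches (BLAST, GHOST, or DIAMOND) against reference genes in KEGG GENES, using either a bidirectional best hit (BBH) or a single-directional best hit (SBH) method [2]. This enables functional annotation for species that lack a KEGG organism entry, such as many fish, wildlife, and parasite species. Major livestock species, including Gallus gallus (gga), Sus scrofa (ssc), and Bos taurus (bta), already have organism-specific KEGG annotations, although KAAS remains useful for newly assembled genomes and transcriptomes.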
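The bidirectional best hit idea behind one of KAAS's assignment modes can be illustrated with a deliberately simplified sketch: a query gene inherits a reference gene's KO only when each is the other's top-scoring hit. The gene names and bit scores below are hypothetical, and the real server also applies score thresholds and draws on many reference genomes, which this sketch omits; it is not KAAS's implementation.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict {(query, target): bit_score}
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def assign_ko_bbh(forward, reverse, ref_to_ko):
    """Assign KOs to query genes that form bidirectional best hits.

    forward: scores of query genes searched against reference genes
    reverse: scores of reference genes searched against query genes
    ref_to_ko: KO identifier for each reference gene
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    assigned = {}
    for query, ref in fwd.items():
        # Keep the pair only if the reference gene's best hit is this same query.
        if rev.get(ref) == query and ref in ref_to_ko:
            assigned[query] = ref_to_ko[ref]
    return assigned


# Hypothetical search results for two genes from a non-model fish.
forward = {("fish_g1", "ref_A"): 850.0, ("fish_g1", "ref_B"): 120.0,
           ("fish_g2", "ref_A"): 300.0}
reverse = {("ref_A", "fish_g1"): 840.0, ("ref_A", "fish_g2"): 310.0,
           ("ref_B", "fish_g2"): 90.0}
ref_to_ko = {"ref_A": "K00161", "ref_B": "K00162"}

print(assign_ko_bbh(forward, reverse, ref_to_ko))  # {'fish_g1': 'K00161'}
```

Here fish_g2 receives no KO because its best hit, ref_A, prefers fish_g1; an SBH-style rule would accept the one-directional hit instead, trading specificity for coverage on fragmentary data.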

Pathway Maps and Their Interpretation

KEGG pathway maps are organized into seven top-level categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, Human Diseases, and Drug Development. Although framed around human disease, the Human Diseases category includes infectious disease maps for bacterial, viral, and parasitic infections that are also relevant to veterinary pathogens. For veterinary applications, Metabolism pathways (e.g., glycolysis, the citrate cycle, amino acid metabolism) and Organismal Systems pathways (e.g., immune system, digestive system) are frequently used.

Each map is a static image on which nodes are clickable links to the corresponding database entries. With the KEGG Mapper Color tool, users can supply identifiers together with colors, for example to mark nodes as upregulated, downregulated, or unchanged. This visual overlay helps identify altered pathway components quickly in conditions such as avian colibacillosis caused by Escherichia coli, necrotic enteritis caused by Clostridium perfringens, and bovine ketosis.
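Preparing such an overlay amounts to turning differential-expression results into one identifier-plus-color line per node. The sketch below assumes a plain-text "identifier color" input of the kind accepted by the KEGG Mapper Color tool (check the current KEGG Mapper help page for the exact syntax); the cutoffs of |log2 fold change| ≥ 1 and adjusted p < 0.05 and the input tuples are illustrative choices, not KEGG defaults.

```python
def mapper_color_lines(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Convert (ko_id, log2_fold_change, adjusted_p) tuples into color lines.

    Red = significantly up, blue = significantly down, grey = no significant change.
    The cutoffs are illustrative defaults, not values prescribed by KEGG.
    """
    lines = []
    for ko_id, lfc, padj in results:
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            color = "#ff0000"  # up-regulated
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            color = "#0000ff"  # down-regulated
        else:
            color = "#d3d3d3"  # unchanged or not significant
        lines.append(f"{ko_id} {color}")
    return lines


# Hypothetical results for three KOs.
results = [("K00844", 2.1, 0.001), ("K01810", -1.5, 0.01), ("K00161", 0.3, 0.5)]
print("\n".join(mapper_color_lines(results)))
```

If several genes map to the same KO, the sketch emits one line per input row; how to merge conflicting directions for one node is left to the analyst.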

Pathway Analysis Methods

Two widely used approaches to pathway analysis are over-representation analysis (ORA) and functional class scoring (FCS) [5]. ORA tests whether a predefined list of genes (e.g., differentially expressed genes from an RNA-seq or microarray experiment) contains more members of a given KEGG pathway than expected by chance, relative to a background of all measured genes; the hypergeometric test or, equivalently, a one-sided Fisher's exact test is typically applied. FCS methods, such as gene set enrichment analysis (GSEA) [4], use the entire ranked list of genes and assess whether members of a pathway cluster toward the top or bottom of the list. A third class, pathway topology-based methods, additionally uses the interactions drawn on the maps [5].
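Both statistics described above can be sketched with the Python standard library. The first function computes the one-sided hypergeometric p-value for observing at least k pathway genes among n selected genes drawn from a background of N genes, K of which belong to the pathway. The second computes a weighted running-sum enrichment score in the spirit of GSEA [4]; it omits the permutation test and normalization that GSEA uses to assess significance. Function and variable names are illustrative.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in pathway, n drawn).

    N: background (universe) size, K: pathway genes in the background,
    n: selected (e.g. differentially expressed) genes, k: selected genes in the pathway.
    """
    upper = min(K, n)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked: list of (gene, score) sorted from most positive to most negative score.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    hit_norm = sum(abs(score) ** weight for (_, score), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members must have non-zero scores")
    running, extreme = 0.0, 0.0
    for (_, score), h in zip(ranked, hits):
        if h:
            running += abs(score) ** weight / hit_norm  # step up at a pathway member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


print(ora_pvalue(N=10, K=3, n=2, k=2))  # 3/45, about 0.0667
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0)]
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0: both members sit at the top
```

A positive score indicates that pathway members cluster at the top of the ranking and a negative score that they cluster at the bottom.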

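For veterinary omics studies, ORA is straightforward when the study design yields a clear list of differentially expressed genes, but its results depend on the significance cutoff used to build that list and on the choice of background. FCS avoids an arbitrary cutoff and is often more sensitive to coordinated but subtle expression changes across a pathway. Because many pathways are tested at once, both approaches require correction for multiple testing, usually by controlling the false discovery rate (FDR) with the Benjamini-Hochberg procedure.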
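The Benjamini-Hochberg step-up procedure, a common way of controlling the FDR across a set of pathway tests, can be written in a few lines. The sketch returns BH-adjusted p-values in the input order; the p-values in the usage line are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    The adjusted value for the p-value of rank r (ascending) is
    min over ranks j >= r of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down so each value takes the running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four pathways.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Pathways whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as significant.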

A typical workflow for KEGG-based pathway analysis is illustrated in Figure 1.

flowchart TD
    A["High-throughput data: RNA-seq, proteomics, metabolomics"] --> B[Data preprocessing and normalization]
    B --> C[Differential expression analysis or metabolite quantification]
    C --> D["Gene/metabolite list with identifiers"]
    D --> E["KO assignment via KAAS (genes and proteins)"]
    D --> E2["KEGG COMPOUND identifiers (metabolites)"]
    E --> F[Mapping to KEGG pathways]
    E2 --> F
    F --> G{Pathway analysis method}
    G --> H[Over-representation analysis]
    G --> I[Functional class scoring]
    H --> J["Hypergeometric test / Fisher's exact test"]
    I --> K[Gene set enrichment analysis]
    J --> L["Multiple testing correction (FDR)"]
    K --> L
    L --> M[Significant pathway identification]
    M --> N[Biological interpretation and hypothesis generation]

Figure 1. Conceptual workflow for KEGG pathway analysis in veterinary systems biology.
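For the step that maps KO assignments to pathways, the KEGG REST API's `link` operation returns tab-separated pairs of linked identifiers, for example from `https://rest.kegg.jp/link/pathway/ko`. The sketch below parses text in that format into pathway-to-KO gene sets that can feed an ORA or GSEA function. The sample lines are hand-written in the same format rather than copied from a live response; because a response may list a pathway under both `path:map` and `path:ko` prefixes, the parser keeps only the prefix requested. KEGG's usage terms apply to any live download.

```python
import urllib.request
from collections import defaultdict


def parse_link_text(text, pathway_prefix="path:map"):
    """Parse 'ko:Kxxxxx<TAB>path:...' lines into {pathway id: set of KO ids}."""
    gene_sets = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ko_id, pathway_id = line.split("\t")
        if pathway_id.startswith(pathway_prefix):
            # Strip the database prefixes, e.g. "ko:K00844" -> "K00844".
            gene_sets[pathway_id.split(":", 1)[1]].add(ko_id.split(":", 1)[1])
    return dict(gene_sets)


def fetch_ko_pathway_links():
    """Download the full KO-to-pathway link table (requires network access)."""
    with urllib.request.urlopen("https://rest.kegg.jp/link/pathway/ko") as response:
        return response.read().decode("utf-8")


# Hand-written lines in the link format (illustrative, not a verbatim KEGG response).
sample = (
    "ko:K00844\tpath:map00010\n"
    "ko:K01810\tpath:map00010\n"
    "ko:K00161\tpath:map00010\n"
    "ko:K00161\tpath:map00020\n"
    "ko:K00161\tpath:ko00020\n"
)
print(parse_link_text(sample))
```

Parsing is kept separate from downloading so that the full table can be fetched once, cached locally, and reused across analyses.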

Veterinary Applications

Infectious Disease Research

KEGG pathway analysis aids in understanding host-pathogen interactions. For example, in bovine mastitis caused by Staphylococcus aureus, transcriptomic data from infected mammary tissue can be mapped to KEGG pathways; upregulation of the Toll-like receptor signaling, NF-kappa B signaling, and cytokine-cytokine receptor interaction pathways is commonly reported. Similarly, in necrotic enteritis of broiler chickens, KEGG analysis of ileal transcriptomes has been used to identify perturbations in the tight junction pathway and in amino acid metabolism.

Metabolic Disease and Nutritional Studies

KEGG is also useful for studying metabolic and inflammatory disorders such as canine pancreatitis. By mapping metabolites measured in serum to KEGG COMPOUND identifiers and then to metabolic maps, researchers can pinpoint disruptions in lipid metabolism, oxidative stress-related pathways, and pancreatic enzyme regulation. In ruminants, KEGG analysis has been applied to hepatic gluconeogenesis and ketogenesis during negative energy balance (for example, through the Glycolysis / Gluconeogenesis and Synthesis and degradation of ketone bodies maps), offering insights relevant to the prevention of ketosis.

Comparative Genomics and Antimicrobial Resistance

KEGG orthology allows cross-species comparison of drug targets and resistance mechanisms. In livestock-associated Staphylococcus aureus, mapping resistance genes to the beta-Lactam resistance pathway map and to transporter modules such as drug efflux pumps helps characterize resistance profiles. The KEGG DRUG database further links drug entries to their molecular targets and to the pathways in which those targets act.
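Because every genome annotated through KAAS or KEGG GENES is described in the same KO vocabulary, comparing resistance repertoires across isolates reduces to set operations on KO assignments. The sketch below uses hypothetical isolate names and descriptive placeholder labels in place of real K numbers; in practice the sets would come from KAAS output, and the KOs of interest could be taken from the beta-Lactam resistance map or a resistance-related KEGG module.

```python
def presence_table(isolate_kos, kos_of_interest):
    """Return {isolate: {ko: True/False}} for the KOs of interest."""
    return {
        isolate: {ko: ko in kos for ko in kos_of_interest}
        for isolate, kos in isolate_kos.items()
    }


def shared_and_unique(isolate_kos, kos_of_interest):
    """KOs of interest present in every isolate, and those present in exactly one."""
    per_isolate = [kos & set(kos_of_interest) for kos in isolate_kos.values()]
    shared = set.intersection(*per_isolate) if per_isolate else set()
    counts = {}
    for kos in per_isolate:
        for ko in kos:
            counts[ko] = counts.get(ko, 0) + 1
    unique = {ko for ko, c in counts.items() if c == 1}
    return shared, unique


# Hypothetical isolates; placeholder labels stand in for KO identifiers assigned by KAAS.
isolate_kos = {
    "isolate_1": {"KO_beta_lactamase", "KO_pbp2a", "KO_efflux_pump"},
    "isolate_2": {"KO_beta_lactamase", "KO_efflux_pump"},
    "isolate_3": {"KO_beta_lactamase"},
}
of_interest = ["KO_beta_lactamase", "KO_pbp2a", "KO_efflux_pump"]
for isolate, row in presence_table(isolate_kos, of_interest).items():
    print(isolate, row)
print(shared_and_unique(isolate_kos, of_interest))
```

The same presence/absence table can be extended to host species, for example to check whether a candidate drug target KO has an ortholog in the treated animal.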

Limitations and Considerations

KEGG reference pathways are based largely on knowledge from humans and model organisms, and maps for other species are generated computationally from KO assignments, which may limit their direct applicability to some veterinary species. The KO system partially overcomes this by transferring annotations through orthology. Another limitation is that pathway maps are static: they do not capture condition-specific or temporal dynamics, and gene-level pathway analysis does not reflect regulation, such as post-translational modification, that alters protein activity without changing transcript abundance. Additionally, KEGG is biased toward well-studied metabolic and signaling pathways and represents species-specific processes (e.g., avian-specific immune responses) less completely. Despite these caveats, KEGG remains one of the most widely used pathway resources in veterinary bioinformatics.

Integration with Other Resources

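KEGG is often used alongside other resources. Reactome, a curated pathway knowledgebase developed by a consortium that includes the European Bioinformatics Institute (EMBL-EBI), and Ensembl, a genome annotation resource developed at EMBL-EBI, both provide orthology-based annotation for many domestic animal species. Gene Ontology (GO) enrichment analysis provides functional annotation at a different granularity from pathway maps. Combining KEGG pathway analysis with network-based methods and constraint-based modeling such as flux balance analysis enables a more comprehensive systems-level understanding.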

Conclusion

The KEGG database offers an indispensable platform for pathway analysis in veterinary molecular research. Its structured orthology system, manually curated pathway maps, and computational tools support a wide range of applications from infectious disease to nutritional metabolism. As veterinary omics data continue to grow, KEGG will remain a cornerstone for functional interpretation and hypothesis generation in animal health.

References

[1] Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27-30.

[2] Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35(Web Server issue):W182-W185.

[3] Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49(D1):D545-D551.

[4] Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.

[5] Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012;8(2):e1002375.