Section: Computational Biology

Ethical Considerations in Computational Genomics: Frameworks for Veterinary Bioinformatics

Introduction

The integration of high-throughput sequencing and computational analysis into veterinary medicine has generated unprecedented volumes of genomic data. These datasets, derived from livestock, companion animals, and wildlife, are used for pathogen surveillance, host genetics, antimicrobial resistance profiling, and population management. The scale and sensitivity of these data introduce ethical obligations that extend beyond traditional research oversight. Computational genomics in veterinary contexts must address data privacy, informed consent for animal-derived samples, equitable representation in reference cohorts, and the security of genomic information during storage and transmission.

This article examines the ethical dimensions of computational genomics with a focus on veterinary applications. It draws on recent developments in privacy-preserving computation, stakeholder governance, and cohort stratification to outline a framework for responsible data stewardship.

Data Privacy and Confidentiality in Veterinary Genomic Databases

Genomic data from animals can indirectly reveal information about human handlers, breeders, and owners. For example, genomic sequences of livestock pathogens may be linked to specific farms or geographic regions, creating risks of economic reprisal or stigmatization. Similarly, genomic data from companion animals may be traceable to individual owners through breed registries or veterinary records.

The principle of data minimization applies: only the genomic features necessary for a specific diagnostic or research question should be extracted and retained. Whole-genome sequencing of pathogens or hosts should be followed by selective analysis of target loci, with raw data discarded or encrypted after analysis. This approach reduces the surface area for potential re-identification.
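
The minimization step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed pipeline: the function name `extract_target_loci`, the locus names, the coordinates, and the toy sequence are all hypothetical. The sketch keeps only the requested loci and a SHA-256 digest of the raw input, so that the full sequence need not be retained.

```python
import hashlib


def extract_target_loci(genome, loci):
    """Keep only the requested loci plus a digest of the full sequence.

    genome: assembled sequence as a string (hypothetical input).
    loci:   mapping of locus name -> (start, end), 0-based, end-exclusive.
    """
    retained = {name: genome[start:end] for name, (start, end) in loci.items()}
    # The digest lets an auditor confirm which raw input was analyzed
    # without the raw sequence itself being stored.
    digest = hashlib.sha256(genome.encode("ascii")).hexdigest()
    return {"loci": retained, "source_sha256": digest}


# Toy sequence and coordinates, invented for illustration only.
genome = "ATGCGTACGTTAGCCGATAGGCTAACGT"
record = extract_target_loci(
    genome,
    {"resistance_marker": (4, 12), "typing_locus": (16, 22)},
)
```

Whether to keep even the digest is a design choice: it supports auditing, but anyone who already holds the same raw sequence could use it to confirm a match.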

Secure computation methods are emerging as tools to address these concerns. Provatas et al. described KmerCrypt, a framework that uses homomorphic encryption to enable private k-mer search across genomic databases without revealing the query sequence or the database contents [1]. In a veterinary context, such a system would allow a diagnostic laboratory to query a central pathogen genome database for a match to a field isolate without exposing the farm of origin or the submitting veterinarian. The computational overhead of homomorphic encryption remains substantial, and whether it is acceptable for routine diagnostic use depends on database size and query volume; ongoing improvements in algorithmic efficiency are narrowing that gap.
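
To make the idea of a private k-mer query concrete, the following toy example uses the Paillier cryptosystem, which is additively homomorphic. It is not the KmerCrypt construction, whose details are not reproduced here; it is a generic textbook scheme swapped in for illustration. The primes are far too small for real use, and the names `client_query` and `server_answer` are hypothetical. The client sends an encrypted one-hot vector over all possible 3-mers; the server multiplies the ciphertexts at positions where its database contains that k-mer, which yields an encryption of the single presence bit the client asked about.

```python
import math
import secrets
from itertools import product

# Toy key material: real deployments use primes of at least 1024 bits each.
P, Q = 10007, 10009
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is N + 1


def encrypt(m):
    """Paillier encryption of integer m (0 <= m < N) under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Paillier decryption using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


K = 3
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def client_query(kmer):
    """Encrypt a one-hot vector marking the queried k-mer."""
    target = KMER_INDEX[kmer]
    return [encrypt(1 if i == target else 0) for i in range(len(KMER_INDEX))]


def server_answer(enc_query, db_kmers):
    """Combine ciphertexts without decrypting them.

    Multiplying ciphertexts adds the underlying plaintexts, so the result
    encrypts the sum of query bits at positions present in the database.
    """
    present = {KMER_INDEX[k] for k in db_kmers}
    acc = encrypt(0)
    for i, c in enumerate(enc_query):
        if i in present:
            acc = (acc * c) % N2
    return acc


database = ["ACG", "CGT", "GTA"]  # invented database contents
hit = decrypt(server_answer(client_query("CGT"), database))
miss = decrypt(server_answer(client_query("TTT"), database))
```

The server sees only randomized ciphertexts, and an honest client learns only one presence bit. A production system would also need to guard against malformed queries, which this sketch does not attempt.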

Informed Consent and Sample Governance

Informed consent in veterinary genomics is complicated by the fact that the research subject is an animal, while the data may have implications for human stakeholders. For samples collected from production animals, consent is typically obtained from the owner or herd manager. However, the scope of consent must be clearly defined. A sample collected for diagnostic testing should not be automatically repurposed for genomic research without explicit authorization.

The concept of dynamic consent, where stakeholders can specify permissible uses of their data over time, is gaining traction in human genomics and is adaptable to veterinary contexts. For example, a poultry producer might consent to the use of flock samples for avian influenza surveillance but not for studies on growth traits that could be used for commercial breeding selection. Governance frameworks must include mechanisms for withdrawal of consent and data deletion.
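
One way to represent dynamic consent in software is as a record whose permitted uses can change over time and whose changes are logged. The sketch below is a minimal illustration, not a reference to any existing consent platform; the class name `ConsentRecord`, its fields, and the use labels are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical per-owner consent record with an audit trail."""

    owner_id: str
    permitted_uses: set = field(default_factory=set)
    withdrawn: bool = False
    history: list = field(default_factory=list)

    def _log(self, action, use=None):
        # Record when each change happened so decisions can be audited later.
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, action, use))

    def grant(self, use):
        self.permitted_uses.add(use)
        self._log("grant", use)

    def revoke(self, use):
        self.permitted_uses.discard(use)
        self._log("revoke", use)

    def withdraw(self):
        # Full withdrawal clears every permission; deleting the data itself
        # would be handled by a separate process triggered from here.
        self.withdrawn = True
        self.permitted_uses.clear()
        self._log("withdraw")

    def allows(self, use):
        return not self.withdrawn and use in self.permitted_uses


flock = ConsentRecord(owner_id="producer-001")
flock.grant("avian_influenza_surveillance")
surveillance_ok = flock.allows("avian_influenza_surveillance")
breeding_ok = flock.allows("growth_trait_selection")
```

In this example the producer's samples may be used for surveillance but not for growth-trait studies, mirroring the poultry scenario above.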

Adebamowo et al. describe a protocol for a qualitative study of stakeholder knowledge and recommendations regarding ethical oversight of data science health research [2]. Its premise, that oversight should be informed by what stakeholders know and recommend, implies a need for transparent communication about data use, storage, and sharing. In veterinary settings, analogous stakeholder groups include veterinarians, producers, breed associations, and wildlife managers. Each group has distinct expectations regarding data confidentiality and benefit sharing.

Algorithmic Bias and Cohort Representation

Computational models trained on genomic data can perpetuate or amplify biases present in the training cohorts. In veterinary genomics, this issue arises when reference populations are skewed toward certain breeds, geographic regions, or production systems. A model trained primarily on Holstein-Friesian cattle may perform poorly when applied to indigenous breeds, leading to misclassification of disease risk or antimicrobial resistance markers.
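
A simple way to detect this kind of skew is to report model accuracy separately for each group instead of as a single pooled figure. The sketch below assumes predictions are available as `(group, true_label, predicted_label)` tuples; the function names, the 0.10 gap threshold, and the example records are all invented for illustration and do not come from any real evaluation.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, truth, prediction) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / totals[g] for g in totals}


def flag_gaps(per_group, max_gap=0.10):
    """List groups whose accuracy trails the best group by more than max_gap."""
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > max_gap)


# Invented predictions for a hypothetical resistance-marker classifier.
records = [
    ("holstein_friesian", "resistant", "resistant"),
    ("holstein_friesian", "susceptible", "susceptible"),
    ("holstein_friesian", "resistant", "resistant"),
    ("holstein_friesian", "susceptible", "susceptible"),
    ("indigenous_breed", "resistant", "susceptible"),
    ("indigenous_breed", "susceptible", "susceptible"),
    ("indigenous_breed", "resistant", "susceptible"),
    ("indigenous_breed", "resistant", "resistant"),
]
per_group = accuracy_by_group(records)
flagged = flag_gaps(per_group)
```

A pooled accuracy of 0.75 would hide the fact that the second group is classified correctly only half of the time.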

Mohsen et al. proposed a dynamic clustering approach for genomics cohorts that moves beyond static categories of race, ethnicity, and ancestry [3]. Their method uses unsupervised learning to identify population substructures based on genetic similarity rather than predefined labels. This approach is directly transferable to veterinary populations. For example, instead of grouping animals by breed name, which may be imprecise or culturally loaded, dynamic clustering can reveal genetic relationships that breed labels obscure. This can improve the accuracy of genomic predictions and reduce the risk of bias against underrepresented populations.
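
The general idea of grouping by genetic similarity instead of by label can be illustrated with a very small example. The sketch below is not the method of Mohsen et al. [3]; it substitutes plain single-linkage clustering on an allele-count distance, and the genotype vectors, animal names, and the 0.25 threshold are invented. Each genotype is a list of alternate-allele counts (0, 1, or 2) at the same set of markers.

```python
def genotype_distance(a, b):
    """Mean absolute difference in allele counts, scaled to the range 0..1."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def cluster_by_similarity(genotypes, threshold=0.25):
    """Single-linkage clustering: link any two animals closer than threshold."""
    names = list(genotypes)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if genotype_distance(genotypes[a], genotypes[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Invented genotypes; note that no breed label is used anywhere.
animals = {
    "cow_A": [0, 0, 1, 2, 2, 0],
    "cow_B": [0, 1, 1, 2, 2, 0],
    "cow_C": [2, 2, 0, 0, 1, 2],
    "cow_D": [2, 2, 0, 0, 0, 2],
}
clusters = cluster_by_similarity(animals)
```

Real cohorts would call for many more markers and a method that handles admixture and missing data, which single-linkage clustering does not.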

In the context of pathogen genomics, biased sampling can lead to incomplete surveillance. If sequencing efforts focus on high-production regions while neglecting smallholder farms or wildlife reservoirs, the resulting genomic databases will not reflect the true diversity of circulating pathogens. This has ethical implications for disease control, as interventions based on incomplete data may fail in underserved populations.

Security of Genomic Data in Transit and at Rest

Genomic data are sensitive not only because of their content but also because of their permanence. Unlike clinical test results that may become obsolete, a genome sequence is a stable identifier that can be linked to an individual or population indefinitely. Breaches of genomic databases can have long-term consequences for animal owners, breeders, and conservation programs.

Encryption standards for genomic data should match or exceed those used for human health information. Data in transit between sequencing facilities, computational servers, and end users should be protected by transport layer security protocols. Data at rest should be encrypted using authenticated encryption schemes. Access controls must be granular, with separate permissions for viewing summary statistics, querying raw sequences, and downloading full datasets.
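
The three access tiers named above (summary statistics, raw sequence queries, full downloads) map naturally onto a permission flag set. The following sketch shows one possible encoding; the role names and their assignments are hypothetical and would be set by each institution's own policy.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    VIEW_SUMMARY = auto()
    QUERY_SEQUENCES = auto()
    DOWNLOAD_DATASET = auto()


# Hypothetical role assignments, for illustration only.
ROLES = {
    "producer": Permission.VIEW_SUMMARY,
    "diagnostic_lab": Permission.VIEW_SUMMARY | Permission.QUERY_SEQUENCES,
    "data_steward": (
        Permission.VIEW_SUMMARY
        | Permission.QUERY_SEQUENCES
        | Permission.DOWNLOAD_DATASET
    ),
}


def is_allowed(role, needed):
    """Return True only if the role holds every permission in `needed`."""
    granted = ROLES.get(role, Permission.NONE)  # unknown roles get nothing
    return (granted & needed) == needed


lab_can_query = is_allowed("diagnostic_lab", Permission.QUERY_SEQUENCES)
lab_can_download = is_allowed("diagnostic_lab", Permission.DOWNLOAD_DATASET)
```

Defaulting unknown roles to no permissions keeps the check fail-closed, so a misconfigured account cannot reach raw sequences by accident.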

The KmerCrypt framework [1] exemplifies a privacy-by-design approach where computation is performed on encrypted data. This eliminates the need to decrypt data for analysis, reducing the risk of exposure during processing. Veterinary bioinformatics platforms should consider adopting such architectures, particularly for multi-institutional databases that aggregate samples from numerous farms or clinics.

Ethical Oversight Structures

Traditional institutional animal care and use committees (IACUCs) are not always equipped to evaluate the ethical dimensions of computational genomics. Issues such as data sharing agreements, secondary use of archived samples, and algorithmic fairness fall outside the typical scope of IACUC review. Dedicated data ethics committees or bioinformatics review boards should be established within veterinary research institutions.

These committees should include members with expertise in genomics, data security, law, and stakeholder representation. Their responsibilities include reviewing protocols for data collection, storage, and sharing; evaluating the potential for group harm (e.g., stigmatization of a breed or region); and ensuring that computational methods are validated across diverse populations.

The protocol of Adebamowo et al. [2] makes stakeholder knowledge of data science research an object of study in its own right, which suggests that awareness of how data are used cannot be assumed. Educational initiatives targeting veterinarians, producers, and animal owners are necessary to build trust and enable informed participation.

A Decision Framework for Ethical Computational Genomics

The following decision tree summarizes key ethical checkpoints in a veterinary computational genomics workflow.

```mermaid
flowchart TD
    A[Sample Collection] --> B{Consent for Genomics?}
    B -->|Yes| C[Data Minimization]
    B -->|No| D[Diagnostic Use Only]
    C --> E{Data Storage}
    E -->|Local| F[Encrypted Database]
    E -->|Cloud| G["Encrypted Transfer & Access Control"]
    F --> H{Analysis Type}
    G --> H
    H --> I[Population-Level Query]
    H --> J[Individual-Level Query]
    I --> K{Bias Check}
    K -->|Representative Cohort| L[Proceed with Analysis]
    K -->|Underrepresented Groups| M[Stratify or Exclude]
    J --> N{Re-identification Risk}
    N -->|Low| O[Return Results]
    N -->|High| P[Anonymize or Aggregate]
    L --> Q[Secure Output]
    M --> Q
    O --> Q
    P --> Q
    Q --> R[Stakeholder Feedback]
```

This framework emphasizes that ethical considerations are not a single checkpoint but an ongoing process integrated into each stage of data handling.

Implications for Pathogen Surveillance and One Health

Computational genomics is central to One Health surveillance, where pathogen sequences from animals, humans, and the environment are compared to track transmission and emergence. Ethical challenges in this domain include the sharing of animal pathogen data across national borders and the potential for attribution of outbreaks to specific farms or regions.

For example, genomic surveillance of highly pathogenic avian influenza (H5N1) in poultry and wild birds relies on open-access sequence databases. While openness facilitates rapid response, it also exposes producers to economic risk if their flocks are identified as sources of a novel variant. Anonymization of geographic metadata and the use of privacy-preserving query systems, such as those enabled by homomorphic encryption [1], can mitigate these risks while preserving the utility of the data for public health.
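
Anonymization of geographic metadata can be as simple as snapping coordinates to a coarse grid and suppressing any grid cell that contains too few submissions to hide an individual premises. The sketch below illustrates that idea; the one-degree cell size, the minimum count `k=3`, the field names, and the coordinates are all assumptions chosen for the example, not recommended values.

```python
import math
from collections import Counter


def coarsen(lat, lon, cell_deg=1.0):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)


def anonymize_locations(records, cell_deg=1.0, k=3):
    """Replace exact coordinates with grid cells; drop cells with < k records."""
    cells = [coarsen(r["lat"], r["lon"], cell_deg) for r in records]
    counts = Counter(cells)
    return [
        {"isolate": r["isolate"], "cell": cell if counts[cell] >= k else None}
        for r, cell in zip(records, cells)
    ]


# Invented isolates and coordinates.
submissions = [
    {"isolate": "iso-1", "lat": 52.31, "lon": 4.90},
    {"isolate": "iso-2", "lat": 52.80, "lon": 4.10},
    {"isolate": "iso-3", "lat": 52.05, "lon": 4.55},
    {"isolate": "iso-4", "lat": 48.85, "lon": 2.35},
]
released = anonymize_locations(submissions)
```

The fourth isolate is the only one in its cell, so its location is withheld entirely. Coarser cells or a larger `k` give stronger protection at the cost of spatial resolution for outbreak tracing.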

Similarly, genomic epidemiology of livestock-associated Staphylococcus aureus and antimicrobial resistance markers requires careful handling of farm-level data. The ethical obligation to protect producer confidentiality must be balanced against the need for transparency in food safety and zoonotic risk communication.

Conclusion

Ethical considerations in computational genomics are not peripheral to the science; they are integral to the responsible conduct of research and diagnostics. Veterinary bioinformatics must adopt frameworks that address data privacy, informed consent, algorithmic bias, and data security. Advances in homomorphic encryption, dynamic cohort clustering, and stakeholder governance provide practical tools for implementing these frameworks. As genomic technologies become more embedded in veterinary practice, the ethical infrastructure must evolve in parallel to maintain trust and ensure equitable benefit across all animal populations.

References

[1] Provatas K, Mouratidis I, Georgakopoulos-Soares I. KmerCrypt: private k-mer search with homomorphic encryption. Brief Bioinform. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41359546/

[2] Adebamowo C, Akintola A, Maduka OC, et al. Knowledge and Recommendations of Stakeholders Regarding Ethical Oversight of Data Science Health Research: Protocol for a Qualitative Study. JMIR Res Protoc. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/41130595/

[3] Mohsen H, Blenman K, Emani PS, et al. Dynamic clustering of genomics cohorts beyond race, ethnicity and ancestry. BMC Med Genomics. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40375077/