AI and the Future of Biology, Genomics, and Medicine
Executive Summary
Artificial intelligence is no longer merely an auxiliary tool in the life sciences. It has become a new layer of scientific infrastructure: a way to compress vast biological datasets into usable representations, generate hypotheses at machine speed, prioritize experiments, and increasingly connect laboratory discovery to clinical decision-making.
The sharpest advances have come where biology can be rendered as machine-readable sequences, structures, images, or longitudinal records: DNA and RNA as strings, proteins as three-dimensional objects, cells as high-dimensional expression profiles, and patient histories as multimodal timelines. This is why the last five years have felt discontinuous. Systems such as AlphaFold transformed structural biology, genomic foundation models began to treat the genome as a language, single-cell models started learning cellular state spaces from millions of cells, and clinical AI moved from narrow prediction models to multimodal copilots, ambient documentation, and increasingly sophisticated diagnostic dialogue systems.
The most important conclusion is that AI's impact is real but uneven. In research, AI has already changed what is feasible: protein-structure prediction at proteome scale, variant-effect prioritization, de novo molecular design, automated image interpretation, and increasingly autonomous design-build-test-learn loops in biofoundries. In medicine, the nearer-term successes are concentrated in bounded tasks with clear inputs and outputs, including imaging triage, digital pathology scoring, diabetic retinopathy screening, documentation support, and retrospective risk prediction.
The farther a use case moves toward open-ended diagnosis, treatment planning, or autonomous clinical action, the more the limitations become decisive: data shift, spurious correlations, hidden confounding, weak calibration, poor external validation, workflow mismatch, liability ambiguity, and the persistent problem that excellent benchmark performance is not the same thing as demonstrable clinical value.
Over the next three to five years, the likeliest outcome is not "AI doctors" or fully autonomous biology, but a dense layer of specialized copilots: models that read scans, triage pathology, draft notes, prioritize drug leads, interpret variants, annotate single cells, and help design and monitor trials. Over the next ten to twenty years, a more profound transformation is plausible if several conditions are met: stronger multimodal data governance, routine prospective validation, better causal and mechanistic grounding, robust post-deployment monitoring, and credible biosecurity controls.
If those conditions are met, AI could help move biology from retrospective description toward generative design, and medicine from population averages toward genuinely individualized intervention. If they are not met, the likely future is a fragmented patchwork of impressive demos, local efficiencies, and recurring failures of trust.
From Expert Systems to Foundation Models
The history of AI in biology and medicine is often told as a recent story, but its roots run back to the era of expert systems. In the 1970s and 1980s, systems such as MYCIN represented medical reasoning as human-authored rules. They were important not because they solved medicine, but because they established a lasting ambition: that parts of diagnosis and treatment might be formalized, encoded, and scaled. What those systems lacked was not logic but data. Biology was still sparse, expensive, and slow.
The modern era became possible only when genomics, imaging, and electronic records turned living systems into large computational corpora. The Human Genome Project and the subsequent collapse in sequencing costs changed the information base of biology; later, deep learning changed the methods used to learn from it.
That shift accelerated in the 2010s. Convolutional neural networks made medical imaging a natural early domain for progress, while growing biobanks and health-record datasets supported supervised learning for risk prediction and phenotyping. In parallel, the transformer architecture created a more general recipe for modeling biological sequences and longitudinal records. By 2020, EHR transformers such as BEHRT were treating diagnoses, medications, and visits as time-ordered tokens; around the same period, international work in mammography showed that AI could match or exceed expert readers in retrospective breast-cancer prediction studies.
Then came the structural-biology break. In 2021, AlphaFold showed that a deep-learning system could predict protein structures with striking accuracy, and the public AlphaFold Protein Structure Database rapidly expanded access to predicted structures for nearly all catalogued protein sequences. In 2023, AlphaMissense extended the paradigm to missense-variant interpretation. In 2024, AlphaFold 3 widened the target from protein folding to biomolecular interactions involving proteins, DNA, RNA, ligands, and ions, using a diffusion-based approach to predict complexes rather than isolated chains. In 2025 and 2026, the same logic spread outward: genomic foundation models such as Nucleotide Transformer and Evo 2 aimed to learn regularities across genomes at unprecedented scale, while regulators began to publish explicit guidance for AI-enabled medical devices and AI in drug development.
Key Milestones
| Year | Milestone |
|---|---|
| 1970s | Expert-system era in medicine, including MYCIN |
| 2001 | Draft human genome sequence published; large-scale genomics becomes a computational substrate |
| 2018 | DeepVariant and the first FDA-authorized autonomous AI diagnostic device (IDx-DR) |
| 2020 | Transformer models move into EHRs; large retrospective imaging studies show expert-level performance in mammography |
| 2021 | AlphaFold 2 transforms protein-structure prediction |
| 2023 | AlphaMissense extends AI to missense-variant interpretation |
| 2024 | AlphaFold 3 predicts biomolecular interactions; FDA AI-device governance expands |
| 2025 | Rentosertib phase 2a marks a major clinical milestone for AI-discovered therapeutics |
| 2026 | Evo 2 is published; FDA and EMA issue joint good-AI-practice principles for drug development |
This arc reveals a pattern. Each leap came not from "general intelligence" in the abstract, but from the combination of three ingredients: large machine-readable corpora, architectures suited to the data's structure, and benchmark tasks that were ambitious enough to matter but concrete enough to evaluate. Biology now has those ingredients in abundance, which is why AI is beginning to feel less like a tool attached to biology and more like one of biology's organizing methods.
How the Current Generation of Models Works
At a technical level, the modern landscape is easier to understand if one starts with the data rather than the algorithms. Biology offers several major data regimes:
- Genomics provides long symbolic sequences with sparse labels and long-range dependencies
- Structural biology adds geometry, contacts, and physical constraints
- Single-cell omics creates huge cell-by-feature matrices whose signal is noisy, sparse, and context-dependent
- Imaging contributes richly labeled but site-sensitive pixel data
- Clinical medicine adds longitudinal time series, free text, claims, labs, prescriptions, and administrative metadata
Modern AI systems differ largely in how they represent these modalities, how they pretrain on weakly labeled or unlabeled corpora, and how they adapt to downstream tasks.
Genomics and Proteomics as Language
For genomics and proteomics, the decisive idea has been that biological sequences can be modeled as language. A transformer does not "understand" biology in the human sense; it learns statistical regularities by predicting masked tokens, next tokens, or other self-supervised objectives. Yet in the genome, those regularities often correspond to useful biology: motifs, splice sites, regulatory syntax, enhancer-promoter relationships, or the effect of sequence variation. Models such as Enformer learn long-range regulatory dependencies, while foundation models such as Nucleotide Transformer and Evo 2 push toward general sequence representations that can be adapted across tasks.
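To make the masked-token objective concrete, the sketch below tokenizes a short DNA string and hides a random subset of tokens, producing the (input, target) pairs a model would be trained to reconstruct. The non-overlapping 3-mer tokenization, the 15% mask rate, and the function names `kmer_tokenize` and `mask_tokens` are illustrative assumptions, not the settings of any model named above.

```python
import random

def kmer_tokenize(seq, k=3):
    """Split a DNA string into non-overlapping k-mers; a trailing remainder is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens; return the masked input and the targets to predict."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))   # always mask at least one token
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = tokens[p]    # what the model must recover
        inputs[p] = "[MASK]"      # what the model actually sees
    return inputs, targets

# Hypothetical 20-base sequence, used only for illustration.
tokens = kmer_tokenize("ATGCGTACGTTAGCCTAGGA", k=3)
inputs, targets = mask_tokens(tokens)
print(tokens)
print(inputs, targets)
```

A real pretraining run would feed `inputs` to a network and score its predictions at the masked positions against `targets`; only the data-preparation step is shown here.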
Structural Biology and Molecular Design
For proteins and molecules, the representation problem is different. A protein is not merely a sequence; it is a folded object in space, often one that binds to other molecules. This is why structure-aware models, graph neural networks, equivariant networks, and diffusion models matter. AlphaFold's core contribution was to use multiple sequence alignments, structural templates, and learned geometric reasoning to infer likely three-dimensional structures. AlphaFold 3 extended this logic to molecular interactions. In molecular design, some systems operate on string encodings such as SMILES, others on graphs or three-dimensional conformers, and newer generative models use diffusion or multimodal latent spaces to propose candidate compounds with desired properties. The gain is not magical creativity; it is a more efficient traversal of a chemical design space too large to search by enumeration.
Single-Cell Foundation Models
Single-cell modeling sits between language and state-space modeling. A cell's transcriptome can be treated as a bag, an ordered token sequence, or a graph-structured object; the model's goal is often to learn embeddings that preserve cell identity, trajectory, perturbation response, or regulatory relationships. Single-cell foundation models typically use transformer architectures or related sequence models to learn latent representations across diverse cell types and modalities. In principle, this creates reusable "cellular coordinates" that can support annotation, perturbation prediction, and transfer learning across experiments and tissues. In practice, performance still depends heavily on batch effects, assay conventions, tissue coverage, and whether the model is evaluated on out-of-distribution biology rather than on nearby datasets.
Clinical AI: Beyond Architecture to Evaluation
In clinical medicine, the technical story is less about any single architecture than about evaluation and deployment. A model may discriminate well, meaning it ranks higher-risk patients above lower-risk ones, yet still be poorly calibrated, meaning its predicted probabilities are wrong. It may succeed on internal validation and fail on external validation because coding practices, patient mix, or workflow differ at the next hospital. It may perform well retrospectively and generate little or no clinical value prospectively because clinicians already catch the easy cases or because the alert arrives too late to change care.
This is why the most credible evaluation ladder in medical AI runs from internal validation to external validation, then to silent prospective testing, impact studies, and real-world post-market monitoring.
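The gap between discrimination and calibration can be shown with a toy example. In the sketch below, two hypothetical models rank ten patients identically, so they have the same AUC, but one of them systematically inflates its probabilities. All patient outcomes and predicted risks are invented for illustration.

```python
def auc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def mean_predicted(y_prob):
    """Average predicted risk, to compare against the observed event rate."""
    return sum(y_prob) / len(y_prob)

# Hypothetical cohort: 2 events among 10 patients (observed rate 0.2).
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.1] * 8 + [0.6] * 2   # average predicted risk matches the observed rate
model_b = [0.7] * 8 + [0.9] * 2   # same ranking, inflated probabilities

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, auc(y, probs), round(brier(y, probs), 3), round(mean_predicted(probs), 2))
```

Both models separate events from non-events perfectly, yet model B predicts an average risk of 0.74 in a cohort where 20% of patients had the event. A ranking metric alone would not reveal that difference, which is why calibration is checked separately.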
Where AI Is Already Changing Biology and Medicine
Genomics: Variant Calling and Beyond
The first great application area is genomics. Variant calling, once dominated by hand-tuned statistical pipelines, has been materially improved by machine learning; DeepVariant is the canonical example of a system that reframed sequencing reads as image-like inputs to a deep network. Beyond variant calling, foundation models are now being used to prioritize regulatory variants, predict gene expression from sequence, identify functional motifs, and infer pathogenicity. AI's value here is primarily triage: ranking, filtering, and contextualizing an overwhelming hypothesis space so that experiments and clinical review can focus where the posterior probability of relevance is highest.
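One way to read "posterior probability of relevance" is as a Bayesian update: a prior belief that a variant matters is combined with the evidence a model supplies, and candidates are then ranked by the result. The sketch below uses the odds form of Bayes' rule. The variant names, priors, and likelihood ratios are hypothetical and do not come from any real pipeline.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Hypothetical candidates: (prior probability of relevance, likelihood ratio from a model score).
candidates = {
    "variant_A": (0.01, 50.0),   # rare a priori, strong model evidence
    "variant_B": (0.10, 2.0),    # plausible a priori, weak evidence
    "variant_C": (0.01, 1.0),    # uninformative score: posterior equals prior
}

ranked = sorted(candidates, key=lambda v: posterior(*candidates[v]), reverse=True)
for v in ranked:
    print(v, round(posterior(*candidates[v]), 3))
```

The ordering shows why triage depends on both inputs: strong evidence can lift a low-prior variant above one that looked more plausible beforehand.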
Single-Cell Biology
Modern atlases contain tens of millions of cells, and the bottleneck is no longer raw measurement but interpretation. Foundation models for single-cell omics promise reusable representations that can label cell types, map trajectories, infer perturbation responses, and connect disease-associated states across tissues. That matters for immunology, oncology, developmental biology, and drug discovery, because many diseases are less about a single gene than about a shifted cellular ecosystem. The promise is a generative map of cell state; the challenge is that cellular phenotype is profoundly contingent on assay, tissue, species, and intervention.
Structural Biology and Protein Design
AlphaFold changed the default scientific workflow. Instead of asking whether a protein structure can be known, many researchers now begin with a predicted structure and ask what must still be validated experimentally. The AlphaFold database provides open access to over 200 million structure predictions, which has altered target selection, protein engineering, mechanistic reasoning, and the speed of literature interpretation. AlphaFold 3 goes further by predicting interactions among proteins and other biomolecules. Around that structural core, protein language models and design systems such as RFdiffusion and ESM-family models are expanding the design space toward antibodies, enzymes, and other therapeutically relevant biomolecules.
Drug Discovery
AI has clearly improved parts of the discovery stack: target nomination, hit finding, ADMET prediction, de novo generation, and multimodal data integration. Yet the field's central test is translation, not design elegance. One of the strongest recent case studies is rentosertib, a TNIK inhibitor for idiopathic pulmonary fibrosis discovered and designed using generative AI, whose phase 2a trial in 2025 provided an unusually concrete proof that an AI-enabled target-to-molecule pipeline can reach mid-stage human studies. At the same time, isolated successes should not be mistaken for a solved industrial pipeline; broader validation, mechanistic depth, manufacturing readiness, and regulatory alignment remain central constraints.
Diagnostics and Clinical AI
The 2018 authorization of IDx-DR marked the first FDA-authorized autonomous AI diagnostic system for diabetic retinopathy screening in primary care. Subsequent years have seen rapid proliferation of AI-enabled devices across radiology, cardiology, neurology, pathology, and other specialties. Digital pathology now includes models that score trial endpoints; in late 2025, the FDA qualified AIM-NASH, an AI tool for MASH trial pathology, as the first AI drug-development tool in its category. In imaging, the best evidence remains concentrated in specific workflows such as mammography and stroke triage, where retrospective and prospective evaluations can be clearly framed.
The larger lesson: narrow autonomy is easiest where the task is highly standardized, the data are structured, and the downstream action is constrained.
Clinical Language Models and Ambient AI
Systems such as GatorTron, Med-PaLM, and AMIE demonstrate that transformers and LLMs can reason over clinical language, structured records, or patient dialogue. Yet the current best-validated operational gain is not grand diagnostic replacement but the reduction of clerical burden. Ambient AI scribes have shown measurable improvements in clinician-reported burnout and documentation burden. In a 2025 multicenter quality-improvement study across six U.S. health systems that all used the same ambient AI scribe platform, clinician burnout fell from 51.9% to 38.8% after 30 days of use.
The field is learning that workflow support may be the path by which general models create the most immediate value in care.
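The burnout figures quoted above can be read as either an absolute or a relative change, and the two are easy to confuse. The short calculation below uses only the two percentages reported in the text.

```python
# Figures quoted in the text: burnout prevalence at baseline and after 30 days.
before_pct = 51.9
after_pct = 38.8

absolute_drop = before_pct - after_pct             # in percentage points
relative_drop = 100 * absolute_drop / before_pct   # as a share of the baseline

print(f"absolute: {absolute_drop:.1f} percentage points")
print(f"relative: {relative_drop:.1f}% of baseline")
```

The drop is about 13 percentage points, or roughly a quarter of the baseline prevalence. Because the source describes a quality-improvement study, neither figure should be read as a controlled causal estimate.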
Synthetic Biology and Lab Automation
The classical design-build-test-learn cycle is being recomposed as a machine-learning pipeline in which models nominate constructs, robotic systems execute experiments, and the resulting data retrain the next generation of models. A key recent example is an AI-powered autonomous enzyme-engineering platform that integrated machine learning, large language models, and biofoundry automation. The long-run significance may not be any single robotic platform, but the emergence of a new style of science in which the unit of acceleration is not only prediction, but the rate of experimental iteration itself.
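The closed-loop structure described above can be sketched in a few lines. This is a toy under strong assumptions: the `assay` function stands in for a wet-lab measurement, a single number stands in for a construct design, and a greedy local search stands in for the learned model. None of it reflects the platform mentioned in the text.

```python
import random

def assay(design):
    """Stand-in for a lab measurement: a hidden fitness peak at design = 0.7 (hypothetical)."""
    return 1.0 - (design - 0.7) ** 2

def propose(history, rng, n=8, step=0.1):
    """Design step: sample candidates near the best design measured so far."""
    best_design, _ = max(history, key=lambda record: record[1])
    return [min(1.0, max(0.0, best_design + rng.uniform(-step, step))) for _ in range(n)]

def dbtl(rounds=10, seed=0):
    """Run a simulated design-build-test-learn loop and return the best (design, score)."""
    rng = random.Random(seed)
    history = [(0.0, assay(0.0))]                  # starting design and its measurement
    for _ in range(rounds):
        batch = propose(history, rng)              # design
        results = [(d, assay(d)) for d in batch]   # build and test (simulated)
        history.extend(results)                    # learn: new data steers the next round
    return max(history, key=lambda record: record[1])

best_design, best_score = dbtl()
print(round(best_design, 3), round(best_score, 3))
```

In a real biofoundry the expensive step is `assay`, so shortening each round and choosing each batch well matter more than any single prediction.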
Tools and Datasets That Define the Present Landscape
Key Platforms and Companies
| Platform | Core Focus | Why It Matters Now |
|---|---|---|
| Google DeepMind / AlphaFold | Protein structure and biomolecular interaction prediction | Public database now contains over 200 million predictions; AlphaFold 3 extends to multimolecular interactions |
| Insilico Medicine | AI-driven drug discovery across target, molecule, and clinical stages | Rentosertib phase 2a trial provides one of the strongest clinical-stage proofs of an AI-native pipeline |
| Recursion Pharmaceuticals | Phenomics-first drug discovery using massive cell-imaging datasets | Demonstrates value of high-content imaging combined with large-scale screening |
| Tempus AI | Clinical data integration for oncology, cardiology, and research | Shows how commercial value is built on multimodal patient data |
| Arc Institute / Evo 2 | Genome-scale foundation modeling and biological design | Demonstrates the push from protein-scale toward all-domain genome-scale modeling |
Strategic Datasets
| Dataset or Resource | Data Modalities | Strategic Importance |
|---|---|---|
| UK Biobank | Genomics, imaging, EHR, wearables | Among the largest multimodal longitudinal cohorts; widely used in AI research |
| All of Us Research Program | EHR, surveys, wearables, genomics | Important U.S. precision-medicine resource with explicit diversity goals |
| The Cancer Genome Atlas | Tumor genomics and molecular profiles | Foundational for AI in oncology, biomarker discovery, and multimodal cancer prediction |
| MIMIC-IV | ICU EHR, notes, vitals, labs | Canonical open dataset for clinical sequence modeling and risk prediction |
| Human Cell Atlas | Single-cell and spatial omics | Critical for learning cross-tissue cellular state maps and benchmarking single-cell models |
| CZ CELLxGENE Census | Standardized single-cell expression data | Major substrate for single-cell foundation models and transfer learning |
| Protein Data Bank | Experimentally solved biomolecular structures | Core ground-truth resource for structure learning and evaluation |
| AlphaFold Protein Structure Database | Predicted protein structures at proteome scale | Turned structure prediction from a bottleneck into a default starting point |
These resources are strategic not only because they are large, but because they combine scale with governance, metadata, and community adoption. In practice, that combination is often more decisive than raw model size.
Ethics, Regulation, and Society
Bias and Fairness
The ethical difficulty of medical and biological AI is not that the models are powerful; it is that their power is mediated through institutions that already distribute risk unevenly. Bias in health AI rarely arrives as cartoonishly malicious output. More often it appears as differential triage intensity, different thresholds for testing, or systematically different recommendations for patients whose sociodemographic profiles differ while the medical facts remain the same. A 2025 Nature Medicine study on large language models found that sociodemographic cues could alter medical decision making.
Privacy and Consent
Health records are protected in the United States by HIPAA, and the NIH Genomic Data Sharing policy requires planning for broad and responsible sharing of large-scale genomic data. In Europe, the European Health Data Space is intended to create a harmonized infrastructure for primary and secondary use of health data, while the EU AI Act adds a cross-sector risk framework for high-risk systems. Yet patients may consent to research use of their data without anticipating that future foundation models will infer new traits, generate synthetic records, or be combined across previously siloed domains.
Explainability and Regulation
In life-science AI, the practical question is rarely whether a model can produce a mechanistic explanation in a philosophical sense. More often the relevant questions are: can users know the model's intended use, training domain, confidence, known failure modes, update policy, and post-deployment safeguards? The FDA's recent AI-enabled-device guidance emphasizes lifecycle management, documentation, and total-product-lifecycle risk management. For drug development, the FDA and the EMA jointly issued good-AI-practice principles in early 2026.
Workforce Implications
Current evidence points more strongly toward task reallocation than wholesale substitution. Each efficiency gain creates new supervisory burdens: reviewing model output, managing liability, handling patient consent, monitoring drift, and maintaining data governance. The skills premium will rise for people who can operate across domains: clinician-informaticians, computational biologists who understand assay design, and regulators who can read both software documentation and trial evidence.
Biosecurity
AI-assisted protein and genome design could speed vaccines, enzymes, and therapeutics, but the same capabilities can reduce barriers to designing harmful biological sequences. A 2025 Science study showed that AI-designed toxic proteins could exploit weaknesses in DNA-synthesis screening tools. OECD analysis has stressed that the convergence of synthetic biology, AI, and automation creates a governance gap. This is one reason the future of AI in biology cannot be governed only as "healthcare AI." It must also be governed as a dual-use technology.
Key Regulatory Frameworks
| Framework | Coverage | Practical Relevance |
|---|---|---|
| FDA AI-enabled device guidance and device list | Premarket submissions, lifecycle management, post-market expectations | Central U.S. framework for AI-enabled medical devices |
| EMA reflection paper and joint FDA–EMA good-AI-practice principles | AI across the medicinal-product lifecycle | Signals increasing expectations for AI in drug development |
| WHO guidance on ethics and governance of AI for health | Ethics, human oversight, equity, safety, accountability | Global normative frame for cross-border policy |
| EU AI Act | Cross-sector risk-based obligations for high-risk AI | Health applications often fall into the high-risk tier |
| European Health Data Space | Rules and infrastructure for primary and secondary health-data use | Relevant to European research access and data federation |
| NIST AI Risk Management Framework | Voluntary risk-management framework | Useful for health systems operationalizing trustworthy AI governance |
| HHS Section 1557 final rule | Nondiscrimination in health programs including AI-based decision support | Makes bias mitigation a civil-rights issue |
| NIH Genomic Data Sharing policy | Sharing of large-scale genomic data and metadata | Foundational for responsible genomic AI research |
Economic and Health-System Consequences
Economically, the attraction of AI in biomedicine is obvious because the underlying systems are so inefficient. Drug development remains long, expensive, and failure-prone. Estimates for bringing a new drug to market range from hundreds of millions to several billions of dollars. The point is not that AI will erase development costs, but that even small improvements in target quality, trial enrichment, or attrition reduction can have disproportionate value in a system whose baseline is so costly.
Health systems face a different economics. The central problem is not medicinal chemistry but workflow friction: staffing pressure, coding burden, fragmented records, and documentation overload. The OECD's 2026 cross-country review found universal use of AI in administration across OECD members, but only limited national-scale implementation in clinical areas such as medical imaging. That asymmetry suggests AI adoption in healthcare does not primarily fail because models are unavailable; it fails because scaling requires interoperability, governance, reimbursement, procurement capacity, and local trust.
Even where AI shows clear benefit, such as ambient documentation reducing clinician burnout, the financial case is not yet as clean as the burnout case. Some current AI investments are justified not because they obviously lower total costs, but because they relieve labor scarcity, improve retention, and make professional work more tolerable. In healthcare, preserving human capacity is often as important as reducing expense.
Trajectories, Research Priorities, and Recommended Actions
Three-to-Five-Year Outlook
The most plausible near-term trajectory is a layered one. In research, biological foundation models will become normal tools for prioritization rather than exotic demonstrations. Structural prediction, variant scoring, cell-state annotation, image interpretation, and small-molecule generation will increasingly be embedded in ordinary workflows. In medicine, the dominant form of deployment will be "narrow autonomy plus human oversight": AI that drafts, triages, flags, summarizes, standardizes, and occasionally scores regulated endpoints, but still leaves final accountability to people.
Ten-to-Twenty-Year Horizon
If multimodal biological and clinical models continue to improve, and if experimentation, simulation, and automation become more tightly coupled, then the field could move toward a new mode of science in which wet-lab cycles are routinely guided by high-capacity models trained on structural, genomic, phenotypic, and clinical data. The long-term prize is not simply better prediction of existing phenomena. It is design: new proteins, new regulatory constructs, better-targeted therapies, adaptive trial strategies, and perhaps eventually patient-specific models that integrate genotype, phenotype, exposures, and treatment response.
Key Research Priorities
- Data quality and representativeness must improve; bigger pretraining corpora do not solve biased or mislabeled clinical reality
- Causal and mechanistic grounding is needed, especially when models choose interventions rather than merely classify patterns
- Evaluation must move beyond leaderboards toward external validation, prospective trials, calibration checks, subgroup performance reporting, and post-deployment monitoring
- Multimodal integration needs to be biologically and clinically meaningful, not just architecturally elegant
- Open scientific infrastructure should be strengthened where possible, because reproducibility and broad access are still among the field's best correctives to hype
- Dual-use and biosecurity safeguards must advance in parallel with technical capability
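Subgroup performance reporting, one of the evaluation priorities listed above, can be as simple as computing the same metric separately for each group. The sketch below computes sensitivity by group on invented records; the group labels and predictions are hypothetical.

```python
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (group, y_true, y_pred) with binary labels.
    Returns {group: sensitivity} for every group with at least one true positive case."""
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}

# Hypothetical records: four true cases at each of two sites.
records = (
    [("site_A", 1, 1)] * 3 + [("site_A", 1, 0)] * 1 +
    [("site_B", 1, 1)] * 2 + [("site_B", 1, 0)] * 2 +
    [("site_A", 0, 0)] * 5 + [("site_B", 0, 0)] * 5
)

report = sensitivity_by_group(records)
print(sorted(report.items()))
```

Pooled across both sites the sensitivity is 5 of 8 cases (0.625), which hides the fact that one site misses half of its true cases. This is the kind of gap an aggregate leaderboard number conceals.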
Recommended Actions by Stakeholder
| Stakeholder | Recommended Actions |
|---|---|
| Researchers | Build around benchmark diversity, external validation, assay-aware error analysis, and reproducible release practices; pair foundation-model ambition with stronger biological grounding and experimental feedback loops |
| Clinicians and health systems | Prefer use cases with clear workflows, measurable endpoints, and human override; require calibration checks, subgroup analyses, silent pilots, and continuous monitoring |
| Policymakers and regulators | Treat health AI as both an innovation agenda and a governance agenda; harmonize privacy, nondiscrimination, post-market surveillance, and dual-use biosecurity controls |
| Industry | Shift from claims of general superiority to evidence of contextual value; invest in documentation, real-world monitoring, transparent update policies, and governance that can survive procurement, audit, and litigation |
These actions are not anti-innovation. They are what a mature innovation strategy looks like in a field where errors can affect diagnoses, therapies, trials, and, increasingly, the design of living systems themselves.
Open Questions and Limitations
Several uncertainties remain genuinely unresolved. Commercial pipelines change quickly, so any company comparison is illustrative rather than exhaustive. Many recent biological foundation models are still described only in preprints or evaluated on limited external benchmarks, making it difficult to separate genuine capability from architecture-specific overfitting. Regulatory frameworks are evolving concurrently with the technologies they govern, creating temporal mismatches between innovation speed and governance capacity. The technical frontier in AI-assisted protein and genome design is moving faster than the consensus on biosecurity governance.
These are not reasons for pessimism; they are reasons to distinguish clearly between demonstrated capability, regulated utility, and plausible future potential.