Section: Computational Biology

Single-cell RNA-seq Trajectory Inference and Cell Lineage Tracing

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed the study of cellular heterogeneity and developmental dynamics in biological systems. Trajectory inference and cell lineage tracing represent two complementary computational frameworks for reconstructing the temporal order of cell state transitions from static snapshot data. This article provides an exhaustive technical review of the algorithmic foundations, biophysical principles, and computational workflows underlying pseudotime inference, RNA velocity analysis, and lineage tracing methods. Emphasis is placed on the application of these techniques in veterinary and comparative biological contexts, including the study of host-pathogen interactions, tissue development, and disease progression in non-human species. The review covers key algorithms including Monocle, Slingshot, and Totem, as well as emerging approaches based on optimal transport, Schrödinger bridges, and Bayesian phylogenetic frameworks.

1. Introduction

The advent of single-cell RNA sequencing (scRNA-seq) has enabled the interrogation of transcriptomic heterogeneity at unprecedented resolution [1, 2]. Unlike bulk sequencing methods, which average gene expression signals across millions of cells, scRNA-seq captures the discrete transcriptional state of individual cells [3]. This granularity is essential for resolving rare cell populations, transitional states, and dynamic processes such as differentiation, activation, and disease progression [4]. In veterinary medicine, scRNA-seq has been applied to study immune cell dynamics during infection, tissue regeneration in livestock species, and developmental biology in model organisms [5].

A fundamental challenge in scRNA-seq analysis is that the data represent a static snapshot of gene expression at a single time point [6]. Cells are destroyed during the sequencing process, precluding direct longitudinal observation of individual cell fates [7]. Trajectory inference methods address this limitation by computationally ordering cells along a continuous path that represents the underlying biological process [8]. These methods assume that cells at different stages of a process will exhibit distinct but related gene expression profiles, and that the relationships between these profiles can be used to reconstruct the temporal sequence of events [9].

Cell lineage tracing provides an orthogonal approach to trajectory inference by directly recording the ancestry of individual cells through heritable marks [10]. These marks can be introduced through genetic barcoding, CRISPR-mediated editing, or natural somatic mutations [11]. The combination of transcriptomic profiling with lineage information enables the construction of phylogenetic trees that map the relationships between cell states [12]. This review examines the computational and biophysical principles underlying both trajectory inference and lineage tracing, with a focus on their integration for comprehensive analysis of cellular dynamics.

2. Biophysical Basis of Single-Cell Transcriptomic Data

2.1 Stochastic Gene Expression and Transcriptional Noise

Gene expression in individual cells is inherently stochastic due to the low copy numbers of mRNA molecules and the probabilistic nature of transcription factor binding [2]. This stochasticity generates cell-to-cell variability even in genetically identical populations [3]. The magnitude of transcriptional noise is influenced by promoter architecture, chromatin accessibility, and the availability of RNA polymerase and splicing machinery [4]. In the context of trajectory inference, this noise is both a challenge and a source of information. Methods that model the underlying stochastic process can extract meaningful signals from the variability [2].

The mechanistic model of stochastic gene expression forms the foundation for several trajectory inference algorithms [2]. These models describe the production and degradation of mRNA as a continuous-time Markov process [2]. When transcription occurs in short bursts, the steady-state distribution of transcript counts is well approximated by a negative binomial distribution, the count model used in the scLANE testing framework [5]. This distribution accommodates both the Poisson sampling noise inherent in sequencing and the additional biological variability arising from transcriptional bursting [5].
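The link between bursting and the negative binomial can be illustrated with a small simulation. The sketch below is not taken from scLANE or from [2]; it assumes a Gamma-distributed per-cell expression rate followed by Poisson sampling, which is one standard way to obtain a negative binomial. The function name and parameter values are hypothetical.

```python
import numpy as np

def simulate_bursty_counts(mean, dispersion, n_cells, seed=0):
    """Draw counts from a Gamma-Poisson mixture (a negative binomial).

    mean: expected transcript count per cell
    dispersion: NB dispersion phi, so that variance = mean + phi * mean**2
    """
    rng = np.random.default_rng(seed)
    # Cell-specific expression rate: Gamma with shape 1/phi and scale mean*phi
    rates = rng.gamma(shape=1.0 / dispersion, scale=mean * dispersion, size=n_cells)
    # Poisson sampling noise around each cell's rate
    return rng.poisson(rates)

# Illustrative parameters (assumed, not estimated from any dataset)
counts = simulate_bursty_counts(mean=5.0, dispersion=0.5, n_cells=200_000)
empirical_mean = counts.mean()
empirical_var = counts.var()
expected_var = 5.0 + 0.5 * 5.0 ** 2  # mean + phi * mean^2
```

The empirical variance exceeds the mean, as expected for overdispersed counts, and should approach `expected_var` as the number of simulated cells grows.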

2.2 RNA Splicing Dynamics and Velocity

RNA velocity exploits the temporal information contained in the relative abundance of unspliced and spliced mRNA transcripts [3]. During transcription, newly synthesized pre-mRNA molecules contain intronic sequences that are subsequently removed by the splicing machinery [3]. Because unspliced transcripts precede spliced ones, the abundance of unspliced relative to spliced transcripts provides a proxy for the direction and rate of change of a gene's expression [3]. When a gene is being upregulated, unspliced mRNA exceeds its steady-state proportion; when it is being downregulated, unspliced mRNA falls below that proportion [3].

The mathematical formulation of RNA velocity is based on a simple ordinary differential equation model of transcription, splicing, and degradation [3]. For each gene, unspliced mRNA is produced by transcription and removed by splicing, and the rate of change of spliced mRNA abundance is the difference between the splicing flux (the splicing rate constant times the unspliced abundance) and the degradation flux (the degradation rate constant times the spliced abundance) [3]. This model can be extended to account for multiple isoforms and alternative splicing events [3]. A comparison between a conventional tool and deep learning models for RNA velocity analysis suggests that neural network approaches can capture nonlinear relationships between splicing dynamics and cell state transitions [3].
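Written out, the standard two-equation form of this model is shown below. The symbols are assumptions introduced here for illustration: u and s are the unspliced and spliced abundances of one gene, α(t) is the transcription rate, β the splicing rate constant, and γ the degradation rate constant.

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha(t) - \beta\,u(t), \\
\frac{ds}{dt} &= \beta\,u(t) - \gamma\,s(t),
\end{aligned}
\qquad v(t) \equiv \frac{ds}{dt}.
```

At steady state both derivatives vanish, so u/s = γ/β. A cell with more unspliced mRNA than this ratio predicts has v > 0, and a cell with less has v < 0.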

3. Pseudotime Algorithms

3.1 Monocle Family

Monocle was one of the first widely adopted algorithms for pseudotime inference in scRNA-seq data [8]. The algorithm operates by first reducing the dimensionality of the gene expression matrix using techniques such as principal component analysis or independent component analysis [8]. It then constructs a minimum spanning tree on the reduced representation, and orders cells along the longest path through this tree [8]. The pseudotime value assigned to each cell represents its position along this inferred developmental trajectory [8].
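The steps above can be sketched in a few lines. This is a simplified illustration of the general idea (dimensionality reduction, minimum spanning tree, ordering along the tree), not the Monocle implementation: it uses PCA via SVD, takes one end of the longest tree path as the root when none is given, and defines pseudotime as the distance along the tree from that root. The function name `mst_pseudotime` is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform

def mst_pseudotime(expr, n_components=2, root=None):
    """expr: cells x genes matrix of normalized expression values."""
    # PCA via SVD on the centered matrix
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]

    # Minimum spanning tree over pairwise Euclidean distances
    # (assumes no two cells have identical coordinates)
    dist = squareform(pdist(embedding))
    mst = minimum_spanning_tree(dist)

    # Path lengths between all pairs of cells along the tree
    tree_dist = shortest_path(mst, directed=False)

    if root is None:
        # One endpoint of the longest path through the tree
        root = np.unravel_index(np.argmax(tree_dist), tree_dist.shape)[0]
    # Pseudotime: distance along the tree from the root cell
    return tree_dist[root]
```

In practice the root should be chosen from biological knowledge (for example a known progenitor cell), because the tree alone cannot determine the direction of time.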

The Monocle framework has been extended to incorporate multiple branching points, enabling the reconstruction of complex differentiation hierarchies [8]. The algorithm can identify genes whose expression changes significantly along the pseudotime trajectory, providing a list of candidate marker genes for each transitional state [8]. The scTrends package provides automated classification and strength quantification of gene expression trends along pseudotime, enabling systematic comparison across datasets [1].

3.2 Slingshot

Slingshot is a trajectory inference method that combines cluster-based lineage identification with smooth curve fitting [8]. The algorithm first performs unsupervised clustering to identify discrete cell populations [8]. It then constructs a minimum spanning tree connecting the cluster centroids, which defines the backbone of the lineage structure [8]. Principal curves are fitted along each branch of this tree, and cells are projected onto these curves to obtain their pseudotime coordinates [8].
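The lineage backbone step can be illustrated as follows. This is a minimal sketch that assumes cluster labels and a low-dimensional embedding are already available. It uses plain Euclidean distances between centroids and omits the principal curve fitting entirely, so it should not be read as the Slingshot implementation. The function name is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def cluster_lineage_edges(embedding, labels):
    """Return the MST edges between cluster centroids as sorted label pairs."""
    clusters = np.unique(labels)
    names = clusters.tolist()
    # One centroid per cluster in the reduced space
    centroids = np.vstack([embedding[labels == c].mean(axis=0) for c in clusters])
    # Minimum spanning tree on centroid-to-centroid distances
    mst = minimum_spanning_tree(cdist(centroids, centroids)).tocoo()
    edges = [tuple(sorted((names[i], names[j]))) for i, j in zip(mst.row, mst.col)]
    return sorted(edges)
```

Each returned pair is one segment of the backbone; a path of segments from a chosen root cluster to a leaf cluster defines one lineage along which a smooth curve would then be fitted.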

The principal curve fitting approach used by Slingshot is robust to noise and can accommodate nonlinear trajectories [8]. The algorithm also provides a measure of uncertainty for each pseudotime assignment, which is useful for downstream statistical analyses [8]. The data-driven selection of analysis decisions in scRNA-seq trajectory inference, including the choice of clustering algorithm and dimensionality reduction method, has been systematically evaluated using the Slingshot framework [9].

3.3 Totem

Totem is a method for inferring tree-shaped single-cell trajectories that explicitly models the branching structure of the data [8]. The algorithm uses a probabilistic graphical model to represent the relationships between cell states, with edges representing transitions and nodes representing stable states [8]. The tree structure is inferred using a maximum likelihood approach, and the algorithm can accommodate both continuous and discrete cell states [8].

The Totem framework is particularly well suited for datasets with multiple branching points and complex lineage hierarchies [8]. It provides a natural representation of the developmental process as a tree, which can be directly compared with known anatomical or functional relationships [8]. The method has been applied to study hematopoietic differentiation and immune cell activation in model systems [8].

4. RNA Velocity and Its Integration with Trajectory Inference

4.1 Conventional Velocity Estimation

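The conventional approach to RNA velocity estimation relies on a steady-state assumption: the splicing and degradation rate constants of a gene are taken to be the same for all cells in the dataset, and cells at the extremes of the gene's expression range are assumed to have reached equilibrium [3]. Under this assumption, the steady-state ratio of unspliced to spliced abundance is estimated from those cells, and the velocity of a gene in a given cell is the difference between its observed unspliced abundance and the unspliced abundance expected at steady state for its spliced abundance [3]. Cells with positive velocity for a gene are interpreted as being in the process of upregulating that gene, while cells with negative velocity are interpreted as downregulating it [3].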
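A minimal sketch of this estimate for a single gene is given below. It assumes the splicing rate constant is absorbed into the units, so that velocity is u minus γ times s. The steady-state ratio γ is fitted by a regression through the origin on cells in the lowest and highest quantiles of total abundance. The function name and the quantile default are illustrative assumptions, not the parameters of any specific tool.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced, quantile=0.95):
    """Per-cell velocity of one gene under the steady-state model.

    unspliced, spliced: 1-D arrays of abundances across cells.
    Returns (velocity, gamma).
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)

    # Cells at the extremes of expression are assumed to be near steady state
    total = u + s
    lo, hi = np.quantile(total, [1.0 - quantile, quantile])
    mask = (total <= lo) | (total >= hi)

    # Least-squares slope through the origin: u ~ gamma * s
    # (assumes at least one selected cell has nonzero spliced abundance)
    gamma = (u[mask] @ s[mask]) / (s[mask] @ s[mask])

    # Positive values: more unspliced mRNA than expected at steady state
    return u - gamma * s, gamma
```

For a gene whose cells all lie on the steady-state line, the returned velocities are zero; cells above the line receive positive velocity and cells below it negative velocity.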

The comparison between conventional tools and deep learning models for RNA velocity analysis has demonstrated that deep learning approaches can outperform traditional methods in several key respects [3]. Deep learning models can learn the splicing dynamics directly from the data without requiring explicit specification of the kinetic parameters [3]. They can also capture nonlinear relationships between splicing and expression that are not accounted for in the linear steady-state model [3]. However, the interpretability of deep learning models is reduced compared to conventional approaches, which has motivated the development of hybrid methods [3].

4.2 Deep Learning Approaches

Deep learning models for RNA velocity analysis typically use a neural network architecture that takes the unspliced and spliced counts as input and predicts the future state of the cell [3]. These models can be trained on time-series data or on datasets with known developmental trajectories [3]. The network learns a mapping from the current splicing state to the expected future splicing state, which can then be used to compute a velocity vector for each cell [3].

The application of deep learning to RNA velocity analysis has been facilitated by the availability of large-scale scRNA-seq datasets and the development of efficient training algorithms [3]. The models can be regularized to prevent overfitting and can incorporate prior knowledge about the biological system [3]. The performance of these models has been evaluated on a range of synthetic and real datasets, and they have been shown to produce more accurate velocity estimates than conventional methods in many cases [3].

5. Lineage Tracing Methods

5.1 Genetic Barcoding

Genetic barcoding is a lineage tracing technique in which a heritable, unique DNA sequence is introduced into the genome of a founder cell [10]. This barcode is then passed to all daughter cells during cell division, enabling the reconstruction of the clonal relationships between cells [10]. The barcode can be introduced using a variety of methods, including retroviral integration, transposon insertion, or CRISPR-mediated editing [10].
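Because a static barcode is inherited unchanged, the first computational step is simply to group cells that share a barcode into clones. The sketch below assumes barcodes have already been called per cell and ignores sequencing errors and barcode collisions. The function name and input format are hypothetical.

```python
from collections import defaultdict

def group_clones(cell_barcodes):
    """Group cell IDs by shared lineage barcode.

    cell_barcodes: dict mapping cell ID -> barcode string,
                   or None when no barcode was detected.
    """
    clones = defaultdict(list)
    for cell_id, barcode in cell_barcodes.items():
        if barcode is None:
            continue  # cells without a detected barcode cannot be assigned
        clones[barcode].append(cell_id)
    return {barcode: sorted(cells) for barcode, cells in clones.items()}

# Toy example with made-up cell IDs and barcodes
example = group_clones({"c1": "AAGT", "c2": "CCTA", "c3": "AAGT", "c4": None})
```

Here `example` places `c1` and `c3` in one clone and `c2` in another; `c4` is left unassigned.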

The analysis of barcoded cells requires sequencing of the barcode locus and computational reconstruction of the phylogenetic tree [10]. The SciPhy framework provides a Bayesian phylogenetic approach for analyzing sequential genetic lineage tracing data [10]. This method accounts for the possibility of barcode loss or mutation during cell division and provides a probabilistic estimate of the true lineage tree [10]. The comparison between CARLIN and DNA Typewriter in CRISPR-mediated lineage tracing has highlighted the trade-offs between barcode diversity, editing efficiency, and sequencing depth [13].

5.2 CRISPR-Based Lineage Recording

CRISPR-based lineage recording systems use a guide RNA library to introduce targeted mutations at specific genomic loci [12]. These mutations accumulate over time, creating a record of the cell's division history [12]. The resulting mutation pattern can be read out by sequencing the target loci and used to reconstruct the lineage tree [12]. The assessment of single-cell phylogenies and population dynamics from CRISPR lineage recordings requires careful modeling of the mutation process and the possibility of editing errors [12].
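As a rough illustration of the readout step, the sketch below builds a tree from a cells-by-sites matrix of edit outcomes using Hamming distances and average-linkage (UPGMA) clustering. This is a simple distance-based heuristic chosen for brevity. It is not the model-based inference discussed in [10] and [12], and it ignores dropout and recurrent edits. The function names and integer encoding are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def upgma_from_edits(edit_matrix):
    """edit_matrix: cells x target sites, integers encode the edit (0 = unedited)."""
    # Fraction of target sites at which two cells carry different outcomes
    dist = pdist(np.asarray(edit_matrix), metric="hamming")
    # Average linkage on these distances corresponds to UPGMA
    return linkage(dist, method="average")

def cut_into_clades(tree, n_clades):
    """Assign each cell to one of at most n_clades top-level clades."""
    return fcluster(tree, t=n_clades, criterion="maxclust")
```

Cells that share early edits end up in the same clade, while later edits separate them within it.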

The CARLIN system (CRISPR array repair lineage tracing) uses a designed array of target sites that are edited by a CRISPR-Cas9 system [13]. The pattern of edits across the sites provides a heritable record of clonal history, although the edits themselves are not written in a fixed order [13]. The DNA Typewriter system instead uses prime editing to insert short sequences into a tandem array of target sites one position at a time, so that the order of edits is preserved in the recorded sequence [13]. The comparison between these two systems has revealed that both are capable of reconstructing complex lineage trees, but that they differ in their sensitivity to editing errors and their ability to resolve deep lineages [13].

5.3 Natural Somatic Mutations

Natural somatic mutations, such as those occurring in mitochondrial DNA or in microsatellite repeats, can also be used for lineage tracing [14]. These mutations accumulate stochastically during cell division and can be detected by sequencing [14]. The advantage of using natural mutations is that they do not require any experimental manipulation of the genome, making them suitable for studies in which genetic modification is not feasible [14]. The evolving strategies for lineage tracing, including the use of genetic markers, synthetic barcodes, and natural variants, have been comprehensively reviewed [14].

The use of natural variants for lineage tracing is particularly relevant in veterinary medicine, where the introduction of synthetic barcodes may not be practical [14]. The analysis of natural somatic mutations requires careful discrimination between true lineage-informative mutations and sequencing errors [14]. The development of computational methods for this purpose is an active area of research [14].

6. Optimal Transport and Schrödinger Bridge Approaches

6.1 Optimal Transport Fate Mapping

Optimal transport is a mathematical framework for comparing probability distributions by minimizing the cost of moving mass from one distribution to another [4]. In the context of trajectory inference, optimal transport can be used to map cells from one time point to the next by finding the lowest-cost correspondence between cells [4]. This approach does not require an explicit model of the underlying dynamics and can be applied to any dataset with multiple time points [4]. It does, however, assume that cells change as little as possible under the chosen cost function, which is typically a distance in expression space.

The optimal transport fate mapping approach has been applied to resolve T cell differentiation dynamics across tissues [4]. This method uses the gene expression profiles of cells at two time points to construct a transport map that assigns each cell at the earlier time point to a likely descendant at the later time point [4]. The resulting map provides a probabilistic estimate of the lineage relationships between cells [4]. The optimal transport framework can be extended to incorporate additional information, such as cell surface markers or spatial location [4].
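A transport map of this kind can be sketched with entropy-regularized optimal transport solved by Sinkhorn iterations. The example below is a generic illustration, not the method of [4]: it assumes uniform cell weights, a squared Euclidean cost, and no modeling of cell growth or death. The function name and the defaults for `epsilon` and `n_iter` are assumptions.

```python
import numpy as np

def sinkhorn_coupling(x_early, x_late, epsilon=0.1, n_iter=500):
    """Entropic OT coupling between two cell populations.

    x_early, x_late: cells x features arrays from two time points.
    Returns a matrix whose entry (i, j) is the mass sent from early cell i
    to late cell j.
    """
    # Squared Euclidean cost, rescaled so epsilon is comparable across datasets
    # (assumes the two populations are not all identical points)
    cost = ((x_early[:, None, :] - x_late[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()
    kernel = np.exp(-cost / epsilon)

    a = np.full(len(x_early), 1.0 / len(x_early))  # uniform early weights
    b = np.full(len(x_late), 1.0 / len(x_late))    # uniform late weights
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]
```

Normalizing each row of the returned coupling gives, for every early cell, a probability distribution over candidate descendants at the later time point.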

6.2 Smooth Schrödinger Bridges

The Schrödinger bridge problem is a generalization of optimal transport that accounts for the stochastic nature of the underlying process [15]. In the context of trajectory inference, the Schrödinger bridge problem seeks to find the most likely path connecting two distributions of cells, given a prior model of cell dynamics [15]. The smooth Schrödinger bridge approach extends this framework by allowing the reference process to be a smooth Gaussian process, which leads to more regular and interpretable trajectories [15].
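In its standard two-marginal form, the problem can be written as follows. The notation is introduced here for illustration: R is the reference (prior) path measure, μ0 and μ1 are the observed cell distributions at the two time points, and P ranges over path measures with those marginals.

```latex
P^{\star} \;=\; \operatorname*{arg\,min}_{P \,:\, P_{0} = \mu_{0},\; P_{1} = \mu_{1}}
\mathrm{KL}\!\left(P \,\|\, R\right)
```

When R is a Brownian motion, this problem is equivalent to entropy-regularized optimal transport between μ0 and μ1. Replacing R with a smoother Gaussian process changes which paths are considered likely, which is the modification described above.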

The smooth Schrödinger bridge approach has been shown to outperform existing methods on simulated and real single-cell RNA-seq datasets [15]. The problem can be solved in polynomial time for a class of reference processes that includes the Matérn processes commonly used in spatial statistics [15]. A practical approximation of the algorithm has been implemented and made available [15].

7. Integration of Transcriptomic and Structural Data

7.1 Mapping Marker Proteins onto 3D Structures

The integration of transcriptomic data with structural information is a key challenge in understanding the functional consequences of cell state transitions [19]. The TIPS (Trajectory Inference of Pathway Significance) method provides a framework for mapping marker genes and proteins onto known biological pathways and structures [19]. This approach enables the identification of pathways that are significantly associated with a given trajectory and the visualization of the timing of pathway activation [19].

The mapping of marker proteins onto 3D structures, such as those determined by cryo-electron microscopy, can reveal the structural basis of functional changes [19]. For example, the upregulation of a gene encoding a cell surface receptor can be mapped onto the structure of that receptor to identify the conformational changes that accompany activation [19]. The PathPinpointR method provides a framework for predicting the progression of scRNA-seq samples through reference trajectories, enabling the comparison of experimental data with known structural models [20].

7.2 Spatial Transcriptomics and Lineage Tracing

The SPACE-seq method integrates spatial transcriptomics with lineage tracing in native tissues [11]. This approach enables the simultaneous measurement of gene expression and cell lineage in the context of the tissue architecture [11]. The spatial information provided by SPACE-seq is critical for understanding how cell state transitions are influenced by the local microenvironment [11].

The integration of spatial and lineage information requires computational methods that can handle the complex structure of the data [11]. The SPACE-seq framework uses a combination of imaging and sequencing to capture both the spatial location of cells and their lineage barcodes [11]. The resulting data can be used to construct spatial maps of lineage relationships, which reveal how clonal populations are distributed within tissues [11].

8. Workflow and Decision Tree

The following Mermaid diagram illustrates a typical workflow for single-cell RNA-seq trajectory inference and lineage tracing, from data acquisition to biological interpretation.

flowchart TD
    A[Single-cell RNA-seq Data Acquisition] --> B[Quality Control and Filtering]
    B --> C[Normalization and Batch Correction]
    C --> D[Dimensionality Reduction]
    D --> E[Clustering and Cell Type Identification]

    E --> F{Trajectory Inference Method}
    F -->|Pseudotime| G[Monocle / Slingshot / Totem]
    F -->|RNA Velocity| H[Conventional / Deep Learning]
    F -->|Lineage Tracing| I[Genetic Barcoding / CRISPR / Natural Mutations]

    G --> J[Pseudotime Assignment]
    H --> K[Velocity Vector Calculation]
    I --> L[Phylogenetic Tree Reconstruction]

    J --> M[Gene Expression Trend Analysis]
    K --> M
    L --> M

    M --> N[Pathway and Structural Mapping]
    N --> O[Biological Interpretation]

    O --> P[Validation and Hypothesis Testing]

9. Applications in Veterinary and Comparative Biology

9.1 Immune Cell Dynamics During Infection

The application of trajectory inference to study immune cell dynamics during infection has been a major focus of research [4]. In the context of veterinary medicine, scRNA-seq has been used to study the response of immune cells to pathogens such as avian influenza virus and Pasteurella multocida [4]. The trajectory inference methods described in this article can be used to reconstruct the sequence of activation states that immune cells undergo during the response to infection [4].

Lineage tracing of innate immune cells in human cancer has provided insights that may be transferable to veterinary oncology [16]. The clonal relationships between immune cells in the tumor microenvironment can be reconstructed using lineage tracing methods, revealing the patterns of clonal expansion and contraction that accompany the immune response [16]. Functional lineage tracing with CaTCH (CRISPRa tracing of clones in heterogeneous cell populations) has been applied to study the dynamics of tumor cell populations in response to therapy [21].

9.2 Developmental Biology

The tracing of hematopoietic stem cell formation at single-cell resolution has been a landmark study in developmental biology [17]. This work used a combination of scRNA-seq and lineage tracing to map the emergence of hematopoietic stem cells from the embryonic endothelium [17]. The resulting trajectory revealed the sequence of transcriptional events that accompany the transition from endothelial to hematopoietic fate [17].

The application of these methods to study development in veterinary species, such as the formation of the immune system in chickens or the development of the respiratory system in pigs, is an active area of research [17]. Clonal lineage tracing of innate immune cells in human cancer offers a methodological template for such studies [16].

9.3 Host-Pathogen Interactions

The study of host-pathogen interactions at the single-cell level has been advanced by the application of trajectory inference methods [4]. In the context of bacterial infection, scRNA-seq can be used to study the response of host cells to pathogen challenge [4]. The trajectory inference methods described in this article can be used to reconstruct the sequence of host cell states that accompany the progression of infection [4].

The use of lineage tracing to study the dynamics of pathogen populations within the host is a complementary approach [12]. The CRISPR lineage recording methods can be used to track the spread of a pathogen through the host, revealing the patterns of transmission and the bottlenecks that limit population growth [12]. The assessment of single-cell phylogenies and population dynamics from CRISPR lineage recordings has been applied to study the evolution of pathogens within the host [12].

10. Computational Considerations and Best Practices

10.1 Data Quality and Preprocessing

The quality of the input data is a critical determinant of the accuracy of trajectory inference [9]. Low-quality cells, such as those with high levels of ambient RNA or low sequencing depth, can introduce artifacts that distort the inferred trajectory [9]. The data-driven selection of analysis decisions in scRNA-seq trajectory inference provides a framework for choosing the optimal preprocessing steps for a given dataset [9].

The preprocessing of scRNA-seq data typically includes filtering of low-quality cells, normalization of gene expression counts, and correction of batch effects [9]. The choice of normalization method can have a significant impact on the inferred trajectory, and it is recommended to test multiple methods and compare the results [9]. The scLANE testing framework provides a statistical test for assessing the significance of gene expression trends along a trajectory, which can be used to validate the results of the inference [5].
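The filtering and normalization steps can be sketched as below. This is a minimal illustration using library-size scaling followed by a log transform, which is only one of several normalization choices, and it omits batch correction. The function name and threshold defaults are assumptions and would need tuning for a real dataset.

```python
import numpy as np

def basic_preprocess(counts, min_counts=500, min_genes=200, target_sum=1e4):
    """Filter low-quality cells, then library-size normalize and log-transform.

    counts: cells x genes matrix of raw counts.
    Returns (log_normalized, keep_mask).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)            # sequencing depth per cell
    n_genes = (counts > 0).sum(axis=1)    # genes detected per cell

    # Keep cells that pass both depth and complexity thresholds
    keep = (total >= min_counts) & (n_genes >= min_genes)
    filtered = counts[keep]

    # Scale every retained cell to the same total, then apply log(1 + x)
    scaled = filtered / filtered.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(scaled), keep
```

Running the downstream trajectory inference on the output of two or three alternative normalizations is a simple way to check how sensitive the inferred ordering is to this choice.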

10.2 Algorithm Selection

The choice of trajectory inference algorithm depends on the structure of the data and the biological question being addressed [8]. For datasets with a simple linear or modestly branched trajectory, methods such as Monocle or Slingshot are appropriate [8]. For datasets with many branching points and complex lineage hierarchies, tree-based methods such as Totem are more suitable [8]. For time-course data sampled at multiple time points, optimal transport or Schrödinger bridge approaches can also be considered.

The comparison between different trajectory inference methods has been the subject of several benchmark studies [9]. These studies have shown that no single method is universally optimal, and that the choice of method should be guided by the characteristics of the data [9]. The data-driven selection of analysis decisions provides a framework for making this choice in a systematic and reproducible manner [9].

10.3 Validation and Interpretation

The validation of inferred trajectories is a critical step in the analysis [6]. The semi-supervised Bayesian approach for marker gene trajectory inference provides a method for incorporating prior knowledge into the inference, which can improve the accuracy of the results [6]. The use of known marker genes to validate the inferred trajectory is a common practice [6].
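A simple marker-based check can be sketched as follows. It computes the Spearman rank correlation between pseudotime and the expression of a marker gene and compares its sign with the direction expected from prior knowledge. This is a generic sanity check, not the semi-supervised Bayesian method of [6]. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def marker_agreement(pseudotime, marker_expr, expected_direction):
    """Check whether a marker gene changes along pseudotime as expected.

    expected_direction: +1 if the marker should increase, -1 if it should decrease.
    Returns (rho, agrees).
    """
    # Rank correlation is insensitive to the scale of pseudotime
    rho, _ = spearmanr(pseudotime, marker_expr)
    agrees = bool(np.sign(rho) == np.sign(expected_direction))
    return rho, agrees
```

A strongly negative correlation for a marker that should increase often indicates that the trajectory has been rooted at the wrong end.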

The interpretation of trajectory inference results requires careful consideration of the biological context [19]. The TIPS method provides a framework for mapping the inferred trajectory onto known biological pathways, which can aid in the interpretation of the results [19]. Visualizing the trajectory on a UMAP or t-SNE embedding can provide a qualitative check on the inference [19]. Such embeddings distort distances between cells, so they should not be the sole basis for judging a trajectory.

11. Future Directions

11.1 Integration of Multi-Omics Data

The integration of scRNA-seq data with other omics modalities, such as proteomics, metabolomics, and epigenomics, is a promising direction for future research [11]. The SPACE-seq method provides a framework for integrating spatial transcriptomics with lineage tracing, and similar approaches could be developed for other modalities [11]. The integration of multi-omics data would provide a more comprehensive view of the cell state and its relationship to the lineage [11].

11.2 Machine Learning and Deep Learning

The application of machine learning and deep learning to trajectory inference is an active area of research [3]. The comparison between conventional tools and deep learning models for RNA velocity analysis has highlighted the potential of deep learning approaches to capture nonlinear relationships [3]. The development of new deep learning architectures for trajectory inference is a promising direction for future research [3].

11.3 Clinical Applications

The application of trajectory inference and lineage tracing to clinical diagnostics is a promising direction for future research [16]. The clonal lineage tracing of innate immune cells in human cancer has demonstrated the potential of these methods to provide insights into disease progression and treatment response [16]. The extension of these methods to veterinary species and the development of diagnostic tools based on trajectory inference are active areas of research [16].

12. Conclusion

Single-cell RNA-seq trajectory inference and cell lineage tracing are powerful computational frameworks for reconstructing the temporal order of cell state transitions from static snapshot data. The algorithms described in this article, including Monocle, Slingshot, Totem, and the Schrödinger bridge approaches, provide a range of tools for analyzing scRNA-seq data. The integration of these methods with lineage tracing techniques, such as genetic barcoding and CRISPR-based recording, enables the construction of comprehensive maps of cellular dynamics. The application of these methods in veterinary and comparative biology has the potential to advance our understanding of development, disease, and host-pathogen interactions.

References

[1] Qing J, Hu J, Wang X et al. scTrends: automated classification and strength quantification of gene expression trends along pseudotime in single-cell RNA-seq. BMC Genomics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42218405/

[2] Fournié C, Ventre E, Herbach U et al. Cell trajectory inference based on Schrödinger problem and a mechanistic model of stochastic gene expression. NPJ Syst Biol Appl. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42168225/

[3] Sauda MR, Rodrigues AB, de Oliveira Lyra ML et al. Comparison between a conventional tool and deep learning models for RNA velocity analysis of scRNA-Seq data. Mol Genet Genomics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42126634/

[4] Plotkin AL, Mullins GN, Green WD et al. Optimal transport fate mapping resolves T cell differentiation dynamics across tissues. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42124734/

[5] Leary JR, Dong X, Bacher R. Interpretable trajectory inference with single-cell linear adaptive negative-binomial expression (scLANE) testing. Nucleic Acids Res. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/41533563/

[6] Wang J, Sun L, Wei N et al. A semi-supervised Bayesian approach for marker gene trajectory inference from single-cell RNA-seq data. Bioinformatics. 2025. URL: https://pubmed.ncbi.nlm.nih.gov/40802529/

[7] Sashittal P, Zhang RY, Law BK et al. Inferring cell differentiation maps from lineage tracing data. bioRxiv. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39314473/

[8] Sousa AGG, Smolander J, Junttila S et al. Inferring Tree-Shaped Single-Cell Trajectories with Totem. Methods Mol Biol. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/39068362/

[9] Dong X, Leary JR, Yang C et al. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform. 2024. URL: https://pubmed.ncbi.nlm.nih.gov/38725155/

[10] Seidel S, Zwaans A, Regalado S et al. SciPhy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data. Nat Commun. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42270637/

[11] Jia Y, Sun D, Weir JA et al. SPACE-seq integrates spatial transcriptomics and lineage tracing in native tissues. Cell Stem Cell. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42263684/

[12] Pilarski J, Stadler T, Seidel S. Assessing the inference of single-cell phylogenies and population dynamics from CRISPR lineage recordings. PLoS Comput Biol. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42258486/

[13] Liu F, Zhang X, Yang Y. A comparison between CARLIN and DNA Typewriter in CRISPR-mediated lineage tracing. BMC Bioinformatics. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42215861/

[14] Kang Z, Chen H, Li S et al. Evolving strategies for lineage tracing: Genetic markers, synthetic barcodes, and natural variants. Cell Stem Cell. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42214336/

[15] Hong W, Shi Y, Niles-Weed J. Trajectory Inference with Smooth Schrödinger Bridges. International Conference on Machine Learning. 2025. URL: https://www.semanticscholar.org/paper/b1fecae000eb

[16] Liu V, Sandor K, Yan PK et al. Clonal lineage tracing of innate immune cells in human cancer. Cancer Cell. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42242233/

[17] Zhou F, Li X, Wang W et al. Tracing haematopoietic stem cell formation at single-cell resolution. Nature. 2016. URL: https://pubmed.ncbi.nlm.nih.gov/27225119/

[18] Yang SJ, Wang Y, Lin KZ. Leveraging Lineage Barcodes as Natural Augmentations for Contrastive Learning of Cell Fate in scRNA-seq Data. bioRxiv. 2026. URL: https://pubmed.ncbi.nlm.nih.gov/42239199/