
Trajectory analysis

Contributors

Authors: Julia Jakiela

Last modification: Feb 7, 2023

What is trajectory analysis?

.footnote[Deconinck et al. 2021]

Trajectory inference (TI) methods have emerged as a novel subfield within computational biology to better study the underlying dynamics of a biological process of interest, such as:

cellular development, differentiation, and immune responses.
.image-40[ Petri dish with a magnified scheme of embryonic development ] .image-40[ Scheme of one cell differentiating into three different types ] .image-40[ Different types of immune cells taking part in inflammation ]

TI allows us to study how cells evolve from one cell state to another, and subsequently when and how cell fate decisions are made.

.image-40[ A cell changing its shape in three steps (showing the evolution from one state to another) ]


Clustering, trajectory and pseudotime

.footnote[Deconinck et al. 2021]

.pull-left[ Clustering groups cells by similarity, so that specific cell types can be identified from the marker genes expressed in each cluster.

.image-75[ A graph with axis labels ‘t-SNE’, showing the cells being clustered into several groups – each group marked in a  different colour. ]

]


.image-60[ A graph with axis labels ‘UMAP’, showing the cells ordered in pseudotime along the learned trajectory path. Pseudotime runs through a range of colours to depict progress through the transition: dark blue marks root cells and yellow marks end cells. ]

]



Assumptions



Why are different TI methods designed to infer different kinds of biological processes?

Speaker Notes There are multiple algorithms used to analyse scRNA-seq data and make it more readable. Note that this scheme dates from 2016. Many new approaches have been developed since then, but it still shows how those methods are built.

Look at the pipeline presented by Cannoodt et al. 2016

.image-100[ A scheme showing the pipeline by which single-cell expression data is processed: 1) similarity: cosine and Euclidean, 2) manifold learning: t-SNE, diffusion maps, LLE, PCA, ICA, 3) clustering: Mclust, k-Means, Hierarchical, 4) graph: Bootstrap, KNN, MST, 5) pathfinding: principal curve, shortest path from origin, longest path, binary tree, 6) cell ordering: principal curve, average orderings, mutual disagreement, detect branches, OU process, project cells to path, optimize tree, 7) method: SCUBA pseudotime, Wanderlust, Wishbone, SLICER, SCOUP, Waterfall, Mpath, TSCAN, Monocle, SCUBA ]


There are multiple methods using particular algorithms, or even their combinations, so you must consider which one would be best for analysing your sample.

Speaker Notes We will consider some of the aspects shown here to better understand the similarities and differences between TI methods. These aspects strongly affect the output, so it is important to be aware of them.


Similarity

.pull-left[

Fragment cut from the pipeline - similarity: cosine and Euclidean (additionally Pearson, Spearman)

]

.pull-right[

A count matrix of genes vs cells is plotted in N-dimensional space, with each gene representing a different axis. A distance formula for 3 dimensions is shown, followed by a final table derived from the count matrix that lists the distances between each pair of cells, based on their genes.

]
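
The distance table described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the toy count values and the helper names (`euclidean`, `cosine_distance`, `pairwise`) are assumptions made for this example and are not taken from any particular TI tool.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two cells in gene space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: compares expression profiles, ignoring magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def pairwise(cells, metric):
    """Distance table between every pair of cells."""
    names = list(cells)
    return {(p, q): metric(cells[p], cells[q]) for p in names for q in names}

# Hypothetical toy count matrix: each cell is a vector of counts for 3 genes.
cells = {
    "cell_A": [1.0, 0.0, 2.0],
    "cell_B": [2.0, 0.0, 4.0],  # same profile as cell_A, twice the depth
    "cell_C": [0.0, 3.0, 0.0],  # expresses a different gene entirely
}

euclid = pairwise(cells, euclidean)
cosine = pairwise(cells, cosine_distance)
```

In this toy data, cell_A and cell_B are separated in Euclidean terms but have a cosine distance of essentially zero, which illustrates why the choice of similarity measure can change downstream results.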



Manifold learning / dimensionality reduction

.pull-left[

Fragment cut from the pipeline - manifold learning:  t-SNE, diffusion maps, LLE, PCA, ICA (additionally CAP, GPLVM, Isomap, MDS)

]

.pull-right[

.image-100[ Four graphs showing how cell types are arranged depending on the dimensionality reduction algorithm chosen: UMAP, tSNE, PCA, LSI. UMAP shows distinct cell groups that transition smoothly from one to another, forming a kind of semicircle. tSNE shows distinct cell groups, but no smooth transitions are observed and all groups are gathered into one big grouping. PCA shows cell groups whose boundaries blur into each other. On the LSI graph, the cell types are all mixed together. ]

]
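
As a rough illustration of dimensionality reduction, the sketch below finds the first principal component of a tiny expression matrix by power iteration and projects the cells onto it. It uses only the Python standard library; the function name `first_principal_component` and the toy data are assumptions for this example, and real analyses would use a dedicated library implementation rather than this simplified version.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the leading PCA axis and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between genes.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    # Assumes the starting vector is not orthogonal to the leading axis.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical cells (rows) x genes (columns): gene 2 rises twice as fast as
# gene 1 along the process, gene 3 is constant and carries no information.
expression = [
    [0.0, 0.0, 1.0],
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 1.0],
]
axis, scores = first_principal_component(expression)
```

Here three gene dimensions collapse to a single coordinate per cell, and the constant gene receives a weight of zero on the axis.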



Clustering

.pull-left[

Fragment cut from the pipeline - clustering: Mclust, k-Means, Hierarchical (additionally SOM, PAM, Mean shift)

]

.pull-right[

Grey set of cells getting clustered into several distinct groups, each group marked in different colour

]
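
To make the clustering step concrete, here is a minimal k-means sketch in standard-library Python. The initialisation (taking the first `k` points as starting centroids) is a simplification chosen to keep the example deterministic, and the coordinates are hypothetical; it is not how Mclust or any specific tool in the pipeline is implemented.

```python
import math

def kmeans(points, k, iterations=20):
    """Assign each point to its nearest centroid, then recompute centroids."""
    # Simplified, deterministic initialisation: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical cells in a 2-dimensional reduced space (two obvious groups).
cells_2d = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(cells_2d, k=2)
```

Each cell ends up with a cluster label, and the cluster centroids can then serve as the nodes of a graph in the next step.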



Graph-based approach

.pull-left[

Fragment cut from the pipeline - graph: Bootstrap, KNN, MST

]

.pull-right[

Grey set of cells getting clustered into several distinct groups, each group marked in a different colour. Coloured clusters are connected by straight lines with weights, but the chosen path is the one that minimises the total edge weight.

]
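
The idea of choosing the connections that minimise the total edge weight corresponds to a minimum spanning tree (MST). The sketch below applies Prim's algorithm to cluster centroids, using Euclidean distance as the edge weight. The cluster names and coordinates are hypothetical, and this is an illustration of the MST concept rather than the procedure of any specific TI method.

```python
import math

def minimum_spanning_tree(nodes):
    """Prim's algorithm: grow the tree by always adding the cheapest edge
    that connects a node inside the tree to a node outside it."""
    names = list(nodes)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        weight, a, b = min(
            (math.dist(nodes[a], nodes[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b, weight))
        in_tree.add(b)
    return edges

# Hypothetical cluster centroids in a 2-dimensional reduced space.
centroids = {
    "stem": (0, 0),
    "progenitor": (1, 0),
    "type_1": (2, 1),
    "type_2": (2, -1),
}
tree = minimum_spanning_tree(centroids)
total_weight = sum(w for _, _, w in tree)
```

In this toy layout the tree links the stem cluster to the progenitor cluster, which then branches to both terminal types, giving a simple bifurcating backbone.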


Graph-based approach

.footnote[Haghverdi et al. 2016]

.pull-left[

Fragment cut from the pipeline - graph: Bootstrap, KNN, MST

]

.pull-right[

A multi-dimensional plane with cells projected onto it. A transition matrix is then constructed, in which a shorter distance between two points means a higher transition probability than a longer distance. Diffusion pseudotime is then computed as a scale-free average over random walks. ]
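
The first step described here, turning distances into transition probabilities, can be sketched as follows. This minimal example uses a Gaussian kernel with a fixed width `sigma` and simple row normalisation, both of which are assumptions for illustration. The published diffusion pseudotime method uses its own normalisation and then accumulates random walks of all lengths, which this sketch does not attempt.

```python
import math

def transition_matrix(points, sigma=1.0):
    """Row-normalised Gaussian kernel: closer cells get higher probability."""
    n = len(points)
    kernel = [
        [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Divide each row by its sum so that it forms a probability distribution.
    return [[w / sum(row) for w in row] for row in kernel]

# Hypothetical cells placed along one axis: cell 1 is close to cell 0,
# cell 2 is further away.
cell_positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
T = transition_matrix(cell_positions)
```

Each row of `T` gives the probabilities of a random walker stepping from one cell to every other cell, with nearby cells favoured over distant ones.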


Extensions of trajectory inference: RNA velocity methods

.footnote[Deconinck et al. 2021]

.image-40[ A graph showing unspliced vs spliced reads for any given gene with two plots: one goes like e^x (state likelihood off) and the other one: ln(x) (state likelihood: on) ]
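
A minimal sketch of the idea behind the spliced/unspliced plot, assuming the simple steady-state formulation in which velocity is the unspliced count minus a degradation ratio `gamma` times the spliced count. The function names and the toy numbers are hypothetical, and dynamical models such as the one used by scVelo are considerably more involved than this.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin of unspliced = gamma * spliced,
    fitted on cells assumed to be at steady state."""
    numerator = sum(s * u for s, u in zip(spliced, unspliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator

def velocity(spliced, unspliced, gamma):
    """Positive: more unspliced than expected (gene switching on).
    Negative: less unspliced than expected (gene switching off)."""
    return [u - gamma * s for s, u in zip(spliced, unspliced)]

# Hypothetical counts for one gene in cells assumed to be at steady state.
gamma = fit_gamma(spliced=[0.0, 2.0, 4.0], unspliced=[0.0, 1.0, 2.0])

# Three further cells with the same spliced level but different unspliced levels.
v = velocity(spliced=[2.0, 2.0, 2.0], unspliced=[2.0, 1.0, 0.5], gamma=gamma)
```

The sign of each velocity indicates whether the gene appears to be increasing or decreasing in that cell, which is the information drawn as arrows on a velocity plot.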


Extensions of trajectory inference: RNA velocity methods

.footnote[Deconinck et al. 2021]

.image-25[ Pseudotime plot with arrows going from root cells towards branches ]


When analysing your data, consider the following:

.pull-left[

]

.pull-right[

To help you evaluate which method would work best for your data, check out dynguidelines, a comparison site that is part of the dynverse, a larger set of open packages for inferring and interpreting trajectories.

Screenshot of the user interface of dynguidelines, comparing multiple trajectory analysis methods

]


When analysing your data, consider the following: Tissue from which the cells were analysed

.pull-left[

]

.pull-right[

Experimental workflow summarising the identification of diverse lymph node stromal cells by single-cell RNA sequencing. 1) taking a sample from a mouse, 2) CD45 depletion, 3) FACS sorting, 4) scRNA-seq analysis (a graph showing clustered cells), 5) magnified scheme showing the multiplicity of cells that can be identified, in this case among lymphatic endothelial cells, vascular endothelial cells and fibroblastic reticular cells

Reprinted from “Heterogeneity of Murine Lymph Node Stromal Cell Subsets”, by BioRender.com (2022) ]


When analysing your data, consider the following: Branching points

Cells can differentiate or develop in various ways, so their trajectories may exhibit different topologies.

A table showing trajectory topologies: cycle, linear, bifurcation, multifurcation, tree, connected graph, disconnected graph


When analysing your data, consider the following: Branching points

.pull-left[

]

.pull-right[







Branching points are identified as points where anticorrelated distances from branch ends become correlated

]



When analysing your data, consider the following: Supervised and unsupervised learning

.pull-left[
Biologist with a speech bubble, saying ‘What are my root cells? Where should the trajectory start?’

]

.pull-right[

]


When analysing your data, consider the following: Supervised and unsupervised learning

If you know which cells are the root cells, you should provide this information to the method to make the computations more precise. However, some methods use unsupervised algorithms, so the trajectory you get depends on the tools they use and the topologies they can infer.

| Unsupervised | Priors needed: start cells | Priors needed: end cells | Priors needed: both start and end cells |
|---|---|---|---|
| Slingshot, SCORPIUS, Angle, MST, Waterfall, TSCAN, SLICE, pCreode, SCUBA, RaceID/StemID, Monocle DDRTree | PAGA Tree, PAGA, Wanderlust, Wishbone, topslam, URD, CellRouter, SLICER | MFA, GrandPrix, GPfates, MERLoT | Monocle ICA |
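
To show why a known start cell matters, the sketch below orders cells by their shortest-path distance from a chosen root on a k-nearest-neighbour graph. This is a deliberately simplified stand-in for pseudotime: the function names, the choice of `k` and the toy coordinates are assumptions for illustration, and none of the methods listed above is implemented exactly this way.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected graph linking each cell to its k nearest neighbours."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in neighbours:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def pseudotime_from_root(graph, root):
    """Dijkstra's algorithm: graph distance of every reachable cell from the root."""
    dist = {root: 0.0}
    queue = [(0.0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist

# Hypothetical cells lying along a single path in a reduced space.
cell_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
graph = knn_graph(cell_coords, k=2)

from_first = pseudotime_from_root(graph, root=0)
from_last = pseudotime_from_root(graph, root=3)
```

The same cells receive the reverse ordering depending on which end is declared the root, which is why supplying the start cells, when they are known, removes an ambiguity that an unsupervised method has to resolve on its own.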

When analysing your data, consider the following: Format of the data

.image-120[ A table showing implementations for different methods. R: Monocle, SLICE, TSCAN, Waterfall, SLICER, StemID, Slingshot, RNA velocity, FateID, DPT. Python: Wanderlust, Wishbone, PAGA, P-creode, RNA velocity, GPfates, DPT, Waddington-OT. Matlab: SCUBA, Wishbone, Pseudo-dynamics. Java: GRAND-SLAM. ]


When analysing your data, consider the following: Number of cells and features


When analysing your data, consider the following: Computing power & running time

Computing power and running time do not directly affect the results of your analysis, but bear in mind that the calculations performed during dimensionality reduction, especially on large datasets, can be very time-consuming. Consider whether either of those factors will limit you.


Trajectory analysis methods used in Galaxy


Trajectory analysis methods used in Galaxy: PAGA (Partition-based graph abstraction)

.footnote[Wolf et al. 2018]

.pull-left[

Plots generated using PAGA, based on cell types and on the expression of the CD4 and Cd8a genes. The graph nodes are connected to each other.

]


.pull-right[

Screenshot of the interface of ‘Scanpy PAGA’ Galaxy tool

]



Trajectory analysis methods used in Galaxy: Diffusion Pseudotime in Scanpy

.pull-left[

]

.pull-right[

Screenshot of the interface of ‘Scanpy DPT’ Galaxy tool

]


Trajectory analysis methods used in Galaxy: RaceID

.footnote[Grün et al. 2015]

.pull-left[

]

.pull-right[

Screenshot of the interface of ‘Lineage Branch Analysis using StemID’ Galaxy tool

]



Trajectory analysis methods used in Galaxy: RaceID

.footnote[Grün et al. 2015]

.pull-left[

]

.pull-right[

Lineage Computation Plots

]

Speaker Notes From the mentioned tutorial:


Trajectory analysis methods used in Galaxy: Monocle3

.footnote[C. Trapnell cole-trapnell-lab]

.pull-left[

]

.pull-right[

Screenshot of the Monocle3 tools available in Galaxy

]



Trajectory analysis methods used in Galaxy: Monocle3

.footnote[C. Trapnell cole-trapnell-lab]

.pull-left[

]

.pull-right[

Pseudotime plot, showing the development of T-cells – starting in dark blue on double negative cells and ending up on mature T-cells, marked in yellow on pseudotime scale.

]



Trajectory analysis methods used in Galaxy: scVelo

.footnote[Bergen et al. 2020]

.image-25[ scVelo plot showing arrows going in ordered directions both within the cell types and between them ]


Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

This material is licensed under the Creative Commons Attribution 4.0 International License.

References

  1. Grün, D., A. Lyubimova, L. Kester, K. Wiebrands, O. Basak et al., 2015 Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525: 251–255. 10.1038/nature14966
  2. Cannoodt, R., W. Saelens, and Y. Saeys, 2016 Computational methods for trajectory inference from single-cell transcriptomics. European Journal of Immunology 46: 2496–2506. 10.1002/eji.201646347
  3. Haghverdi, L., M. Büttner, F. A. Wolf, F. Buettner, and F. J. Theis, 2016 Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods 13: 845–848. 10.1038/nmeth.3971
  4. Chen, S., Y. Zhou, Y. Chen, and J. Gu, 2018 fastp: an ultra-fast all-in-one FASTQ preprocessor. 10.1093/bioinformatics/bty560
  5. Wolf, F. A., P. Angerer, and F. J. Theis, 2018 SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19: 10.1186/s13059-017-1382-0
  6. Bergen, V., M. Lange, S. Peidli, F. A. Wolf, and F. J. Theis, 2020 Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology 38: 1408–1414. 10.1038/s41587-020-0591-3
  7. Sagar, and D. Grün, 2020 Deciphering Cell Fate Decision by Integrated Single-Cell Sequencing Analysis. Annual Review of Biomedical Data Science 3: 1–22. 10.1146/annurev-biodatasci-111419-091750
  8. Deconinck, L., R. Cannoodt, W. Saelens, B. Deplancke, and Y. Saeys, 2021 Recent advances in trajectory inference from single-cell omics data. Current Opinion in Systems Biology 27: 100344. 10.1016/j.coisb.2021.05.005
  9. cole-trapnell-lab monocle3. https://cole-trapnell-lab.github.io/monocle3/