You've done all the hard work of preparing a single-cell matrix, processing it, plotting it, interpreting it, and finding lots of lovely genes. Now you want to infer trajectories, or relationships between cells... you can do that here, using the Galaxy interface, or head over to the Jupyter notebook version of this tutorial to learn how to perform the same analysis using Python.
Traditionally, we thought that differentiating or changing cells jumped between discrete states, so "Cell A" became "Cell B" as part of its maturation. However, most data shows otherwise. Generally, there is a spectrum (a "trajectory", if you will...) of small, subtle changes along a pathway of that differentiation. Trying to analyse cells every 10 seconds can be pretty tricky, so "pseudotime" analysis takes a single sample and assumes that those cells are all on slightly different points along a path of differentiation. Some cells might be slightly more mature and others slightly less, all captured at the same "time". These cells are sorted accordingly along these pseudotime paths of differentiation to build a continuum of cells from one state to the next. We therefore "assume" or "infer" relationships from this continuum of cells.
We will use the same sample from the previous three tutorials, which contains largely T-cells in the thymus. We know T-cells differentiate in the thymus, so we would assume that we would capture cells at slightly different time points within the same sample. Furthermore, our cluster analysis alone showed different states of T-cells. Now it's time to look further!
Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step... this may fail! To ensure you use the same tool versions as a given tutorial, use the Tutorial mode feature.
Open your Galaxy server
Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
Navigate to your tutorial
Tool names in tutorials will be blue buttons that open the correct tool for you
Note: this does not work for all tutorials (yet)
You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface
Warning: Not all browsers work!
We've had some issues with Tutorial mode on Safari for Mac users.
Try a different browser if you aren't seeing the button.
Did you know we have a unique Single Cell Omics Lab with all our single cell tools highlighted to make it easier to use on Galaxy? We recommend this site for all your single cell analysis needs, particularly for newer users.
The Single Cell Omics Lab is a different view of the underlying Galaxy server that organises tools and resources better for single-cell users! It also provides a platform for communities to engage and connect; distribute more targeted news and events; and highlight community-specific funding sources.
When something goes wrong in Galaxy, there are a number of things you can do to find out what it was. Error messages can help you figure out whether it was a problem with one of the settings of the tool, or with the input data, or maybe there is a bug in the tool itself and the problem should be reported. Below are the steps you can follow to troubleshoot your Galaxy errors.
Expand the red history dataset by clicking on it.
Sometimes you can already see an error message here
View the error message by clicking on the bug icon
Check the logs. Output (stdout) and error logs (stderr) of the tool are available:
Expand the history item
Click on the details icon
Scroll down to the Job Information section to view the 2 logs:
Importing via History is quickest. Works only on Galaxy EU for now.
Click Upload Data at the top of the tool panel
Select Paste/Fetch Data
Paste the link(s) into the text field
Press Start
Close the window
Rename the .h5ad object as Final cell annotated object
Click on the pencil icon for the dataset to edit its attributes
In the central panel, change the Name field to Final cell annotated object
Click the Save button
Check that the datatype is h5ad
Click on the pencil icon for the dataset to edit its attributes
In the central panel, click the Datatypes tab on the top
In the Assign Datatype section, select h5ad from the "New type" dropdown
Tip: you can start typing the datatype into the field to filter the dropdown menu
Click the Save button
Filtering for T-cells
One problem with our current dataset is that it's not just T-cells: we found in the previous tutorial that it also contains macrophages. This is a problem, because trajectory analysis will generally try to find relationships between all the cells in the sample. We need to remove those cell types to analyse the trajectory.
Hands On: Removing macrophages
Manipulate AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:
"Annotated data matrix": Final cell annotated object (Input dataset)
"Function to manipulate the object": Filter observations or variables
You should now have 8569 cells, as opposed to the 8605 you started with. You've only removed a few cells (the contaminants!), but it makes a big difference in the next steps.
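For readers following along in Python, filtering observations comes down to a boolean mask over the cell annotations. The following is a minimal sketch using NumPy on a toy label array. The `Macrophages` label, the toy labels, and the helper name `keep_mask` are assumptions for illustration, not values taken from the tool form.

```python
import numpy as np


def keep_mask(labels, unwanted):
    """Return a boolean mask that is True for every cell we want to keep."""
    labels = np.asarray(labels)
    return ~np.isin(labels, list(unwanted))


# Toy annotations standing in for a per-cell "cell_type" column (hypothetical data).
cell_types = np.array(["DN", "DP-M1", "Macrophages", "T-mat", "Macrophages", "DP-L"])

mask = keep_mask(cell_types, {"Macrophages"})
filtered = cell_types[mask]
print(len(cell_types), "cells before,", len(filtered), "cells after")
```

With an AnnData object, the same mask could be applied as `adata[mask].copy()`, assuming the labels are stored in `adata.obs["cell_type"]`.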
Force-directed graph
First, we will calculate a force-directed graph (FDG) as an alternative to tSNE, which will likely work better for trajectory analysis.
Calculate force-directed graph
Hands On: Draw FDG
Scanpy RunFDG (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": T-cell_object.h5ad (output of Manipulate AnnData tool)
"Use programme defaults": No
"Graph layout": fa
Comment: Graph Layout
We're using the fa or ForceAtlas2 layout for our FDGs. It is the same layout used in the Jupyter notebook version of this tutorial and works well for our data. As well as choosing the fa layout when we create the FDGs, we will also specify the draw_graph_fa embedding when drawing the plots.
Plot the FDG
And now time to plot it!
Hands On: Plot the FDG
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": cell_type
"Use raw attributes": No
"Location of legend": On data
What has the FDG done to our clusters of T-cells and what might this suggest about the relationships between these groups?
Well now this is exciting! Our DP-late is more clearly separating, and we might also suppose that DP-M1, DP-M2, and DP-M3 are actually earlier on in the differentiation towards mature T-cells. And we're only just getting started!
We'll now perform an optional step that basically takes the place of the standard Principal Component Analysis (PCA). Instead of using PCs, we can use diffusion maps.
Draw diffusion map
Hands On: Draw the Diffusion Map
Scanpy DiffusionMap (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"Number of diffusion components to calculate": 15
Comment: Choosing the number of diffusion components
We could change the number of diffusion components and end up with a slightly different plot - a bit like if we changed the number of principal components used in the PCA we ran in the Filter, Plot and Explore tutorial. 15 seems to work well for this dataset and matches the number used in the Jupyter version of this tutorial, so we'll stick with that.
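The idea behind a diffusion map can be sketched in a few lines of NumPy. This is a simplified, dense-kernel illustration on toy data with a hypothetical `sigma` bandwidth. Scanpy works on the sparse neighbour graph and may differ in its normalisation details, so treat this as a conceptual sketch rather than what the Galaxy tool runs.

```python
import numpy as np


def diffusion_map(X, n_comps=2, sigma=1.0):
    """Toy diffusion map: Gaussian kernel -> random-walk matrix -> eigenvectors."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all cells.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    degree = kernel.sum(axis=1)
    # Symmetric normalisation D^-1/2 K D^-1/2 has the same eigenvalues
    # as the random-walk matrix D^-1 K, but can be handled by eigh.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rescale to right eigenvectors of D^-1 K; drop the trivial first one.
    components = eigvecs / np.sqrt(degree)[:, None]
    return eigvals[1:n_comps + 1], components[:, 1:n_comps + 1]


# Toy "cells" spread along a curved path, parameterised by t.
t = np.linspace(0.0, 1.0, 40)
X = np.column_stack([t, t ** 2])
eigvals, comps = diffusion_map(X, n_comps=2, sigma=0.2)
print(comps.shape)
```

On data like this, the first non-trivial component follows the position along the path, which is why diffusion components are a natural basis for trajectory work.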
Re-calculate Nearest Neighbours
Now that we have our diffusion map, we need to re-calculate neighbors using the diffusion map instead of the PCs.
Hands On: Compute neighbours using diffusion map
Scanpy ComputeGraph (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": DiffusionMap Anndata (output of Scanpy DiffusionMap tool)
"Use programme defaults": No
"Use the indicated representation": X_diffmap
If you're using the latest versions of these tools (e.g. Scanpy ComputeGraph (Galaxy version 1.9.3+galaxy0)) rather than the ones suggested in the tutorial (e.g. Scanpy ComputeGraph (Galaxy version 1.8.1+galaxy9)), then you may need to change one more parameter here and set the Number of PCs to use to 15. These are the 15 diffusion components we just calculated, rather than actual PCs.
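In Scanpy terms, the diffusion map and neighbour steps roughly correspond to `sc.tl.diffmap` followed by `sc.pp.neighbors` with `use_rep="X_diffmap"`. The sketch below wraps them in a hypothetical helper, `neighbours_from_diffmap`. The `n_neighbors=15` value is Scanpy's default and an assumption here, since the tutorial does not state which value the Galaxy tool uses.

```python
def neighbours_from_diffmap(adata, n_comps=15, n_neighbors=15):
    """Compute a diffusion map, then rebuild the neighbour graph on it."""
    # Imported here so the helper can be defined without Scanpy installed.
    import scanpy as sc

    # Requires an existing neighbour graph (e.g. the PCA-based one from earlier steps).
    sc.tl.diffmap(adata, n_comps=n_comps)
    # Rebuild the graph from the diffusion components instead of the PCs.
    sc.pp.neighbors(adata, n_neighbors=n_neighbors, use_rep="X_diffmap")
    return adata
```

Passing `use_rep="X_pca"` instead would undo the optional diffusion map step, assuming a PCA has been stored on the object.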
Re-draw the FDG
Now that we've re-calculated the nearest neighbours, we can use these new neighbours to re-draw the FDG to see how this changes the plot.
Hands On: Plot a new FDG
Scanpy RunFDG (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": Graph object Anndata (output of Scanpy ComputeGraph tool)
"Use programme defaults": No
"Graph layout": fa
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": cell_type
"Use raw attributes": No
"Location of legend": On data
Does this plot seem better or worse than before? Remember that we're trying to understand the relationships between our groups of cells in time.
Oh dear! This doesn't look great. Maybe the DP-M4 cells are a whole other trajectory? That doesn't seem right. That said, this does spread out our T-mature cells, which makes a lot more sense when it comes to T-cell biology (we expect T-cells to differentiate into two types of T-cells, Cd8+Cd4- and Cd4+Cd8-). If you wanted to, you could also re-cluster your cells (since you've changed the neighbourhood graph on which the clustering depends). However, we tried that, and it called far too many clusters given the depth of sequencing in this dataset. Let's stick with our known cell types and move on from there.
Figure 2: FDG Plot after recalculating neighbours from the diffusion map
If you are working in a group, you can now split up a decision here: one person acts as the control, and the rest each vary a parameter so that you can compare results throughout the tutorials.
You could recluster your cells using Scanpy FindCluster (Galaxy version 1.8.1+galaxy0) at a different resolution, perhaps lower than the 0.6 we used before. (Take a look at the Cell clusters step in the Filter, Plot and Explore tutorial if you need help with this.) Please note that in this case, you will want to change the PAGA step Scanpy PAGA to group by louvain rather than cell_type. You can certainly still plot both; we only didn't because, with our old Louvain calls, the cell_type and louvain categories are identical.
You could undo the optional diffusion map step by recalculating the neighbours again using X_pca instead of X_diffmap.
You could also try changing the number of neighbours used in that step when running Scanpy ComputeGraph (Galaxy version 1.8.1+galaxy9).
Everyone else: You will want to compare FREQUENTLY with your control team member.
Partition-based Graph Abstraction (PAGA)
PAGA is used to generalise relationships between groups, or likely clusters, in this case. It will make it much easier to see the trajectories between our clusters of T-cells.
Hands On: Plot PAGA
Scanpy PAGA (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"Name of the clustering": cell_type
Comment: Plotting gene expression
We can now draw our PAGA plot and we might also be interested in colouring our plot by genes as well. In this case, remembering that we are dutifully counting our genes by their EnsemblIDs rather than Symbols (which do not exist for all EnsemblIDs), we have to look up our genes of interest (CD4, CD8a) and plot the corresponding IDs in the next step.
Scanpy PlotTrajectory (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": PAGA object Anndata (output of Scanpy PAGA tool)
"Layout functions": ForceAtlas2
"Location of legend": On data
"Use programme defaults": No
"Name of cell annotation or gene that is used to color the nodes": cell_type,ENSMUSG00000023274,ENSMUSG00000053977
Comment: Choosing the layout
We're going to pick the ForceAtlas2 layout function for our PAGA plots as this is the same type of layout that we used for our FDG.
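For comparison with the Python route, the two PAGA steps above map roughly onto `sc.tl.paga` and `sc.pl.paga`. This is a minimal sketch under that assumption. The helper name `paga_by_cell_type` is hypothetical, and the exact options the Galaxy wrappers pass to Scanpy are not shown in this tutorial.

```python
def paga_by_cell_type(adata, groups="cell_type",
                      genes=("ENSMUSG00000023274", "ENSMUSG00000053977")):
    """Run PAGA on an annotation column and draw one panel per colouring."""
    # Imported here so the helper can be defined without Scanpy installed.
    import scanpy as sc

    # Summarise connectivity between the groups in adata.obs[groups].
    sc.tl.paga(adata, groups=groups)
    # One panel for the grouping itself, then one per gene (Cd4 and Cd8 IDs).
    sc.pl.paga(adata, color=[groups, *genes], layout="fa", show=False)
    return adata
```

If you re-clustered your cells, calling it with `groups="louvain"` would be the equivalent of changing the "Name of the clustering" field.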
How have the relationships between our cell clusters changed now?
Which clusters are expressing our genes of interest, Cd4 and Cd8, at the highest levels?
The way the clusters are arranged has changed a bit now. The M4 cluster is right in the middle of the M1-3 clusters, rather than heading off on its own. The M1 cluster is looking like it is driving towards differentiation, which is not something we had necessarily been able to specify before by just looking at our cluster graphs or applying our biological knowledge.
Cd4 and Cd8 expression appear highest in the DP-L cluster. The expression of both Cd4 and Cd8 also appears higher than we might expect in the DP-M4 cluster - perhaps this is a sign that it is closer to the DP-L cluster than it seems in this simple plot.
Figure 3: PAGA plots coloured by cell type, Cd4 expression, and Cd8 expression
Re-draw force-directed graph (again!)
Force-directed graphs can be initialised randomly, or we can prod them in the right direction. We'll prod ours with our PAGA calculations. Note that you could also try prodding it with tSNE or UMAP. A lot of these tools can be used on top of each other or with each other in different ways; this tutorial is just one example. Similarly, you could use any obs information for grouping, so you could do this for louvain or cell_type, for instance.
Hands On: Initialise FDG using PAGA
Scanpy RunFDG (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": Plotted PAGA Anndata (output of Scanpy PlotTrajectory tool)
"Use programme defaults": No
"Method to initialise embedding, any key for adata.obsm or choose from the preset methods": paga
"Graph layout": fa
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": cell_type
"Use raw attributes": No
"Location of legend": On data
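A rough Scanpy equivalent of this hands-on is sketched below. It assumes the PAGA plot has already been drawn on the object, because `init_pos="paga"` reads the node positions that `sc.pl.paga` stores. This mirrors the Galaxy step taking the PlotTrajectory output as its input. The helper name `fdg_from_paga` is hypothetical.

```python
def fdg_from_paga(adata, layout="fa", color="cell_type"):
    """Re-draw the force-directed graph, seeded with the PAGA node positions."""
    # Imported here so the helper can be defined without Scanpy installed.
    import scanpy as sc

    # Uses the positions saved by an earlier sc.pl.paga call as the starting layout.
    sc.tl.draw_graph(adata, layout=layout, init_pos="paga")
    # The embedding is stored as "draw_graph_<layout>", e.g. draw_graph_fa.
    sc.pl.embedding(adata, basis=f"draw_graph_{layout}", color=color,
                    legend_loc="on data", show=False)
    return adata
```

Changing `color` to `"genotype"` or to a gene ID would give the other colourings used later in this tutorial.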
How has basing our FDG on the PAGA plot changed the relationships between our cells?
The interesting change here occurs between the Double Negative (DN) and Double Positive Module 4 (DP-M4) cells. Our DP-M4 cells are now heading on a clear trajectory towards differentiation. It looks like we've got the correct ordering of cells from DN through the DP groups and on towards T-mature. We didn't see this in our previous plots.
The experiment that produced this data used two different groups of mice - the control or wildtype group and the knockout mice that were missing a gene involved in placental development, which impacted thymus development. Since we know the genotype of the mice from which each sample was collected, we can colour in our plots to see if there are any differences in the cells present in wildtype and knockout mice.
Plotting by Genotype
The easiest way to do this is just to rerun the previous step, but change the attribute we want to use to colour the FDG plot.
Hands On: Plot by genotype
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": genotype
"Use raw attributes": No
"Location of legend": Right margin
Are there any differences in the distribution of the wildtype and knockout cells?
We're seeing a clear trajectory issue whereby the knockout cells are not found along the trajectory into T-mature (which, well, we kind of already figured out with just the cluster analysis, but we can feel even more confident about our results!)
We're also interested in the expression of the two genes that are known to be markers of the two different types of mature T-cells: Cd4 and Cd8. We can colour in our plot to show which cells are expressing these genes.
Hands On: Plot for gene expression
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": ENSMUSG00000023274,ENSMUSG00000053977
"Use raw attributes": No
"Location of legend": On data
Comment: Gene Symbols
We're using the EnsemblIDs during this tutorial, as discussed above. If you like, you could change the names of these plots to the gene symbols by filling in the optional "Figure title" field with Cd4,Cd8. Make sure that the order of your figure titles matches the order of the EnsemblIDs in the colour by field. ENSMUSG00000023274 is Cd4 and ENSMUSG00000053977 is Cd8.
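Keeping the titles in the same order as the IDs is easy to get wrong by hand. As a small illustration, the sketch below builds the "Figure title" string from the colour-by string using the two ID-to-symbol pairs given above. The helper name `titles_for` is hypothetical, and unknown IDs are simply passed through unchanged.

```python
# Mapping stated in this tutorial: Ensembl gene ID -> gene symbol.
GENE_SYMBOLS = {
    "ENSMUSG00000023274": "Cd4",
    "ENSMUSG00000053977": "Cd8",
}


def titles_for(color_by):
    """Turn a comma-separated colour-by string into matching figure titles."""
    ids = [item.strip() for item in color_by.split(",")]
    # Preserve the input order so each title lines up with its plot.
    return ",".join(GENE_SYMBOLS.get(gene_id, gene_id) for gene_id in ids)


print(titles_for("ENSMUSG00000023274,ENSMUSG00000053977"))
```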
Does the expression pattern of these genes tell us anything about our cells?
It's clear that both Cd4 and Cd8 are being expressed mainly in the later stages of T-cell development, as they head towards the DP-L cluster - although Cd4 expression is a bit more widespread. This is what we would expect to see in genes that are associated with mature T-cells. You might also be able to spot some differences in the expression of the two genes in our mature T-cell group, but there doesn't seem to be a very clear division between the cells that express Cd4 and those that express Cd8. This is a bit disappointing, as we know that there are two types of mature T-cells, which each express a different gene.
Figure 6: FDG plot showing expression of Cd4 and Cd8
Diffusion pseudotime
Now that we have a reasonable FDG plot for our cells, based on the diffusion map (if used) and PAGA plot, we can place our cells into pseudotime. Pseudotime lets us imagine that instead of looking at a sample of cells taken at a single timepoint, we are looking at cells moving through time. Our sample included cells at different stages of their development, but we can use pseudotime to think of these as different timepoints in the journey of individual cells.
We know that our cells are initialising at DN. We can feed that information into our algorithms by naming DN as the root cell type to then calculate a trajectory starting from these cells.
If you called new clusters using Scanpy FindCluster (Galaxy version 1.8.1+galaxy0), you might want to choose one of those clusters to be your root cell instead, so change the cell_type for louvain and then name the cluster number. Use the plots you created to help you pick the number!
On to the diffusion pseudotime, where we infer multiple time points within the same piece of data!
Hands On: DPT Plot
Scanpy DPT (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": FDG object Anndata (output of Scanpy RunFDG tool)
"Name of attribute that defines clustering": cell_type
"Name of the clustering that defines the root cell type": DN
Scanpy PlotEmbed (Galaxy version 1.8.1+galaxy9) with the following parameters:
"Input object in AnnData/Loom format": Diffusion pseudotime inference Anndata (output of Scanpy DPT tool)
"name of the embedding to plot": draw_graph_fa
"color by attributes, comma separated texts": cell_type,dpt_pseudotime
"Use raw attributes": No
"Location of legend": On data
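In Scanpy itself, diffusion pseudotime needs a single root cell index stored in `adata.uns["iroot"]` before `sc.tl.dpt` is called. The sketch below picks the first cell labelled `DN` as that root. Taking the first match is an assumption made for illustration, and the Galaxy tool may select its root cell differently. The helper names are hypothetical.

```python
import numpy as np


def pick_root_index(cell_types, root_label="DN"):
    """Return the index of the first cell annotated with root_label."""
    matches = np.flatnonzero(np.asarray(cell_types) == root_label)
    if matches.size == 0:
        raise ValueError(f"no cells labelled {root_label!r}")
    return int(matches[0])


def run_dpt(adata, groupby="cell_type", root_label="DN"):
    """Set the root cell, then compute diffusion pseudotime."""
    # Imported here so the helper can be defined without Scanpy installed.
    import scanpy as sc

    adata.uns["iroot"] = pick_root_index(adata.obs[groupby], root_label)
    # Stores the result in adata.obs["dpt_pseudotime"].
    sc.tl.dpt(adata)
    return adata


print(pick_root_index(["DP-M1", "DN", "T-mat", "DN"]))
```

If you re-clustered, `groupby="louvain"` with the cluster number as `root_label` would follow the same pattern.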
Does the pseudotime plot match with your expectations of which cells represent earlier or later stages of T-cell development?
When we look at the cell type and DPT plots, we can see that there's a clear progression from our root DN cells, through the various groups of DP cells, into DP-L and then T-mat. This matches with our expectations that the DP-L and T-mat clusters represent later stages in T-cell development. The DPT plot also confirms that the DP-M4 cluster is heading towards differentiation, which makes sense given its position on our FDG plot.
Figure 7: FDG plots showing cell types and pseudotime
This is nice, as it supports our conclusions thus far on the trajectory of the T-cell differentiation. With single-cell, the more ways you can prove to yourself what you're seeing is real, the better! If we did not find consistent results, we would need to delve in further to see if the cause is the algorithm (not all algorithms fit all data!) or the biology.
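One further sanity check you could run outside Galaxy is to summarise pseudotime per cell type and confirm that the groups come out in a biologically sensible order. The sketch below does this with NumPy. The pseudotime values are invented toy numbers, not results from this dataset, and the helper name is hypothetical.

```python
import numpy as np


def median_pseudotime_by_group(pseudotime, groups):
    """Return (group, median pseudotime) pairs sorted from earliest to latest."""
    pseudotime = np.asarray(pseudotime, dtype=float)
    groups = np.asarray(groups)
    medians = {
        str(g): float(np.median(pseudotime[groups == g]))
        for g in np.unique(groups)
    }
    return sorted(medians.items(), key=lambda item: item[1])


# Toy values only (hypothetical): real ones would come from dpt_pseudotime.
toy_groups = ["DN", "DN", "DP-M1", "DP-M1", "DP-L", "T-mat", "T-mat"]
toy_dpt = [0.0, 0.05, 0.3, 0.4, 0.7, 0.9, 1.0]

for name, median in median_pseudotime_by_group(toy_dpt, toy_groups):
    print(f"{name}\t{median:.2f}")
```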
Where might we go from here? We might consider playing with our louvain resolutions, to see if we can get the two groups of Cd4+ and Cd8+ cells to be called as different clusters, and then comparing them to each other for gene differences or genotype differences. We might also use different objects (for instance, what if we regressed out cell cycle genes?) and see if that changes the results. What would you do?
Look at each other's images! How do yours differ, and what decisions were made? Previously, when calling clusters in the Filter, Plot and Explore Single-cell RNA-seq Data tutorial, the interpretation at the end was largely consistent, no matter what decisions were made throughout (mostly!). Is this the case with your trajectory analyses? You may find that it is not, which is why pseudotime analysis even more crucially depends on your understanding of the underlying biology (we have to choose the root cells, for instance, or recognise that DN cells should not be found in the middle of the DPs) as well as choosing the right analysis. That's why it is a huge field! With analysing scRNA-seq data, it's almost like you need to know about 75% of your data and make sure your analysis shows that, for you to then identify the 25% new information.
Congratulations! You've made it to the end! You might be interested in the workflow for this tutorial or this Example History, which shows the results you should expect to see if you follow this tutorial.
In this tutorial, you moved from called clusters to inferred relationships and trajectories using pseudotime analysis. You found an alternative to PCA (diffusion map), an alternative to tSNE (force-directed graph), a means of identifying cluster relationships (PAGA), and a metric for pseudotime (diffusion pseudotime) to identify early and late cells. If you were working in a group, you found that such analysis is slightly more sensitive to your decisions than the simpler filtering/plotting/clustering is. We are inferring and assuming relationships and time, so that makes sense!
Key points
Trajectory analysis is less robust than pure plotting methods, as such "inferred relationships" are a bigger mathematical leap
As always with single-cell analysis, you must know enough biology to deduce if your analysis is reasonable, before exploring or deducing novel insight
Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.
Bacon, W. A., R. S. Hamilton, Z. Yu, J. Kieckbusch, D. Hawkes et al., 2018 Single-Cell Analysis Identifies Thymic Maturation Delay in Growth-Restricted Neonatal Mice. Frontiers in Immunology 9: 10.3389/fimmu.2018.02523
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012