Removing the effects of the cell cycle

Author(s)	Marisa Loach
Editor(s)	Wendi Bacon
Tester(s)	Graeme Tyson, Pavankumar Videm
Reviewers

Overview
Questions:

How can I reduce the effects of the cell cycle on my scRNA-seq data?

Objectives:

Identify the cell cycle genes

Use the cell cycle genes to regress out the effects of the cell cycle

Create PCA plots to understand the impact of the regression

Requirements:

Introduction to Galaxy Analyses

Hands-on: Generating a single cell matrix using Alevin

Hands-on: Combining single cell datasets after pre-processing

Hands-on: Filter, plot and explore single-cell RNA-seq data with Scanpy

Time estimation: 1 hour

Supporting Materials:

Datasets

Workflows

Answer Histories

UseGalaxy.eu
2024-12-13


Published: Jan 25, 2023

Last modification: Mar 11, 2025

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00248

Revision: 14

Single-cell RNA sequencing can be sensitive to both biological and technical variation, which is why preparing your data carefully is an important part of the analysis. You want the results to reflect the interesting differences in expression between cells that relate to their type or state. Other sources of variation can conceal or confound this, making it harder for you to see what is going on.

One common biological confounder is the cell cycle (Luecken and Theis 2019). Cells express different genes during different parts of the cell cycle, depending on whether they are in their growth phase (G1), duplicating their DNA (the S, or Synthesis, phase), or preparing for and undergoing division (the G2/M, or Gap 2/Mitosis, phase). If these cell cycle genes have a big impact on your data, you could end up with separate clusters that actually represent cells of the same type that are simply at different stages of the cycle.

In this tutorial, we will identify the genes whose expression is known to vary during the cell cycle so that we can use them to regress out (or remove) the effects of the cell cycle on the clustering.

Comment: Other Scanpy and Seurat tutorials

This tutorial is based on the Scanpy cell cycle regression tutorial (Cittaro 2018), which was itself based on the Seurat vignette addressing the same issue (Paul Hoffman 2022). However, we will be using a different dataset for this tutorial.

Agenda

In this tutorial, we will cover:

Get Data

Important tips for easier analysis

Cell Cycle Scoring

Cell Cycle Regression

Plotting the Effects of Cell Cycle Regression

Prepare a table of cell cycle genes

Create an ordered list of gene names

Mark the cell cycle genes

Create the annotation column

Add an annotation to the AnnData

Filter the cell cycle genes

Plot the cell cycle genes before regression

Plot the cell cycle genes after regression

Conclusion

Get Data

The data used in this tutorial is from a mouse dataset of fetal growth restriction (Bacon et al. 2018). Cell cycle regression should be performed after the data has been filtered, normalised, and scaled. You can download the dataset below or import the history with the starting data.

Comment

If you’ve been working through the Single-cell RNA-seq: Case Study, then you can use your dataset from the Filter, Plot and Explore Single-cell RNA-seq Data tutorial here. Select the Use_me_Scaled dataset from your history and rename it Processed_Anndata. You will still need to import the S and G2/M gene lists below from Zenodo.

At the end of this tutorial, you can return to the main tutorial to plot and explore your data with reduced effects from the cell cycle.

In addition to the scRNA-seq dataset, we will also need lists of the genes that are known to be expressed at different points in the cell cycle. The lists used in this tutorial are part of the HBC tinyatlas (Kirchner and HBC 2018) and can be downloaded from Zenodo below. Between them, they include 97 genes that are expressed during the S and G2/M phases; the expression levels of these genes are mostly determined by the phase of the cell cycle. Make sure that the file type is tabular (not just the name of the file): you can choose this when you upload the files or change it after the files are in your history.

Hands On: Option 1: Data upload - Import history

Sometimes data upload can take a while, so a faster route is to import a history.

Import history from: example input history

Open the link to the shared history

Click on the Import this history button on the top left

Enter a title for the new history

Click on Copy History

Rename the history to your name of choice.

Hands On: Option 2: Data upload - Add to history
Create a new history for this tutorial
Import the AnnData object from Zenodo
https://zenodo.org/record/7311628/files/Processed_AnnData.h5ad
Copy the link location

Click Upload Data at the top of the tool panel

Select Paste/Fetch Data

Paste the link(s) into the text field

Press Start

Close the window
Rename the dataset Processed_Anndata

Click on the pencil icon for the dataset to edit its attributes

In the central panel, change the Name field

Click the Save button

Check that the datatype is h5ad

Click on the pencil icon for the dataset to edit its attributes

In the central panel, click the Datatypes tab on the top

In Assign Datatype, select h5ad from the “New Type” dropdown

Tip: you can start typing the datatype into the field to filter the dropdown menu

Click the Save button

Hands On: Option 2 continued: Data upload - Add to history
Import the files from Zenodo
https://zenodo.org/record/7311628//files/sPhase.tabular
https://zenodo.org/record/7311628//files/g2mPhase.tabular
Copy the link location

Click Upload Data at the top of the tool panel

Select Paste/Fetch Data

Paste the link(s) into the text field

Press Start

Close the window
Rename the datasets sPhase and g2mPhase respectively; be careful not to mix them up!

Click on the pencil icon for the dataset to edit its attributes

In the central panel, change the Name field

Click the Save button

Check that the datatype for both is tabular

Click on the pencil icon for the dataset to edit its attributes

In the central panel, click the Datatypes tab on the top

In Assign Datatype, select tabular from the “New Type” dropdown

Tip: you can start typing the datatype into the field to filter the dropdown menu

Click the Save button

Important tips for easier analysis

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool version used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step and then try using an older tool in the next step, this may fail! To ensure you use the same tool versions as a given tutorial, use the Tutorial mode feature.

Open your Galaxy server

Click on the curriculum icon on the top menu; this will open the GTN inside Galaxy.

Navigate to your tutorial

Tool names in tutorials will be blue buttons that open the correct tool for you

Note: this does not work for all tutorials (yet)

You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface

Warning: Not all browsers work!

We’ve had some issues with Tutorial mode on Safari for Mac users.

Try a different browser if you aren’t seeing the button.

Did you know we have a unique Single Cell Omics Lab with all our single cell tools highlighted to make it easier to use on Galaxy? We recommend this site for all your single cell analysis needs, particularly for newer users.

The Single Cell Omics Lab is a different view of the underlying Galaxy server that organises tools and resources better for single-cell users! It also provides a platform for communities to engage and connect; distribute more targeted news and events; and highlight community-specific funding sources.

Try it out!

Europe: Single Cell Omics Lab

USA: Single Cell Omics Lab

Australia: Single Cell Omics Lab

Cell Cycle Scoring

The first step towards reducing the effects of the cell cycle on our dataset is cell cycle scoring. The cell cycle scoring algorithm looks at each cell in turn and calculates an S score: the difference between the mean expression of the S phase genes and the mean expression of a random sample of the same number of non-cell cycle genes in that cell. It does the same for the G2/M genes to calculate the G2M score. Each cell is then assigned to the most likely phase: S, G2M, or G1 if neither the S nor the G2M score is high. Three columns will be added to the AnnData dataset: S_score, G2M_score and phase.
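The scoring logic described above can be sketched in plain Python to show how the two scores and the phase label relate. This is a simplified illustration, not Scanpy's implementation: Scanpy draws its control genes from expression-matched bins, whereas this sketch draws one random set of non-cell cycle control genes and reuses it for every cell. The phase rule (G1 when both scores are negative, otherwise whichever of S and G2M scores higher) is modelled on Scanpy's `tl.score_genes_cell_cycle`. The helper names and the toy gene values are hypothetical.

```python
import random
from statistics import mean


def pick_control_genes(all_genes, cell_cycle_genes, n, seed=0):
    """Randomly choose n genes that are not cell cycle genes (simplified control set)."""
    pool = sorted(g for g in all_genes if g not in cell_cycle_genes)
    return random.Random(seed).sample(pool, min(n, len(pool)))


def score_gene_set(cell, gene_set, control_genes):
    """Mean expression of the gene set minus mean expression of the control genes."""
    present = [g for g in gene_set if g in cell]
    return mean(cell[g] for g in present) - mean(cell[g] for g in control_genes)


def assign_phase(s_score, g2m_score):
    """G1 if both scores are negative, otherwise the phase with the higher score."""
    if s_score < 0 and g2m_score < 0:
        return "G1"
    return "G2M" if g2m_score > s_score else "S"


def score_cell_cycle(cells, s_genes, g2m_genes, seed=0):
    """cells maps a cell ID to {gene: expression}; returns (S_score, G2M_score, phase) per cell."""
    all_genes = next(iter(cells.values())).keys()
    cc_genes = set(s_genes) | set(g2m_genes)
    s_ctrl = pick_control_genes(all_genes, cc_genes, len(s_genes), seed)
    g2m_ctrl = pick_control_genes(all_genes, cc_genes, len(g2m_genes), seed + 1)
    results = {}
    for cell_id, cell in cells.items():
        s = score_gene_set(cell, s_genes, s_ctrl)
        g2m = score_gene_set(cell, g2m_genes, g2m_ctrl)
        results[cell_id] = (s, g2m, assign_phase(s, g2m))
    return results


# Toy data: two S genes, two G2M genes and three "background" genes
s_genes = ["Mcm2", "Pcna"]
g2m_genes = ["Top2a", "Mki67"]
background = {"Actb": 1.0, "Gapdh": 1.0, "Rpl3": 1.0}
cells = {
    "cell_A": {**background, "Mcm2": 5.0, "Pcna": 5.0, "Top2a": 0.0, "Mki67": 0.0},
    "cell_B": {**background, "Mcm2": 0.0, "Pcna": 0.0, "Top2a": 5.0, "Mki67": 5.0},
    "cell_C": {**background, "Mcm2": 0.0, "Pcna": 0.0, "Top2a": 0.0, "Mki67": 0.0},
}
for cell_id, (s, g2m, phase) in score_cell_cycle(cells, s_genes, g2m_genes).items():
    print(cell_id, round(s, 2), round(g2m, 2), phase)
# cell_A 4.0 -1.0 S
# cell_B -1.0 4.0 G2M
# cell_C -1.0 -1.0 G1
```

Cell C expresses neither gene list above background, so both scores are negative and it falls into G1, which is why no G1 gene list is needed.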

Question

Why don’t we need a list of genes expressed in the G1 Phase?

Since we know which genes are expressed in the S and G2/M phases, we can classify cells that are expressing these genes into the S and G2/M phases respectively. Cells that aren’t expressing either the S or G2/M genes must be in the other phase of the cell cycle, so we can classify them as in G1 phase.

Comment: When should we regress out the effects of the cell cycle?

Cell cycle regression can be particularly important if we are planning to do trajectory analysis later on, or if we have a dataset that is very strongly influenced by the cell cycle (Luecken and Theis 2019). However, it isn’t always appropriate to remove the effects of the cell cycle; sometimes they can be useful for distinguishing between dividing and non-dividing cell types. When you are analysing your own data, you might need to try it both ways to determine whether the effects of the cell cycle are helpful or not. You could also check whether the cell cycle genes are among the top-scoring marker genes of your cell clusters to get an idea of how strong the effects are.
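The check suggested above (are cell cycle genes among the top marker genes of each cluster?) can be sketched as a simple set lookup. This assumes you already have, for each cluster, a list of marker genes ranked from most to least significant, for example exported from a marker-gene ranking step; the function name, the `top_n` cut-off and the toy rankings are all hypothetical.

```python
def cell_cycle_overlap(ranked_markers, cell_cycle_genes, top_n=20):
    """For each cluster, list the cell cycle genes found among its top_n ranked markers."""
    cc = set(cell_cycle_genes)
    return {
        cluster: [gene for gene in genes[:top_n] if gene in cc]
        for cluster, genes in ranked_markers.items()
    }


# Toy rankings (most significant marker first) and a toy cell cycle gene list
ranked_markers = {
    "cluster_0": ["Top2a", "Cd3e", "Mki67", "Cd8a"],
    "cluster_1": ["Cd4", "Cd3e", "Il7r", "Top2a"],
}
cc_genes = ["Top2a", "Mki67", "Mcm2"]
print(cell_cycle_overlap(ranked_markers, cc_genes, top_n=3))
# {'cluster_0': ['Top2a', 'Mki67'], 'cluster_1': []}
```

A cluster whose top markers are dominated by cell cycle genes (like the toy `cluster_0`) hints that the cell cycle may be driving that cluster.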

Hands On: Score the cell cycle genes

Inspect and manipulate (Galaxy version 1.7.1+galaxy0) with the following parameters:

param-file “Annotated data matrix”: Processed_Anndata (Input dataset)

“Method used for inspecting”: Score cell cycle genes, using 'tl.score_genes_cell_cycle'

“Format for the list of genes associated with S phase”: File

param-file “File with the list of genes associated with S phase”: sPhase (Input dataset)

“Format for the list of genes associated with G2M phase”: File

param-file “File with the list of genes associated with G2M phase”: g2mPhase (Input dataset)

Rename the output CellCycle_Annotated

Cell Cycle Regression

The second step, after scoring the cell cycle genes, is to use these scores to regress out the effects of the cell cycle. Now that we know which phase each of our cells is in, we can work out how this is affecting gene expression in our cells. We can subtract these effects from our data so that they won’t influence our cell clustering. You will need to type the phase variable into the Scanpy RegressOut tool to enable it to regress out the cell cycle effects.

The Scanpy RegressOut tool will create a linear model of the relationship between gene expression and the cell cycle scores we assigned in the previous step. Basically, this model is a line that shows how gene expression changes as the S or G2M score changes. Each gene will have its own line, so for any S score or G2M score, we could look at the corresponding point on the line to see the expected expression level of that gene.

Scanpy RegressOut will then regress out or remove this expected effect for the genes expressed by each cell, according to the cell’s S and G2M scores. The expected effect is subtracted from the expression data, leaving behind the difference between the expected position on the line and the actual position of each data point. The data points won’t sit exactly on the line because their expression levels aren’t determined completely by the cell cycle - the linear model only tells us what we would expect based on the cell cycle scores alone.

Understanding how the regression works should help you to see why we’re not just deleting the cell cycle genes from the dataset. We are using these genes that are known to be expressed during different phases to calculate the cell cycle scores and determine the phase each cell is in. We then use this information to work out the effect of the cell cycle on all of the other genes expressed by the cells. Even if they are not cell cycle genes, their expression can still be affected by the cycle. Finally, we remove or regress out the effect of the cell cycle on all the genes, leaving behind the variation that we’re interested in.
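The regression described above can be illustrated with ordinary least squares in plain Python: for one gene, fit expression as an intercept plus a weighted sum of the S and G2M scores, then keep only the residuals (actual minus expected expression). This is a sketch for intuition, with hypothetical helper names and toy numbers, regressing on the two scores as the text describes. Scanpy's `pp.regress_out` also fits a per-gene linear model, but when it is given a single categorical key such as `phase`, it regresses on per-category mean expression rather than on continuous scores, so treat this as an illustration of the idea rather than a reproduction of the Galaxy tool.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b using Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regress_out_gene(expression, covariates):
    """Fit expression ~ intercept + covariates by least squares; return the residuals."""
    design = [[1.0, *cov] for cov in covariates]  # one row per cell
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, expression)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    # Expected expression given the cell cycle scores alone
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return [y - f for y, f in zip(expression, fitted)]


# Toy per-cell scores for five cells
s_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
g2m_scores = [1.0, 0.0, 2.0, 1.0, 3.0]
covariates = list(zip(s_scores, g2m_scores))

# A gene whose expression is entirely explained by the scores: 2 + 3*S - G2M
cycle_driven = [2 + 3 * s - g for s, g in covariates]
print(regress_out_gene(cycle_driven, covariates))  # all values close to zero
```

For a gene that is fully explained by the scores, nothing meaningful is left after regression; for a real gene, the residuals keep whatever variation the cell cycle scores cannot explain.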

Hands On: Regress out the effects of the cell cycle

Scanpy RegressOut (Galaxy version 1.8.1+galaxy0) with the following parameters:

param-file “Input object in AnnData/Loom format”: CellCycle_Annotated (output of Inspect and manipulate tool)

“Variables to regress out”: phase

Rename the output CellCycle_Regressed

Plotting the Effects of Cell Cycle Regression

Your data is now ready for further analysis, so you can return to the Filter, Plot and Explore Single-cell RNA-seq Data tutorial and move on to the Preparing coordinates step there. Make sure that you use the CellCycle_Regressed dataset (you may want to rename it as Use_me_Scaled as that is the name used in the main tutorial). However, if you want to understand how cell cycle regression has affected your data then you can work through the following steps first, to visualise where the cell cycle genes are expressed - with or without regression.

In order to look at the cell cycle genes, we first need to label them in our AnnData dataset so that we can select them for plotting. To add a new annotation to the genes or variables, we need a column with entries for each gene, in the same order as in the dataset, and with a header at the top that will become the key for identifying these entries in the AnnData dataset. We want to end up with a column that reads TRUE for the 97 cell cycle genes and FALSE for all the other genes.

You might find it easier to create this new column using a spreadsheet and then upload it as a tabular dataset, but it is possible to complete all the steps on Galaxy.

Prepare a table of cell cycle genes

If we’re going to mark all the cell cycle genes, we’ll need a single list of all 97 genes instead of the two separate lists for S Phase and G2/M Phase. We’ll combine the two lists into a single column with 97 entries. We’ll then add a second column that simply reads TRUE in every row, which we’ll use later to mark these as cell cycle genes in the main dataset.

Hands On: Create a list of all cell cycle genes

Concatenate datasets with the following parameters:

param-file “Concatenate Dataset”: sPhase (Input dataset)

In “Dataset”:

param-repeat “Insert Dataset”

param-file “Select”: g2mPhase (Input dataset)

Add column (Galaxy version 1.0.0) with the following parameters:

“Add this value”: TRUE

param-file “to Dataset”: out_file1 (output of Concatenate datasets tool)

Rename the dataset CC_Genes

Create an ordered list of gene names

Next, we’ll need a list of all the genes in our dataset, so that we can mark the ones that are in our cell cycle list. We’ll also add a column of numbers as this will help us keep the gene names in order.

Hands On: Get the gene names from your dataset

Inspect AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Regressed (Input dataset)

“What to inspect?”: Key-indexed annotation of variables/features (var)

Table Compute (Galaxy version 1.2.4+galaxy0) with the following parameters:

“Input Single or Multiple Tables”: Single Table

param-file “Table”: var (output of Inspect AnnData tool)

“Type of table operation”: Drop, keep or duplicate rows and columns

“List of columns to select”: 1

“Output formatting options”: Unselect all

Add column (Galaxy version 1.0.0) with the following parameters:

param-file “to Dataset”: table (output of Table Compute tool)

“Iterate?”: YES

Comment: Keeping the genes in order

Adding these numbers will enable us to keep the genes in their original order. This is essential for adding the cell cycle gene annotation back into the AnnData dataset.

Rename the output Dataset_Genes

Mark the cell cycle genes

We can now combine our table of cell cycle genes, CC_Genes, with the table of gene names, Dataset_Genes.

Hands On: Combine the two tables

Join (Galaxy version 1.1.2) with the following parameters:

param-file “1st file”: Dataset_Genes (output of Add column tool)

“Column to use from 1st file”: c1

param-file “2nd File”: CC_Genes (output of Add column tool)

“Column to use from 2nd file”: c1

“Output lines appearing in”: All lines [-a 1 -a 2]

“Value to put in unpaired (empty) fields”: FALSE

Comment: How the cell cycle genes are marked

When we join the two tables, we’ll ask for any empty fields to be filled in with FALSE. The cell cycle gene table has an extra column where they are all marked as TRUE - they will retain these entries when we join the tables, but since there are no entries for the rest of the genes in this column, their rows will be filled in as FALSE. This will enable us to pick out the cell cycle genes later.

Sort with the following parameters:

param-file “Sort Dataset”: output (output of Join tool)

“on column”: c2

“everything in”: Ascending order

Comment: Putting the genes in order again

Sorting the genes using the column of numbers we added earlier will put them back in their original order. Make sure to sort them in ascending order; otherwise they’ll end up in reverse order.
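To make the combined effect of the Concatenate, Add column, Join and Sort steps concrete, here is a plain-Python sketch of what they compute together. It is an illustration only: the function name and toy gene lists are hypothetical, and in Galaxy these tables are tab-separated files rather than Python lists. It also assumes that the Sort step compares the position column numerically, so that a FALSE entry is treated as smaller than any position, which matches the behaviour the tutorial describes.

```python
def mark_cell_cycle_genes(dataset_genes, s_genes, g2m_genes):
    """Return (gene, position, is_cell_cycle) rows, like the Join + Sort output."""
    # Concatenate datasets + Add column ("TRUE"): one table of all cell cycle genes
    cc_table = {gene: "TRUE" for gene in [*s_genes, *g2m_genes]}
    # Add column with "Iterate? YES": record each dataset gene's original position
    numbered = {gene: pos for pos, gene in enumerate(dataset_genes, start=1)}
    # Join with "All lines" and FALSE for unpaired fields (an outer join);
    # cell cycle genes missing from the dataset are listed after the dataset genes
    all_genes = list(numbered) + [g for g in cc_table if g not in numbered]
    rows = [(g, numbered.get(g, "FALSE"), cc_table.get(g, "FALSE")) for g in all_genes]
    # Sort ascending on the position column; unnumbered rows are treated as 0
    rows.sort(key=lambda row: 0 if row[1] == "FALSE" else row[1])
    return rows


for gene, position, flag in mark_cell_cycle_genes(
    ["Actb", "Mcm2", "Gapdh", "Top2a"], ["Mcm2", "Pcna"], ["Top2a"]
):
    print(gene, position, flag, sep="\t")
```

In this toy example, Pcna is a cell cycle gene that is absent from the dataset, so it has no position and is sorted into the first row; every dataset gene keeps its original order, with TRUE marking Mcm2 and Top2a.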

Question

What would happen if any of the cell cycle genes were not present in the dataset?

How would we remove these genes from the table?

Any cell cycle genes that weren’t in the dataset would have an empty field in the numbered column, which would be filled in with FALSE when we created the table with the Join tool. These rows would appear at the top of the table after it was sorted.

We should check the first rows of the table for any unnumbered genes and then cut these rows out in the next step.

Create the annotation column

We now have a table with all the gene names in the same order as the main dataset and a column indicating which ones are cell cycle genes. If we cut this column out of the table, then we can add it as a new annotation to the main dataset. We’ll also need to add a column header, which will be used as the key for this annotation in the AnnData dataset.
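The annotation column itself is just a header line followed by one TRUE/FALSE value per dataset gene, in dataset order. The sketch below builds it from already joined and sorted (gene, position, flag) rows; the function name and toy rows are hypothetical. Unlike the Galaxy steps, where removing missing genes is a manual choice, this sketch always drops rows that have no position (cell cycle genes absent from the dataset), so that the column has exactly one entry per gene in the dataset.

```python
def annotation_column(sorted_rows, header="CC_genes"):
    """Build the tabular annotation: a header, then the TRUE/FALSE flag for each dataset gene."""
    # Keep only genes that have a position, i.e. genes present in the dataset
    present = [row for row in sorted_rows if row[1] != "FALSE"]
    # Keep the third column (the flag) and prepend the header that becomes the AnnData key
    return "\n".join([header] + [flag for _, _, flag in present]) + "\n"


# Toy output of the Join + Sort steps: (gene, position, is_cell_cycle)
sorted_rows = [
    ("Pcna", "FALSE", "TRUE"),  # cell cycle gene missing from the dataset
    ("Actb", 1, "FALSE"),
    ("Mcm2", 2, "TRUE"),
    ("Gapdh", 3, "FALSE"),
    ("Top2a", 4, "TRUE"),
]
print(annotation_column(sorted_rows), end="")
```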

Hands On: Create the cell cycle annotation column
Table Compute (Galaxy version 1.2.4+galaxy0) with the following parameters:

“Input Single or Multiple Tables”: Single Table

param-file “Table”: out_file1 (output of Sort tool)

“Input data has:”: Unselect all

“Type of table operation”: Drop, keep or duplicate rows and columns

“List of columns to select”: 3

“Output formatting options”: Unselect all

Comment: Removing rows for missing genes

If there were any cell cycle genes that weren’t present in the main dataset, we could remove them at this stage by excluding them from the “List of rows to select”. If we did this, we would need to set this parameter to include all the remaining rows, so its value would depend on the size of the dataset.
Create a new tabular file from the following:
CC_genes
Click Upload Data at the top of the tool panel

Select Paste/Fetch Data at the bottom

Paste the file contents into the text field

Change Type from “Auto-detect” to tabular

Press Start and Close the window
Concatenate datasets with the following parameters:

param-file “Concatenate Dataset”: Pasted Entry dataset

In “Dataset”:

param-repeat “Insert Dataset”

param-file “Select”: table (output of Table Compute tool)

Add an annotation to the AnnData

We will need to add the annotation both to the annotated dataset, CellCycle_Annotated, and to the dataset we created by regressing out the effects of the cell cycle, CellCycle_Regressed. This will allow us to plot the cell cycle genes before and after regression. We can do this using the Manipulate AnnData tool and selecting the correct function from the dropdown menu.

Hands On: Add the new annotations

Manipulate AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Annotated (output of Inspect and manipulate tool)

“Function to manipulate the object”: Add new annotation(s) for observations or variables

param-file “Table with new annotations”: out_file1 (output of Concatenate datasets tool)

Rename the output CellCycle_Annotated_CC

Manipulate AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Regressed (output of Scanpy RegressOut tool)

“Function to manipulate the object”: Add new annotation(s) for observations or variables

param-file “Table with new annotations”: out_file1 (output of Concatenate datasets tool)

Rename the output CellCycle_Regressed_CC

Filter the cell cycle genes

To demonstrate the power of cell cycle regression, we’re going to reduce our expression matrices to contain only the 97 cell cycle genes. This will force our dimension reduction and plotting to be based entirely on cell cycle genes. You wouldn’t do this during analysis, but for proof of principle, let’s go for it!
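Filtering by a Boolean key simply keeps the genes (columns) whose annotation is TRUE. The stdlib sketch below shows that operation on a toy cells-by-genes matrix; the function name and data are hypothetical. In Python with the anndata package, the equivalent would be roughly `adata[:, adata.var["CC_genes"]]`, assuming the column was read in as booleans.

```python
def filter_genes(matrix, gene_names, annotation):
    """Keep only the gene columns whose annotation value is TRUE.

    matrix: one row per cell, one value per gene (same order as gene_names).
    annotation: TRUE/FALSE strings, one per gene, e.g. the CC_genes column.
    """
    keep = [i for i, flag in enumerate(annotation) if flag.strip().upper() == "TRUE"]
    filtered = [[row[i] for i in keep] for row in matrix]
    return filtered, [gene_names[i] for i in keep]


matrix = [[1.0, 2.0, 0.5, 3.0], [0.0, 1.5, 2.5, 0.0]]
genes = ["Actb", "Mcm2", "Gapdh", "Top2a"]
flags = ["FALSE", "TRUE", "FALSE", "TRUE"]
print(filter_genes(matrix, genes, flags))
# ([[2.0, 3.0], [1.5, 0.0]], ['Mcm2', 'Top2a'])
```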

Hands On: Filter the AnnData datasets

Manipulate AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Annotated_CC (output of Manipulate AnnData tool)

“Function to manipulate the object”: Filter observations or variables

“Type of filtering?”: By key (column) values

“Key to filter”: CC_genes

“Type of value to filter”: Boolean

Rename the output CellCycle_Annotated_CC_Only

Manipulate AnnData (Galaxy version 0.7.5+galaxy1) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Regressed_CC (output of Manipulate AnnData tool)

“Function to manipulate the object”: Filter observations or variables

“Type of filtering?”: By key (column) values

“Key to filter”: CC_genes

“Type of value to filter”: Boolean

Rename the output CellCycle_Regressed_CC_Only

Plot the cell cycle genes before regression

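Now that we have a dataset that only includes the cell cycle genes, we can visualise their effects in a PCA plot. We first calculate the PCA coordinates. Principal components are weighted combinations of genes, chosen so that the first component captures as much of the variation between cells as possible, the second captures as much of the remaining variation as possible, and so on. Each cell’s coordinates describe where it sits along these components, based on its expression of the 97 cell cycle genes included in the filtered dataset, so cells with similar expression end up close together. We will then visualise the cells on a PCA plot where the axes are the principal components, which reflect the genes (or groups of genes) that contribute most to the variation between cells.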

You will learn more about plotting your data in the Filter, Plot and Explore tutorial. For now, it is enough to know that each dot on the plot represents a cell and the closer two cells are together, the more similar they are.
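To make the idea of a principal component more concrete, the sketch below finds only the first principal component of a small cells-by-genes matrix by power iteration on the gene covariance matrix. This is not how Scanpy computes PCA (`tl.pca` relies on SVD-based solvers and returns many components); the function name and the toy matrix are hypothetical, and the sketch assumes the data are not constant.

```python
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (loadings, scores) for PC1 of a cells x genes matrix via power iteration."""
    n_cells, n_genes = len(rows), len(rows[0])
    # Centre each gene on its mean expression
    means = [sum(r[j] for r in rows) / n_cells for j in range(n_genes)]
    centred = [[r[j] - means[j] for j in range(n_genes)] for r in rows]
    # Gene-by-gene covariance matrix (up to a constant factor, which does not change PC1)
    cov = [[sum(r[i] * r[j] for r in centred) for j in range(n_genes)] for i in range(n_genes)]
    # Repeatedly multiply a starting vector by the covariance matrix and renormalise
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_genes)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Each cell's PC1 coordinate is its centred expression projected onto the loadings
    scores = [sum(c * l for c, l in zip(r, v)) for r in centred]
    return v, scores


# Toy data: gene 1 separates two groups of cells; gene 2 barely varies
cells = [[5.0, 1.0], [5.2, 1.1], [0.0, 1.0], [0.2, 0.9]]
loadings, scores = first_principal_component(cells)
print([round(x, 3) for x in loadings], [round(s, 2) for s in scores])
```

PC1 is dominated by the first gene, and the two groups of cells land on opposite sides of zero along it, which is the same kind of separation the phase colouring reveals in the plots below.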

Hands On: Create a PCA Plot of cell cycle genes

Cluster, infer trajectories and embed (Galaxy version 1.7.1+galaxy0) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Annotated_CC_Only (output of Manipulate AnnData tool)

“Method used”: Computes PCA (principal component analysis) coordinates, loadings and variance decomposition, using 'tl.pca'

“Type of PCA?”: Full PCA

Comment: Plot all the genes

Make sure that you de-select the option for the Cluster, infer trajectories and embed tool to use highly variable genes only - some of the cell cycle genes are also HVGs, but we want our plots to include the cell cycle genes that aren’t HVGs too.

Plot (Galaxy version 1.7.1+galaxy1) with the following parameters:

param-file “Annotated data matrix”: anndata_out (output of Cluster, infer trajectories and embed tool)

“Method used for plotting”: PCA: Plot PCA results, using 'pl.pca_overview'

“Keys for annotations of observations/cells or variables/genes”: phase

In “Plot attributes”:

“Colors to use for plotting categorical annotation groups”: rainbow (Miscellaneous)

Question

Does the plot look as you expected?

The PCA plot shows that the three groups of cells are separated out according to what phase of the cell cycle they are in. This is what we would expect to see as we are only looking at the cell cycle genes, which by definition are expressed during particular phases.


Figure 1: PCA Plot of Cell Cycle Genes before regression

Plot the cell cycle genes after regression

We will now repeat the same steps to create a PCA plot of the filtered dataset after the effects of the cell cycle have been regressed out.

Hands On: Recreate the PCA plot of cell cycle genes after regression

Cluster, infer trajectories and embed (Galaxy version 1.7.1+galaxy0) with the following parameters:

param-file “Annotated data matrix”: CellCycle_Regressed_CC_Only (output of Manipulate AnnData tool)

“Method used”: Computes PCA (principal component analysis) coordinates, loadings and variance decomposition, using 'tl.pca'

“Type of PCA?”: Full PCA

Plot (Galaxy version 1.7.1+galaxy1) with the following parameters:

param-file “Annotated data matrix”: anndata_out (output of Cluster, infer trajectories and embed tool)

“Method used for plotting”: PCA: Plot PCA results, using 'pl.pca_overview'

“Keys for annotations of observations/cells or variables/genes”: phase

In “Plot attributes”:

“Colors to use for plotting categorical annotation groups”: rainbow (Miscellaneous)

Question

Does the plot look as you expected?

The cells in different phases are now all mixed up together. This makes sense because we are only plotting the cell cycle genes, but the previously strong effects of the cell cycle on these genes have now been regressed out. There are still some differences between the cells (they don’t all end up at the same point on the PCA chart) because the regression only removes the expected effects of the cell cycle, leaving behind any individual variation in the expression of the cell cycle genes.


Figure 2: PCA Plot of Cell Cycle Genes after regression

Comparing the before and after plots, we can clearly see that the effects of the cell cycle have been removed. Although you wouldn’t usually need to filter out the cell cycle genes or create these plots when analysing your own data, hopefully you have found doing it now to be helpful for understanding the impact of cell cycle regression.

Question

What impact do you think the cell cycle regression will have when you analyse the whole dataset? What would happen if we plotted all of the genes from the main dataset?

The regression reduces the impact of the cell cycle on the data - this is why the cells are less separated by phase afterwards. When we analyse the whole CellCycle_Regressed dataset, with all of the genes, this could allow other differences in gene expression to become more apparent.

We wouldn’t expect to see such clear distinctions in PCA plots created using all of the genes (not just the cell cycle ones), even before the regression. Although the cell cycle genes can have a significant effect, this won’t be as obvious when other genes are also being taken into account. However, we will still see a difference after we regress out the effects of the cell cycle: the cells in different phases will become more mixed up together. How much of a difference the regression makes will depend on how strong the effects of the cell cycle are in a particular dataset; you can see the effects on this dataset below. You can also replicate these plots after completing the rest of the Filter, Plot and Explore tutorial by colouring your PCA plots by phase.


Figure 3: PCA Plot using all genes before regression


Figure 4: PCA Plot using all genes after regression

Conclusion

In this tutorial, you have annotated and scored the cell cycle genes and regressed out the effects of the cell cycle. You have also created PCA plots of the data before and after regression to visualise the effects.

You might want to check your results against this example history.

You can now continue to analyse this data by returning to the Preparing coordinates step in the Filter, Plot and Explore tutorial. If you use the CellCycle_Regressed dataset (which you may now want to rename as Use_me_Scaled since that is the name used in the main tutorial), you should notice some differences in your results compared to those shown there because the effects of the cell cycle have been regressed out.

To discuss with like-minded scientists, join our Galaxy Training Network chatspace in Slack and discuss with fellow users of Galaxy single cell analysis tools on #single-cell-users

We also post new tutorials / workflows there from time to time, as well as any other news.

If you’d like to contribute ideas, requests or feedback as part of the wider community building single-cell and spatial resources within Galaxy, you can also join our Single cell & sPatial Omics Community of Practice.

You can request tools on our Single Cell and Spatial Omics Community Tool Request Spreadsheet

You've Finished the Tutorial

Key points

Cell cycle genes can conceal what is happening in your data if cells are grouping together according to their stage in the cycle

Identifying the cell cycle genes and using them to regress out the effects of the cell cycle can reveal underlying patterns in the data

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

References

Bacon, W. A., R. S. Hamilton, Z. Yu, J. Kieckbusch, D. Hawkes et al., 2018 Single-Cell Analysis Identifies Thymic Maturation Delay in Growth-Restricted Neonatal Mice. Frontiers in Immunology 9: 10.3389/fimmu.2018.02523
Cittaro, D., 2018 Cell-Cycle Scoring and Regression. https://nbviewer.org/github/scverse/scanpy_usage/blob/master/180209_cell_cycle/cell_cycle.ipynb
Kirchner, R., and HBC, 2018 Tinyatlas/mus_musculus.CSV at master · HBC/tinyatlas. GitHub. https://github.com/hbc/tinyatlas/blob/master/cell_cycle/Mus_musculus.csv
Luecken, M. D., and F. J. Theis, 2019 Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology 15: 10.15252/msb.20188746
Paul Hoffman, S. L., 2022 Cell-Cycle Scoring and Regression. https://satijalab.org/seurat/articles/cell_cycle_vignette.html#assign-cell-cycle-scores


Citing this Tutorial

Marisa Loach, Removing the effects of the cell cycle (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/scrna-case_cell-cycle/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{single-cell-scrna-case_cell-cycle,
author = "Marisa Loach",
	title = "Removing the effects of the cell cycle (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/scrna-case_cell-cycle/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut and},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Congratulations on successfully completing this tutorial!

Do you want to extend your knowledge?
Follow one of our recommended follow-up trainings:

Hands-on: Importing files from public atlases

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.

shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/single-cell/tutorials/scrna-case_cell-cycle/tutorial.json | jq .admin_install_yaml -r)

Alternatively you can copy and paste the following YAML

---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools:
- name: text_processing
  owner: bgruening
  revisions: d698c222f354
  tool_panel_section_label: Text Manipulation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: add_value
  owner: devteam
  revisions: 745871c0b055
  tool_panel_section_label: Text Manipulation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: scanpy_regress_variable
  owner: ebi-gxa
  revisions: 36daab33aecf
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: anndata_inspect
  owner: iuc
  revisions: ee98d611afc6
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: anndata_manipulate
  owner: iuc
  revisions: 3d748954434b
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: scanpy_cluster_reduce_dimension
  owner: iuc
  revisions: aaa5da8e73a9
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: scanpy_inspect
  owner: iuc
  revisions: c5d3684f7c4c
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: scanpy_plot
  owner: iuc
  revisions: aa0c474463c2
  tool_panel_section_label: Single-cell
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: table_compute
  owner: iuc
  revisions: 3bf5661c0280
  tool_panel_section_label: Text Manipulation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
