Removing the effects of the cell cycle

Author(s): Marisa Loach
Editor(s): Wendi Bacon
Tester(s): Graeme Tyson
Reviewers: Helena Rasche, Marisa Loach, Saskia Hiltemann, Wendi Bacon, Julia Jakiela, Mehmet Tekman
Overview
Questions:
  • How can I reduce the effects of the cell cycle on my scRNA-seq data?

Objectives:
  • Identify the cell cycle genes

  • Use the cell cycle genes to regress out the effects of the cell cycle

  • Create PCA plots to understand the impact of the regression

Time estimation: 1 hour
Published: Jan 25, 2023
Last modification: Jun 14, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00248
Revision: 11

Single-cell RNA sequencing can be sensitive to both biological and technical variation, which is why preparing your data carefully is an important part of the analysis. You want the results to reflect the interesting differences in expression between cells that relate to their type or state. Other sources of variation can conceal or confound this, making it harder for you to see what is going on.

One common biological confounder is the cell cycle (Luecken and Theis 2019). Cells express different genes during different parts of the cell cycle, depending on whether they are in their growing phase (G1), duplicating their DNA (the S or Synthesis phase), or dividing in two (G2/M or Mitosis phase). If these cell cycle genes are having a big impact on your data, then you could end up with separate clusters that actually represent cells of the same type that are just at different stages of the cycle.

In this tutorial, we will identify the genes whose expression is known to vary during the cell cycle so that we can use them to regress out (or remove) the effects of the cell cycle on the clustering.

Comment: Other Scanpy and Seurat tutorials

This tutorial is based on the Scanpy cell cycle regression tutorial (Cittaro 2018), which was itself based on the Seurat vignette addressing the same issue (Paul Hoffman 2022). However, we will be using a different dataset for this tutorial.

Agenda

In this tutorial, we will cover:

  1. Get Data
  2. Important tips for easier analysis
  3. Cell Cycle Scoring
  4. Cell Cycle Regression
  5. Plotting the Effects of Cell Cycle Regression
    1. Prepare a table of cell cycle genes
    2. Create an ordered list of gene names
    3. Mark the cell cycle genes
    4. Create the annotation column
    5. Add an annotation to the AnnData
    6. Filter the cell cycle genes
    7. Plot the cell cycle genes before regression
    8. Plot the cell cycle genes after regression
  6. Conclusion

Get Data

The data used in this tutorial is from a mouse dataset of fetal growth restriction (Bacon et al. 2018). Cell cycle regression should be performed after the data has been filtered, normalised, and scaled. You can download the dataset below or import the history with the starting data.

Comment
  • If you’ve been working through the Single-cell RNA-seq: Case Study then you can use your dataset from the Filter, Plot and Explore Single-cell RNA-seq Data tutorial here. Select the Use_me_Scaled dataset from your history. Rename that dataset as Processed_Anndata. You will still need to import the S and G2/M gene lists below through Zenodo.
  • At the end of this tutorial, you can return to the main tutorial to plot and explore your data with reduced effects from the cell cycle.

In addition to the scRNA-seq dataset, we will also need lists of the genes that are known to be expressed at different points in the cell cycle. The lists used in this tutorial are part of the HBC tinyatlas and can be downloaded from Zenodo below (Kirchner and HBC 2018). Between them, they include 97 genes that are expressed during the S and G2/M phases. The expression levels of these cell cycle genes are mostly determined by the phase of the cell cycle. Make sure that the datatype of each file is tabular (it is not enough for the file name to say so): you can choose this when you import the files or change it after the files are in your history.

Hands-on: Option 1: Data upload - Import history

Sometimes data upload can take a while, so a faster route is to import a history.

  1. Import history from: example input history

    1. Open the link to the shared history
    2. Click on the Import this history button on the top left
    3. Enter a title for the new history
    4. Click on Copy History

  2. Rename the history to a name of your choice.

Hands-on: Option 2: Data upload - Add to history
  1. Create a new history for this tutorial

  2. Import the AnnData object from Zenodo

    https://zenodo.org/record/7311628/files/Processed_AnnData.h5ad
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Rename the dataset Processed_Anndata

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

  4. Check that the datatype is h5ad

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, click the Datatypes tab at the top
    • Under Assign Datatype, select h5ad from the “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

Hands-on: Option 2 continued: Data upload - Add to history
  1. Import the files from Zenodo

    https://zenodo.org/record/7311628//files/sPhase.tabular
    https://zenodo.org/record/7311628//files/g2mPhase.tabular
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  2. Rename the datasets sPhase and g2mPhase respectively - be careful not to mix them up!

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

  3. Check that the datatype for both is tabular

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, click the Datatypes tab at the top
    • Under Assign Datatype, select tabular from the “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

Important tips for easier analysis

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

  • Open your Galaxy server
  • Click on the curriculum icon on the top menu. This will open the GTN inside Galaxy.
  • Navigate to your tutorial
  • Tool names in tutorials will be blue buttons that open the correct tool for you
  • Note: this does not work for all tutorials (yet)
  • You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface
Warning: Not all browsers work!
  • We’ve had some issues with Tutorial mode on Safari for Mac users.
  • Try a different browser if you aren’t seeing the button.

Did you know we have a unique Single Cell Omics Lab with all our single cell tools highlighted to make it easier to use on Galaxy? We recommend this site for all your single cell analysis needs, particularly for newer users.

The Single Cell Omics Lab currently uses the main European Galaxy infrastructure and power. It’s just organised better for users of particular analyses, like single cell!

Try it out!

Cell Cycle Scoring

The first step towards reducing the effects of the cell cycle on our dataset is cell cycle scoring. The cell cycle scoring algorithm will look at each cell in turn and calculate an S score based on the difference between the mean expression of the S phase genes and the mean expression of a random sample of the same number of non-cell cycle genes from the same cell. It will do the same for the G2M genes in order to calculate the G2M score. The cells will then be assigned to the most likely phase: S, G2M, or G1 if neither the S nor the G2M score is high. Three columns will be added to the AnnData dataset: S_score, G2M_score and phase.
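The scoring idea described above can be sketched in plain Python. This is a simplified illustration rather than the code the Galaxy tool runs: the gene names, expression values and the helpers `score_gene_set` and `assign_phase` are hypothetical, and calling a cell G1 when both scores are negative is an assumption about what "neither score is high" means in practice.

```python
import random
from statistics import mean


def score_gene_set(cell, gene_set, control_pool, rng):
    """Mean expression of the gene set minus the mean of a random control sample."""
    # Sample as many control (non-cell cycle) genes as there are genes in the set.
    controls = rng.sample(control_pool, len(gene_set))
    return mean(cell[g] for g in gene_set) - mean(cell[g] for g in controls)


def assign_phase(s_score, g2m_score):
    """Assumed rule: G1 if both scores are negative, otherwise the higher score wins."""
    if s_score < 0 and g2m_score < 0:
        return "G1"
    return "S" if s_score > g2m_score else "G2M"


# Hypothetical gene names and expression values, for illustration only.
s_genes = ["geneS1", "geneS2"]
g2m_genes = ["geneM1", "geneM2"]
control_pool = ["ctrl1", "ctrl2", "ctrl3", "ctrl4"]

cells = {
    "cell_a": {"geneS1": 3.0, "geneS2": 2.0, "geneM1": 0.1, "geneM2": 0.0,
               "ctrl1": 0.5, "ctrl2": 0.5, "ctrl3": 0.5, "ctrl4": 0.5},
    "cell_b": {"geneS1": 0.2, "geneS2": 0.0, "geneM1": 2.5, "geneM2": 3.5,
               "ctrl1": 0.5, "ctrl2": 0.5, "ctrl3": 0.5, "ctrl4": 0.5},
    "cell_c": {"geneS1": 0.1, "geneS2": 0.3, "geneM1": 0.2, "geneM2": 0.4,
               "ctrl1": 1.0, "ctrl2": 1.0, "ctrl3": 1.0, "ctrl4": 1.0},
}

rng = random.Random(0)  # fixed seed so the control sample is reproducible
scores = {}
phases = {}
for name, cell in cells.items():
    s_score = score_gene_set(cell, s_genes, control_pool, rng)
    g2m_score = score_gene_set(cell, g2m_genes, control_pool, rng)
    scores[name] = (s_score, g2m_score)
    phases[name] = assign_phase(s_score, g2m_score)

for name in cells:
    print(name, scores[name], phases[name])
```

In this toy data the control genes have the same value within each cell, so the random sample does not change the scores. With real data the choice of control genes matters, which is one reason to rely on the tool rather than a hand-rolled version.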

Question
  1. Why don’t we need a list of genes expressed in the G1 Phase?
  1. Since we know which genes are expressed in the S and G2/M phases, we can classify cells that are expressing these genes into the S and G2/M phases respectively. Cells that aren’t expressing either the S or G2/M genes must be in the other phase of the cell cycle, so we can classify them as in G1 phase.
Comment: When should we regress out the effects of the cell cycle?

Cell cycle regression can be particularly important if we are planning to do trajectory analysis down the line or if we have a dataset that is very strongly influenced by the cell cycle (Luecken and Theis 2019). However, it isn’t always appropriate to remove the effects of the cell cycle - sometimes they can be useful for distinguishing between dividing and non-dividing cell types. When you are analysing your own data, you might need to try it both ways to determine if the effects of the cell cycle are helpful or not. You could also check whether the cell cycle genes are among the top scoring genes expressed by your cell clusters to get an idea of how strong the effects are.

Hands-on: Score the cell cycle genes
  1. Inspect and manipulate ( Galaxy version 1.7.1+galaxy0) with the following parameters:
    • param-file “Annotated data matrix”: Processed_Anndata (Input dataset)
    • “Method used for inspecting”: Score cell cycle genes, using 'tl.score_genes_cell_cycle'
      • “Format for the list of genes associated with S phase”: File
        • param-file “File with the list of genes associated with S phase”: sPhase (Input dataset)
      • “Format for the list of genes associated with G2M phase”: File
        • param-file “File with the list of genes associated with G2M phase”: g2mPhase (Input dataset)
  2. Rename the output CellCycle_Annotated

Cell Cycle Regression

The second step after scoring the cell cycle genes is to use these scores to regress out the effects of the cell cycle. Now that we know which phase each of our cells is in, we can work out how this is affecting gene expression in our cells. We can subtract these effects from our data so that they won’t influence our cell clustering. You will need to enter phase as the variable to regress out in the Scanpy RegressOut tool so that it can regress out the cell cycle effects.

The Scanpy RegressOut tool will create a linear model of the relationship between gene expression and the cell cycle scores we assigned in the previous step. Basically, this model is a line that shows how gene expression changes as the S or G2M score changes. Each gene will have its own line, so for any S score or G2M score, we could look at the corresponding point on the line to see the expected expression level of that gene.

Scanpy RegressOut will then regress out or remove this expected effect for the genes expressed by each cell, according to the cell’s S and G2M scores. The expected effect is subtracted from the expression data, leaving behind the difference between the expected position on the line and the actual position of each data point. The data points won’t sit exactly on the line because their expression levels aren’t determined completely by the cell cycle - the linear model only tells us what we would expect based on the cell cycle scores alone.

Understanding how the regression works should help you to see why we’re not just deleting the cell cycle genes from the dataset. We are using these genes that are known to be expressed during different phases to calculate the cell cycle scores and determine the phase each cell is in. We then use this information to work out the effect of the cell cycle on all of the other genes expressed by the cells. Even if they are not cell cycle genes, their expression can still be affected by the cycle. Finally, we remove or regress out the effect of the cell cycle on all the genes, leaving behind the variation that we’re interested in.
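The line-fitting idea in the paragraphs above can be sketched for a single gene in plain Python. This is only an illustration under simplifying assumptions: it fits one straight line against one numeric score, the values and the helper `regress_out` are hypothetical, and it does not show how the Galaxy tool treats the phase column used in the hands-on step below.

```python
from statistics import mean


def regress_out(expression, covariate):
    """Fit a least-squares line of expression on covariate and return the residuals."""
    x_bar = mean(covariate)
    y_bar = mean(expression)
    # Slope and intercept of the ordinary least-squares line
    # (assumes the covariate is not constant, so sxx is non-zero).
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # The residual is the actual value minus the value expected from the line.
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


# Hypothetical S scores for four cells and the expression of one gene in those cells.
s_scores = [0.0, 1.0, 2.0, 3.0]
gene_expression = [1.1, 2.9, 5.2, 6.8]

residuals = regress_out(gene_expression, s_scores)
print(residuals)

# A gene that follows the score exactly leaves (almost) nothing behind.
perfect_gene = [1.0, 3.0, 5.0, 7.0]
print(regress_out(perfect_gene, s_scores))
```

The residuals are what remains of the gene's expression once the part predicted by the score has been subtracted, which is the variation the text describes as "left behind".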

Hands-on: Regress out the effects of the cell cycle
  1. Scanpy RegressOut ( Galaxy version 1.8.1+galaxy0) with the following parameters:
    • param-file “Input object in AnnData/Loom format”: CellCycle_Annotated (output of Inspect and manipulate tool)
    • “Variables to regress out”: phase
  2. Rename the output CellCycle_Regressed

Plotting the Effects of Cell Cycle Regression

Your data is now ready for further analysis, so you can return to the Filter, Plot and Explore Single-cell RNA-seq Data tutorial and move on to the Preparing coordinates step there. Make sure that you use the CellCycle_Regressed dataset (you may want to rename it as Use_me_Scaled as that is the name used in the main tutorial). However, if you want to understand how cell cycle regression has affected your data then you can work through the following steps first, to visualise where the cell cycle genes are expressed - with or without regression.

In order to look at the cell cycle genes, we first need to label them in our AnnData dataset so that we can select them for plotting. To add a new annotation to the genes or variables, we need a column with entries for each gene, in the same order as in the dataset, and with a header at the top that will become the key for identifying these entries in the AnnData dataset. We want to end up with a column that reads TRUE for the 97 cell cycle genes and FALSE for all the other genes.

You might find it easier to create this new column using a spreadsheet and then upload it as a tabular dataset, but it is possible to complete all the steps on Galaxy.

Prepare a table of cell cycle genes

If we’re going to mark all the cell cycle genes, we’ll need a single list of all 97 genes instead of the two separate lists for S Phase and G2/M Phase. We’ll combine the two lists into a single column with 97 entries. We’ll then add a second column that simply reads TRUE in every row, which we’ll use later to mark these as cell cycle genes in the main dataset.

Hands-on: Create a list of all cell cycle genes
  1. Concatenate datasets with the following parameters:
    • param-file “Concatenate Dataset”: sPhase (Input dataset)
    • In “Dataset”:
      • param-repeat “Insert Dataset”
        • param-file “Select”: g2mPhase (Input dataset)
  2. Add column ( Galaxy version 1.0.0) with the following parameters:
    • “Add this value”: TRUE
    • param-file “to Dataset”: out_file1 (output of Concatenate datasets tool)
  3. Rename the dataset CC_Genes
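For readers who prefer to see the table-building logic written out, the two steps above (concatenate, then add a constant column) amount to something like the following. The gene names are hypothetical stand-ins for the contents of sPhase and g2mPhase.

```python
# Hypothetical stand-ins for the two gene lists.
s_phase = ["geneS1", "geneS2"]
g2m_phase = ["geneM1", "geneM2"]

# Concatenate the two lists, then add a second column that reads TRUE in every row.
cc_genes = [(gene, "TRUE") for gene in s_phase + g2m_phase]

# Print the result as a two-column, tab-separated table.
for gene, flag in cc_genes:
    print(f"{gene}\t{flag}")
```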

Create an ordered list of gene names

Next, we’ll need a list of all the genes in our dataset, so that we can mark the ones that are in our cell cycle list. We’ll also add a column of numbers as this will help us keep the gene names in order.

Hands-on: Get the gene names from your dataset
  1. Inspect AnnData ( Galaxy version 0.7.5+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Regressed (Input dataset)
    • “What to inspect?”: Key-indexed annotation of variables/features (var)
  2. Table Compute ( Galaxy version 1.2.4+galaxy0) with the following parameters:
    • “Input Single or Multiple Tables”: Single Table
      • param-file “Table”: var (output of Inspect AnnData tool)
      • “Type of table operation”: Drop, keep or duplicate rows and columns
        • “List of columns to select”: 1
    • “Output formatting options”: Unselect all
  3. Add column ( Galaxy version 1.0.0) with the following parameters:
    • param-file “to Dataset”: table (output of Table Compute tool)
    • “Iterate?”: YES
    Comment: Keeping the genes in order

    Adding these numbers will enable us to keep the genes in their original order. This is essential for adding the cell cycle gene annotation back into the AnnData dataset.

  4. Rename the output Dataset_Genes

Mark the cell cycle genes

We can now combine our table of cell cycle genes CC_Genes with the table of gene names Dataset_Genes.

Hands-on: Combine the two tables
  1. Join ( Galaxy version 1.1.2) with the following parameters:
    • param-file “1st file”: Dataset_Genes (output of Add column tool)
    • “Column to use from 1st file”: c1
    • param-file “2nd File”: CC_Genes (output of Add column tool)
    • “Column to use from 2nd file”: c1
    • “Output lines appearing in”: All lines [-a 1 -a 2]
    • “Value to put in unpaired (empty) fields”: FALSE
    Comment: How the cell cycle genes are marked

    When we join the two tables, we’ll ask for any empty fields to be filled in with FALSE. The cell cycle gene table has an extra column where they are all marked as TRUE - they will retain these entries when we join the tables, but since there are no entries for the rest of the genes in this column, their rows will be filled in as FALSE. This will enable us to pick out the cell cycle genes later.

  2. Sort with the following parameters:
    • param-file “Sort Dataset”: output (output of Join tool)
    • “on column”: c2
    • “everything in”: Ascending order
    Comment: Putting the genes in order again

    Sorting the genes using the column of numbers we added earlier will put them back in their original order - make sure to sort them in ascending order, otherwise they’ll end up the opposite way around.

Question
  1. What would happen if any of the cell cycle genes were not present in the dataset?
  2. How would we remove these genes from the table?
  1. Any cell cycle genes that weren’t in the dataset would have an empty field in the numbered column, which would be filled in with FALSE when we created the table with the Join tool. These rows would appear at the top of the table after it was sorted.
  2. We should check the first rows of the table for any unnumbered genes and then cut these rows out in the next step.
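The join, fill and sort logic, including the missing-gene case from the question above, can be sketched in plain Python. The gene names are hypothetical, and the ordering rule for unnumbered rows (placing them first) is written to match the behaviour described in the answer rather than taken from the Sort tool itself.

```python
FILL = "FALSE"  # value used for unpaired (empty) fields

# Dataset_Genes: gene name plus its original position in the dataset.
dataset_genes = [("geneA", 1), ("geneS1", 2), ("geneB", 3), ("geneM1", 4)]
# CC_Genes: cell cycle genes marked TRUE; geneX is not present in the dataset.
cc_genes = {"geneS1": "TRUE", "geneM1": "TRUE", "geneX": "TRUE"}

positions = dict(dataset_genes)

# Outer join on the gene name, filling any missing field with FALSE.
joined = [
    (gene, positions.get(gene, FILL), cc_genes.get(gene, FILL))
    for gene in sorted(set(positions) | set(cc_genes))
]


def sort_key(row):
    position = row[1]
    # Rows without a number (filled with FALSE) are placed first.
    return (0, 0) if position == FILL else (1, position)


joined.sort(key=sort_key)

# Drop the rows for cell cycle genes that are missing from the dataset.
kept = [row for row in joined if row[1] != FILL]

for row in joined:
    print(*row, sep="\t")
```

After sorting, the third column of `kept` lists TRUE or FALSE for every gene in the dataset's original order, which is the column cut out in the next section.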

Create the annotation column

We now have a table with all the gene names in the same order as the main dataset and a column indicating which ones are cell cycle genes. If we cut this column out of the table, then we can add it as a new annotation to the main dataset. We’ll also need to add a column header, which will be used as the key for this annotation in the AnnData dataset.

Hands-on: Create the cell cycle annotation column
  1. Table Compute ( Galaxy version 1.2.4+galaxy0) with the following parameters:
    • “Input Single or Multiple Tables”: Single Table
      • param-file “Table”: out_file1 (output of Sort tool)
    • “Input data has:”: Unselect all
      • “Type of table operation”: Drop, keep or duplicate rows and columns
        • “List of columns to select”: 3
    • “Output formatting options”: Unselect all
    Comment: Removing rows for missing genes

    If there were any cell cycle genes that weren’t present in the main dataset, we could remove them at this stage by excluding them from the List of rows to select. As before, if we were using a dataset of a different size, we would need to change this parameter to include all the rest of the rows.

  2. Create a new tabular file from the following

    CC_genes
    
    • Click galaxy-upload Upload Data at the top of the tool panel
    • Select galaxy-wf-edit Paste/Fetch Data at the bottom
    • Paste the file contents into the text field
    • Change Type from “Auto-detect” to tabular
    • Press Start and Close the window

  3. Concatenate datasets with the following parameters:
    • param-file “Concatenate Dataset”: Pasted Entry dataset
    • In “Dataset”:
      • param-repeat “Insert Dataset”
        • param-file “Select”: table (output of Table Compute tool)
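As a rough sketch of what the finished annotation file contains, the snippet below puts the CC_genes header on top of the column of TRUE/FALSE entries and checks that there is one entry per gene. The gene names and flags are hypothetical.

```python
# Hypothetical gene order from the dataset and the matching TRUE/FALSE flags.
dataset_genes = ["geneA", "geneS1", "geneB", "geneM1"]
flags = ["FALSE", "TRUE", "FALSE", "TRUE"]

# The annotation needs exactly one entry per gene, in the dataset's order.
if len(flags) != len(dataset_genes):
    raise ValueError("annotation column and gene list differ in length")

# Header first (it becomes the key in the AnnData dataset), then one flag per line.
header = "CC_genes"
annotation = "\n".join([header] + flags) + "\n"
print(annotation, end="")
```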

Add an annotation to the AnnData

We will need to add the annotation to both the annotated dataset CellCycle_Annotated and the dataset that we created by regressing out the effects of the cell cycle, CellCycle_Regressed. This will allow us to plot the cell cycle genes before and after regression. We can do this using the Manipulate AnnData tool and selecting the correct function from the dropdown menu.

Hands-on: Add the new annotations
  1. Manipulate AnnData ( Galaxy version 0.7.5+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Annotated (output of Inspect and manipulate tool)
    • “Function to manipulate the object”: Add new annotation(s) for observations or variables
      • param-file “Table with new annotations”: out_file1 (output of Concatenate datasets tool)
  2. Rename the output CellCycle_Annotated_CC

  3. Manipulate AnnData ( Galaxy version 0.7.5+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Regressed (output of Scanpy RegressOut tool)
    • “Function to manipulate the object”: Add new annotation(s) for observations or variables
      • param-file “Table with new annotations”: out_file1 (output of Concatenate datasets tool)
  4. Rename the output CellCycle_Regressed_CC

Filter the cell cycle genes

To demonstrate the power of cell cycle regression, we’re going to reduce our expression matrices to contain only the 97 cell cycle genes. This will force our dimension reduction and plotting to be based entirely on cell cycle genes. You wouldn’t do this during analysis, but for proof of principle, let’s go for it!

Hands-on: Filter the AnnData datasets
  1. Manipulate AnnData ( Galaxy version 0.7.5+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Annotated_CC (output of Manipulate AnnData tool)
    • “Function to manipulate the object”: Filter observations or variables
      • “Type of filtering?”: By key (column) values
        • “Key to filter”: CC_genes
        • “Type of value to filter”: Boolean
  2. Rename the output CellCycle_Annotated_CC_Only

  3. Manipulate AnnData ( Galaxy version 0.7.5+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Regressed_CC (output of Manipulate AnnData tool)
    • “Function to manipulate the object”: Filter observations or variables
      • “Type of filtering?”: By key (column) values
        • “Key to filter”: CC_genes
        • “Type of value to filter”: Boolean
  4. Rename the output CellCycle_Regressed_CC_Only
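Filtering by a Boolean key boils down to keeping only the columns (genes) whose flag is true. A minimal sketch with a hypothetical cells-by-genes matrix stored as nested lists:

```python
# Hypothetical data: two cells (rows) by four genes (columns).
genes = ["geneA", "geneS1", "geneB", "geneM1"]
cc_flags = [False, True, False, True]  # the CC_genes annotation as Booleans
matrix = [
    [0.2, 1.5, 0.0, 0.3],
    [1.1, 0.1, 0.4, 2.2],
]

# Indices of the genes flagged as cell cycle genes.
keep = [i for i, flag in enumerate(cc_flags) if flag]

# Keep only those genes, both in the gene list and in every row of the matrix.
filtered_genes = [genes[i] for i in keep]
filtered_matrix = [[row[i] for i in keep] for row in matrix]

print(filtered_genes)
print(filtered_matrix)
```

The cells are untouched; only the gene dimension shrinks, which is why the later PCA is driven entirely by the cell cycle genes.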

Plot the cell cycle genes before regression

Now that we have a dataset that only includes the cell cycle genes, we can visualise their effects in a PCA plot. We first calculate the PCA coordinates, which summarise the expression of the 97 cell cycle genes in the filtered dataset as a small number of principal components. We will then visualise the cells on a PCA plot where the axes represent the principal components, which reflect the genes (or groups of genes) that contribute most to the variation between cells.

You will learn more about plotting your data in the Filter, Plot and Explore tutorial. For now, it is enough to know that each dot on the plot represents a cell and the closer two cells are together, the more similar they are.

Hands-on: Create a PCA Plot of cell cycle genes
  1. Cluster, infer trajectories and embed ( Galaxy version 1.7.1+galaxy0) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Annotated_CC_Only (output of Manipulate AnnData tool)
    • “Method used”: Computes PCA (principal component analysis) coordinates, loadings and variance decomposition, using 'tl.pca'
      • “Type of PCA?”: Full PCA
    Comment: Plot all the genes

    Make sure that you de-select the option for the Cluster, infer trajectories and embed tool to use highly variable genes only - some of the cell cycle genes are also HVGs, but we want our plots to include the cell cycle genes that aren’t HVGs too.

  2. Plot ( Galaxy version 1.7.1+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: anndata_out (output of Cluster, infer trajectories and embed tool)
    • “Method used for plotting”: PCA: Plot PCA results, using 'pl.pca_overview'
      • “Keys for annotations of observations/cells or variables/genes”: phase
      • In “Plot attributes”:
        • “Colors to use for plotting categorical annotation groups”: rainbow (Miscellaneous)
Question
  1. Does the plot look as you expected?
  1. The PCA plot shows that the three groups of cells are separated out according to what phase of the cell cycle they are in. This is what we would expect to see as we are only looking at the cell cycle genes, which by definition are expressed during particular phases.
Figure 1: PCA Plot of Cell Cycle Genes before regression. The plot shows three separate clusters of cells in the G1, S and G2M phases.

Plot the cell cycle genes after regression

We will now repeat the same steps to create a PCA plot of the filtered dataset after the effects of the cell cycle have been regressed out.

Hands-on: Recreate the PCA plot of cell cycle genes after regression
  1. Cluster, infer trajectories and embed ( Galaxy version 1.7.1+galaxy0) with the following parameters:
    • param-file “Annotated data matrix”: CellCycle_Regressed_CC_Only (output of Manipulate AnnData tool)
    • “Method used”: Computes PCA (principal component analysis) coordinates, loadings and variance decomposition, using 'tl.pca'
      • “Type of PCA?”: Full PCA
  2. Plot ( Galaxy version 1.7.1+galaxy1) with the following parameters:
    • param-file “Annotated data matrix”: anndata_out (output of Cluster, infer trajectories and embed tool)
    • “Method used for plotting”: PCA: Plot PCA results, using 'pl.pca_overview'
      • “Keys for annotations of observations/cells or variables/genes”: phase
      • In “Plot attributes”:
        • “Colors to use for plotting categorical annotation groups”: rainbow (Miscellaneous)
Question
  1. Does the plot look as you expected?
  1. The cells in different phases are now all mixed up together. This makes sense because we are only plotting the cell cycle genes, but the previously strong effects of the cell cycle on these genes have now been regressed out. There are still some differences between the cells (they don’t all end up at the same point on the PCA chart) because the regression only removes the expected effects of the cell cycle, leaving behind any individual variation in the expression of the cell cycle genes.
Figure 2: PCA Plot of Cell Cycle Genes after regression. The plot shows one big cluster with the cells from the G1, S and G2M phases all mixed up together.

Comparing the before and after plots, we can clearly see that the effects of the cell cycle have been removed. Although you wouldn’t usually need to filter out the cell cycle genes or create these plots when analysing your own data, hopefully you have found doing it now to be helpful for understanding the impact of cell cycle regression.

Question
  1. What impact do you think the cell cycle regression will have when you analyse the whole dataset? What would happen if we plotted all of the genes from the main dataset?
  1. The regression reduces the impact of the cell cycle on the data - this is why the cells are less separated by phase afterwards. When we analyse the whole CellCycle_Regressed dataset, with all of the genes, this could allow other differences in gene expression to become more apparent.

We wouldn’t expect to see such clear distinctions in PCA plots created using all of the genes (not just the cell cycle ones), even before the regression. Although the cell cycle genes can have a significant effect, it won’t be as obvious when other genes are also being taken into account. However, we will still see a difference after we regress out the effects of the cell cycle: the cells in different phases will become more mixed up together. How much of a difference the regression makes will depend on how strong the effects of the cell cycle are in a particular dataset. You can see the effects on this dataset below. You can also replicate these plots after completing the rest of the Filter, Plot and Explore tutorial by colouring your PCA plots by phase.

Figure 3: PCA Plot using all genes before regression. The plot shows some separation between cells in the G1, S and G2M phases.

Figure 4: PCA Plot using all genes after regression. The plot shows that cells in the G1, S and G2M phases are more mixed up with each other.

Conclusion

In this tutorial, you have annotated and scored the cell cycle genes and regressed out the effects of the cell cycle. You have also created PCA plots of the data before and after regression to visualise the effects.

You might want to check your results against this example history.

You can now continue to analyse this data by returning to the Preparing coordinates step in the Filter, Plot and Explore tutorial. If you use the CellCycle_Regressed dataset (which you may now want to rename as Use_me_Scaled since that is the name used in the main tutorial), you should notice some differences in your results compared to those shown there because the effects of the cell cycle have been regressed out.

To discuss with like-minded scientists, join our Galaxy Training Network chatspace in Slack and discuss with fellow users of Galaxy single cell analysis tools on #single-cell-users.

We also post new tutorials / workflows there from time to time, as well as any other news.

If you’d like to contribute ideas, requests or feedback as part of the wider community building single-cell and spatial resources within Galaxy, you can also join our Single cell & sPatial Omics Community of Practice.

You can request tools on our Single Cell and Spatial Omics Community Tool Request Spreadsheet.