Bioinformatics Projects: Using deconvolution to get new insights from old bulk RNA-seq data

PURL: https://gxy.io/GTN:P00026
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

Are you an educator looking for project ideas for students to practice independent enquiry and research skills? Are you a student looking for a project idea? Look no further - here, you will find a learning pathway of tutorials that can guide you through the skills to find old data and transform it into new results!

To be clear, we will only provide the methods - you will need to come up with your own research question by exploring the literature and available public datasets, apply these analyses, and interpret the results. Your research question will take the form of, “How does variable X impact the cell type proportions in tissue/sample/organism Y?”

Note: You will need to be familiar with the Galaxy interface and single-cell RNA-seq analysis in general to follow this Learning Pathway. You can do so by completing the Introduction to single-cell analysis learning pathway. It would be a bonus to also complete the Beyond single cell learning pathway to reinforce that knowledge.

For support throughout these tutorials, join our Galaxy single cell chat group on Matrix to ask questions!

Need a short bioinformatics project idea? Follow this learning path to create new insights from old data!

Module 1: What's deconvolution?

First, you will learn about the concept of deconvolution. This will help you formulate your question and identify datasets next.

Time estimation: 2 hours

Learning Objectives
  • Construct Bulk and scRNA Expression Set Objects
  • Inspect these objects for various properties
  • Measure the abundance of certain cell type cluster markers compared to others
Lesson Slides Hands-on Recordings
Bulk RNA Deconvolution with MuSiC
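
To make the idea concrete before starting the tutorial, here is a minimal sketch of deconvolution as a non-negative least-squares problem: a bulk sample is modelled as a mixture of per-cell-type expression profiles, and we look for the non-negative proportions that best reproduce it. This is a plain illustration, not the MuSiC algorithm, and the function name `deconvolve`, the toy signature matrix, and the bulk values are all invented for this example.

```python
def deconvolve(bulk, signature, n_iter=2000):
    """Estimate non-negative cell type proportions for one bulk sample.

    bulk      -- list of G expression values (one per gene)
    signature -- G x K nested list; signature[g][k] is the mean expression
                 of gene g in cell type k (from a single-cell reference)
    """
    n_genes, n_types = len(signature), len(signature[0])
    # A step of 1 / (sum of squared entries) is small enough to converge.
    step = 1.0 / sum(v * v for row in signature for v in row)
    props = [1.0 / n_types] * n_types  # start from equal proportions

    for _ in range(n_iter):
        # Difference between the modelled mixture and the observed bulk.
        residual = [
            sum(signature[g][k] * props[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient step, then clip at zero to keep proportions non-negative.
        props = [
            max(0.0, props[k] - step * sum(signature[g][k] * residual[g]
                                           for g in range(n_genes)))
            for k in range(n_types)
        ]

    total = sum(props)
    if total == 0:
        raise ValueError("All estimated proportions are zero")
    return [p / total for p in props]  # rescale so proportions sum to 1


# Toy data (invented): 4 genes x 3 cell types, mixed at 60% / 30% / 10%.
toy_signature = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [5, 5, 5]]
toy_bulk = [6.3, 2.5, 1.5, 5.0]
print(deconvolve(toy_bulk, toy_signature))  # close to [0.6, 0.3, 0.1]
```

Real deconvolution tools add gene weighting and handle variation between subjects, so treat this only as a picture of the underlying question.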

Module 2: Picking & importing a dataset

Next, you will need to pick a bulk RNA-seq dataset, along with a corresponding single-cell dataset as a reference to perform deconvolution. You will then need to transform these datasets into ESet objects. We have set up these tutorials to work with datasets from the European Bioinformatics Institute, because these are carefully curated and work with our workflows. You can try others, but you may experience challenges.

Time estimation: 2 hours 15 minutes

Learning Objectives
  • You will retrieve raw data from the EMBL-EBI Expression Atlas.
  • You will manipulate the metadata and matrix files.
  • You will combine the metadata and matrix files into an ESet object for MuSiC deconvolution.
  • You will create multiple ESet objects - both combined and separated out by disease phenotype for your bulk dataset.
  • You will retrieve raw data from the EBI Single Cell Expression Atlas and Human Cell Atlas.
  • You will manipulate the metadata and matrix files.
  • You will combine the metadata and matrix files into an AnnData or Seurat object for downstream analysis.
  • You will retrieve raw data from the EMBL-EBI Single Cell Expression Atlas.
  • You will manipulate the metadata and matrix files.
  • You will combine the metadata and matrix files into an ESet object for MuSiC deconvolution.
  • You will create multiple ESet objects - both combined and separated out by disease phenotype for your single cell reference.
Lesson Slides Hands-on Recordings
Bulk matrix to ESet | Creating the bulk RNA-seq dataset for deconvolution
Importing files from public atlases
Matrix Exchange Format to ESet | Creating a single-cell RNA-seq reference dataset for deconvolution
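
The tutorials above build the ESet objects with Galaxy tools. Purely as an illustration of the logic behind "combined and separated out by disease phenotype", the sketch below checks that every matrix column has matching metadata and then splits the matrix by a phenotype column. The function name `split_by_phenotype`, the `disease` key, and the toy values are hypothetical and do not come from the tutorials.

```python
def split_by_phenotype(matrix, sample_ids, metadata, key="disease"):
    """Split an expression matrix into one sub-matrix per phenotype.

    matrix     -- dict mapping gene -> list of values, ordered as sample_ids
    sample_ids -- list of sample identifiers (the matrix columns)
    metadata   -- dict mapping sample id -> dict of annotations
    """
    # Every matrix column needs a metadata row, otherwise the pairing is unsafe.
    missing = [s for s in sample_ids if s not in metadata]
    if missing:
        raise ValueError(f"Samples without metadata: {missing}")

    # Collect the column indices belonging to each phenotype value.
    columns = {}
    for idx, sample in enumerate(sample_ids):
        columns.setdefault(metadata[sample][key], []).append(idx)

    return {
        phenotype: {
            "sample_ids": [sample_ids[i] for i in idxs],
            "matrix": {gene: [vals[i] for i in idxs]
                       for gene, vals in matrix.items()},
        }
        for phenotype, idxs in columns.items()
    }


# Toy data (invented): two genes measured in three samples.
toy_matrix = {"INS": [120, 40, 110], "GCG": [30, 90, 35]}
toy_samples = ["s1", "s2", "s3"]
toy_metadata = {
    "s1": {"disease": "normal"},
    "s2": {"disease": "T2D"},
    "s3": {"disease": "normal"},
}
subsets = split_by_phenotype(toy_matrix, toy_samples, toy_metadata)
print(subsets["normal"]["sample_ids"])
```

The combined object corresponds to the unsplit inputs; each entry of `subsets` corresponds to one phenotype-specific object.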

Module 3: Does my reference work well?

Next, you will benchmark your reference dataset to see how accurate it is at inferring proportions. If it does not work well, you may need to pick a different dataset and try again!

Time estimation: 2 hours

Learning Objectives
  • Generate pseudo-bulk data from single-cell RNA data
  • Process the single-cell and pseudo-bulk data using various deconvolution tools
  • Evaluate and visualise the results of the different deconvolution methods
Lesson Slides Hands-on Recordings
Evaluating Reference Data for Bulk RNA Deconvolution
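
The benchmarking idea can be sketched in a few lines: summing single-cell counts gives a pseudo-bulk sample whose true cell type proportions are known, so any inferred proportions can be scored against them. The sketch below is an assumption-laden illustration, not the tutorial's workflow; the function names, the toy cells, and the choice of root-mean-square error as the score are ours.

```python
import math
from collections import Counter


def make_pseudobulk(cells):
    """Sum per-gene counts over cells and record the true proportions.

    cells -- list of (cell_type, counts) pairs, where counts is a list of
             per-gene counts with the same gene order for every cell
    """
    n_genes = len(cells[0][1])
    pseudobulk = [0] * n_genes
    for _, counts in cells:
        for gene_idx, count in enumerate(counts):
            pseudobulk[gene_idx] += count

    # The known answer: the fraction of cells of each type.
    type_counts = Counter(cell_type for cell_type, _ in cells)
    true_props = {ct: n / len(cells) for ct, n in type_counts.items()}
    return pseudobulk, true_props


def rmse(true_props, inferred_props):
    """Root-mean-square error over the cell types in the ground truth."""
    errors = [
        (true_props[ct] - inferred_props.get(ct, 0.0)) ** 2
        for ct in true_props
    ]
    return math.sqrt(sum(errors) / len(errors))


# Toy data (invented): four cells, three genes.
toy_cells = [
    ("alpha", [5, 0, 1]),
    ("alpha", [4, 1, 0]),
    ("beta", [0, 6, 1]),
    ("delta", [1, 0, 7]),
]
bulk, truth = make_pseudobulk(toy_cells)
# Pretend a deconvolution tool returned these proportions for `bulk`.
guess = {"alpha": 0.6, "beta": 0.2, "delta": 0.2}
print(bulk, truth, rmse(truth, guess))
```

A lower error means the reference recovers the known proportions more faithfully; other scores, such as a correlation coefficient, could be used in the same way.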

Module 4: Analysing your data!

At long last, you’ve done all the hard work of learning about deconvolution, picking your datasets, reformatting them for use, and making sure your reference is of a high quality. You can now finally infer cell proportions from your bulk RNA-seq samples, and compare them across a variable of interest!

Time estimation: 1 hour

Learning Objectives
  • Apply the MuSiC deconvolution to samples and compare the cell type distributions
  • Compare the results from analysing different types of input, for example, whether combining disease and healthy references or not yields better results
Lesson Slides Hands-on Recordings
Comparing inferred cell compositions using MuSiC deconvolution
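
Once proportions have been inferred for every bulk sample, comparing them across a variable of interest can start with something as simple as a per-group average. The following is a minimal sketch under our own assumptions: the function name, the input layout, and all numbers are invented, and a real project would follow this summary with plots and an appropriate statistical test.

```python
from collections import defaultdict
from statistics import mean


def mean_proportions_by_group(samples):
    """Average inferred cell type proportions within each group.

    samples -- list of dicts with a 'group' label and a 'proportions'
               dict mapping cell type -> inferred proportion
    """
    collected = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        for cell_type, proportion in sample["proportions"].items():
            collected[sample["group"]][cell_type].append(proportion)

    return {
        group: {ct: mean(values) for ct, values in by_type.items()}
        for group, by_type in collected.items()
    }


# Toy data (invented): inferred proportions for three bulk samples.
toy_samples = [
    {"group": "healthy", "proportions": {"alpha": 0.5, "beta": 0.5}},
    {"group": "healthy", "proportions": {"alpha": 0.7, "beta": 0.3}},
    {"group": "T2D", "proportions": {"alpha": 0.8, "beta": 0.2}},
]
for group, props in mean_proportions_by_group(toy_samples).items():
    print(group, props)
```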

The End!

And now you’re done! We hope that you generated interesting results that you are able to write up in a fantastic project. We would love to hear from you if you have! Contact us via our Galaxy single cell chat group on Matrix. Alternatively, if you prefer Slack, join the GTN’s Slack workspace and message our #single-cell-users channel.

You will find more features, tips and tricks in our general Galaxy Single-cell Training page.


Editorial Board

This material is reviewed by our Editorial Board:

Wendi Bacon, Morgan Howells, Mehmet Tekman

Funding

These individuals or organisations provided funding support for the development of this resource