# Comparing inferred cell compositions using MuSiC deconvolution

Author(s): Wendi Bacon, Mehmet Tekman
Tester(s): Marisa Loach
Overview
Questions:
• How do the cell type distributions vary in bulk RNA samples across my variable of interest?

• For example, do beta cell proportions differ between pancreas samples from patients with diabetes and those from healthy individuals?

Objectives:
• Apply MuSiC deconvolution to samples and compare the cell type distributions

• Compare the results of analysing different types of input, for example, whether or not combining disease and healthy references yields better results

Requirements:
Time estimation: 1 hour
Supporting Materials:
Last modification: Feb 3, 2023

# Introduction

The goal of this tutorial is to apply bulk RNA deconvolution to a problem with multiple variables: here, bulk samples from patients with type 2 diabetes are compared with those from healthy counterparts. To compare inferred cell compositions, all you need is a well-annotated, high-quality scRNA-seq reference dataset and your bulk RNA-seq samples of choice, both transformed into MuSiC-friendly Expression Set objects. For more information on how MuSiC works, see the MuSiC GitHub repository or the published article (Wang et al. 2019).
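MuSiC itself is an R package, but the core idea behind reference-based deconvolution can be sketched in a few lines. The Python example below is a minimal illustration under simplifying assumptions, not MuSiC's method: it fits plain, unweighted non-negative least squares by projected gradient descent, whereas MuSiC, as described in Wang et al. 2019, uses weighted non-negative least squares with gene weights derived from cross-subject variability in the reference. The function name `deconvolve`, the three-gene signature matrix and the step size are hypothetical values chosen for illustration.

```python
def deconvolve(signature, bulk, steps=5000, lr=0.01):
    """Estimate cell-type proportions p >= 0 so that signature @ p ~= bulk.

    signature: one row per gene, one column per cell type (mean expression
               of that gene in that cell type, from a single-cell reference)
    bulk:      one value per gene, from a single bulk RNA-seq sample
    """
    n_genes = len(bulk)
    n_types = len(signature[0])
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        # Residual per gene: predicted bulk expression minus observed
        resid = [
            sum(s * pk for s, pk in zip(signature[g], p)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * sum(resid^2) with respect to each proportion
        grad = [
            sum(signature[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then project back onto p >= 0
        p = [max(0.0, pk - lr * gk) for pk, gk in zip(p, grad)]
    total = sum(p)
    # Rescale so that the proportions sum to 1
    return [pk / total for pk in p] if total > 0 else p


# Toy reference: gene 1 is specific to cell type A, gene 2 to cell type B,
# gene 3 is shared. The toy bulk sample was built as 30% A + 70% B.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [3.0, 7.0, 5.0]
print([round(x, 3) for x in deconvolve(signature, bulk)])  # prints [0.3, 0.7]
```

Because the toy bulk sample is an exact mixture, the estimate recovers the proportions used to build it. Real bulk data are noisy and genes vary between subjects, which is part of the motivation for MuSiC's gene weighting.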

Comment: Research question
• How does variable X impact the cell distributions in my samples?
• Needs: scRNA-seq reference dataset; bulk RNA-seq samples of interest to compare

# Data

In the standard MuSiC tutorial, we used human pancreas data. We will now use the same single-cell reference dataset (Segerstolpe et al. 2016), with its 10 samples from 6 healthy subjects and 4 subjects with type 2 diabetes (T2D), as well as the bulk RNA-seq samples from the same lab (3 healthy, 4 T2D). Both datasets were accessed from the public EMBL-EBI repositories and transformed into Expression Set objects in the previous two tutorials. For both the single-cell reference and the bulk samples, you generated Expression Set objects containing only T2D samples, only healthy samples, and all samples combined. Here we only need the combined object for the single-cell reference, not for the bulk data. The plan is to analyse this data in three ways, all to identify differences in cellular composition:
• altogether: deconvolve all bulk samples against the combined single-cell reference;
• like4like: deconvolve the healthy bulk samples against the healthy single-cell reference, and the T2D bulk samples against the T2D single-cell reference;
• healthyscref: deconvolve all bulk samples against the healthy single-cell reference only.
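To make the three designs concrete, here is a hypothetical Python encoding that maps each bulk Expression Set file to the single-cell reference it is deconvolved against. The file names are those imported in the Get data section; the `DESIGNS` dictionary itself is only an illustration and is not something the MuSiC Compare tool reads. Counting the distinct references in each design gives the number of “Insert New scRNA Group” repeats you fill in for that design in the hands-on steps.

```python
# Hypothetical encoding of the three analysis designs used in this tutorial:
# each maps a bulk ESet file to the single-cell reference ESet it is
# deconvolved against.
DESIGNS = {
    "altogether": {
        "ESet_object_bulk_healthy.rdata": "ESet_object_sc_combined.rdata",
        "ESet_object_bulk_T2D.rdata": "ESet_object_sc_combined.rdata",
    },
    "like4like": {
        "ESet_object_bulk_healthy.rdata": "ESet_object_sc_healthy.rdata",
        "ESet_object_bulk_T2D.rdata": "ESet_object_sc_T2D.rdata",
    },
    "healthyscref": {
        "ESet_object_bulk_healthy.rdata": "ESet_object_sc_healthy.rdata",
        "ESet_object_bulk_T2D.rdata": "ESet_object_sc_healthy.rdata",
    },
}

for design, pairs in DESIGNS.items():
    # Each distinct reference corresponds to one scRNA group in the tool form
    references = sorted(set(pairs.values()))
    print(f"{design}: {len(references)} scRNA group(s): {', '.join(references)}")
```

In every design all seven bulk samples are deconvolved; only the reference they are compared against changes.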

If you have followed the previous tutorials to create your own bulk and single cell expression sets, then you can copy these into a new history now. Otherwise, follow the steps below to import the datasets you’ll need.

There are 3 ways to copy datasets between histories:

1. From the original history
   1. Click on the gear icon (History options) at the top of the history panel
   2. Click on Copy Dataset
   3. Select the desired files
   4. Give a relevant name to the “New history”
   5. Click on the new history name in the green box that has just appeared to switch to this history
2. From View all histories
   1. Click on View all histories at the top right
   2. Switch to the history into which the dataset should be copied
   3. Drag the dataset to copy from its original history
   4. Drop it in the target history
3. From the target history
   1. Click on User in the top bar
   2. Click on Datasets
   3. Search for the dataset to copy
   4. Click on it
   5. Click on Copy to History

## Get data

1. Create a new history for this tutorial, “Deconvolution: Compare”
2. Import the files from Zenodo:

   • Human single cell RNA ESet objects (tag: #singlecell)

     https://zenodo.org/record/7319925/files/ESet_object_sc_combined.rdata
     https://zenodo.org/record/7319925/files/ESet_object_sc_T2D.rdata
     https://zenodo.org/record/7319925/files/ESet_object_sc_healthy.rdata

   • Human bulk RNA ESet objects (tag: #bulk)

     https://zenodo.org/record/7319925/files/ESet_object_bulk_healthy.rdata
     https://zenodo.org/record/7319925/files/ESet_object_bulk_T2D.rdata

   To import the files via their links:
   • Open the Galaxy Upload Manager (upload icon at the top-right of the tool panel)
   • Select Paste/Fetch Data
   • Paste the links into the text field
   • Press Start
   • Close the window
3. Rename the datasets as needed
4. Add the corresponding tag (#singlecell or #bulk) to each file:
   • Click on the dataset
   • Click on Add Tags (tags icon)
   • Add a tag starting with #

     Tags starting with # will be automatically propagated to the outputs of tools using this dataset.

   • Check that the tag appears below the dataset name

# Infer cellular composition & compare

It’s finally time!

## Altogether: Deconvolution with a combined sc reference

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool version used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step and then try an older tool in the next step, this may fail! To ensure you use the same tool versions as a given tutorial, use the Tutorial mode feature.

• Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
• Tool names in tutorials will be blue buttons that open the correct tool for you
• Note: this does not work for all tutorials (yet)
• You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface
Warning: Not all browsers work!
• We’ve had some issues with Tutorial mode on Safari for Mac users.
• Try a different browser if you aren’t seeing the button.
Hands-on: Comparing: altogether
1. MuSiC Compare Tool: toolshed.g2.bx.psu.edu/repos/bgruening/music_compare/music_compare/0.1.1+galaxy4 with the following parameters:
   • In “New scRNA Group”:
     • Click “Insert New scRNA Group”
       • “Name of scRNA Dataset”: scRNA_set
       • “scRNA Dataset”: ESet_object_sc_combined.rdata (Input dataset)
       • “Cell Types Label from scRNA dataset”: Inferred cell type - author labels
       • “Samples Identifier from scRNA dataset”: Individual
       • “Comma list of cell types to use from scRNA dataset”: alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell
       • In “Bulk Datasets in scRNA Group”:
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:Normal
           • “Bulk RNA Dataset”: ESet_object_bulk_healthy.rdata (Input dataset)
           • “Factor Name”: Disease
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:T2D
           • “Bulk RNA Dataset”: ESet_object_bulk_T2D.rdata (Input dataset)
           • “Factor Name”: Disease
2. Add the #altogether tag to each of the outputs.
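The “Comma list of cell types” value needs to match the cell type names in the reference's “Inferred cell type - author labels” column, since MuSiC selects cell types by name, so a typo or a capitalisation mismatch is an easy mistake to make. Below is a minimal Python sketch of a check you could run yourself. `check_cell_types` is a hypothetical helper, the label list is made up, and the sketch assumes exact, case-sensitive matching after trimming spaces around each name. How the Galaxy tool itself parses this field is an assumption here, so the safest choice is to enter the names exactly as shown above, with no spaces after the commas.

```python
def check_cell_types(comma_list, reference_labels):
    """Split a comma-separated list of cell types and return the requested
    names plus any that are missing from the reference labels."""
    # Trim whitespace around each name and skip empty entries (e.g. a trailing comma)
    requested = [name.strip() for name in comma_list.split(",") if name.strip()]
    available = set(reference_labels)
    # Exact, case-sensitive comparison (an assumption about how names are matched)
    missing = [name for name in requested if name not in available]
    return requested, missing


# Made-up label set standing in for the reference's cell type column
labels = ["alpha cell", "beta cell", "delta cell", "gamma cell",
          "acinar cell", "ductal cell", "other cell"]

requested, missing = check_cell_types(
    "alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell", labels)
print(len(requested), missing)  # prints 6 []

_, missing = check_cell_types("alpha cell,Beta cell", labels)
print(missing)  # prints ['Beta cell']
```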

There are four sets of output files:

1. Summarised Plots: the most interesting output, because it has the pretty pictures!
2. Individual Heatmaps: similar to what standard (non-comparing) MuSiC produces, shown for each sample individually rather than combined.
3. Stats: very handy if you want to make statistical calculations, as it contains medians and quartiles.
4. Tables: the inferred cell proportions within each sample, as well as the number of reads.

### Summarised Plots

View (eye icon) the output file Summarised Plots (MuSiC). The first few pages are similar to the output of the standard deconvolution tool, but now compare across the factor of interest (Disease). Of the many visualisations available, our favourite is on page 5: a comparison of inferred cell proportions across disease status.

Here we can see that the bulk RNA-seq samples from the T2D patients contain markedly fewer beta cells than their healthy counterparts. This is what we would expect in T2D, which is reassuring!

### Individual Heatmaps

View (eye icon) the output file Individual Heatmaps (MuSiC). This shows the cell type distribution in each individual sample, split by disease factor into two separate plots, but ultimately isn’t particularly informative.

### Stats

If you select the Stats dataset, you’ll find it contains four sets of data: Bulk_disease: Read Props, Bulk_disease: Sample Props, Bulk_healthy: Read Props and Bulk_healthy: Sample Props. View (eye icon) the file Bulk_disease: Sample Props. This contains summary statistics (minimum, quartiles, median, mean, etc.) for each phenotype, which can be helpful if you’re trying to identify statistical differences across samples.
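If you want to reproduce or extend these summaries yourself, the Python sketch below computes a similar set of statistics for one cell type from a list of per-sample proportions, using the standard-library `statistics` module. The proportions are made up for illustration (they are not results from this tutorial), the key names only loosely mimic R's `summary()` output, and `method="inclusive"` is chosen because it interpolates the same way as R's default quantile definition; the Stats file may compute its quartiles differently.

```python
import statistics


def summarise(proportions):
    """Summary statistics for one cell type's per-sample proportions."""
    # "inclusive" interpolates between order statistics at position p*(n-1),
    # the same rule as R's default quantile type 7
    q1, median, q3 = statistics.quantiles(proportions, n=4, method="inclusive")
    return {
        "Min": min(proportions),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(proportions),
        "3rd Qu.": q3,
        "Max": max(proportions),
    }


# Hypothetical beta cell proportions, one value per bulk sample
healthy_beta = [0.30, 0.35, 0.40]
t2d_beta = [0.10, 0.15, 0.20, 0.25]

for group, values in [("Normal", healthy_beta), ("T2D", t2d_beta)]:
    summary = summarise(values)
    print(group, {key: round(value, 3) for key, value in summary.items()})
```

With only 3 healthy and 4 T2D bulk samples, summaries like these are descriptive; they show the size and direction of a difference rather than its significance.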

### Tables

Finally, if you select the Tables dataset, you’ll find it contains three sets of data, Data Table, Matrix of Cell Type Read Counts, and Matrix of Cell Type Sample Proportions.

View (eye icon) the file Data Table. This contains the inferred proportions and read counts associated with each sample and cell type, along with the factor of interest (Disease). In this tutorial, we tend to use sample proportions rather than read counts, but either works. The two matrix files contain subsets of this data table.

Question
1. Why does the data table contain 42 rows?
1. The data table contains one row for each cell type in each sample. Since there are 6 cell types and 7 bulk samples (3 healthy and 4 T2D), there are 6 × 7 = 42 rows.
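The row count follows from the table's long format: one row per (sample, cell type) pair. The Python sketch below builds such a table and pivots it into a wide sample-by-cell-type matrix, the kind of layout the two matrix files hold. The sample names, the column names (`Sample`, `Cell_type`, `Disease`, `Proportion`) and the equal placeholder proportions are all hypothetical; check the real Data Table for its actual column names.

```python
from itertools import product

# Hypothetical bulk sample names: 3 healthy and 4 T2D, as in this tutorial
samples = [f"healthy_{i}" for i in range(1, 4)] + [f"T2D_{i}" for i in range(1, 5)]
cell_types = "alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell".split(",")

# Long format: one row per (sample, cell type) pair.
# Placeholder proportions (equal shares), not real MuSiC estimates.
data_table = [
    {
        "Sample": sample,
        "Cell_type": cell_type,
        "Disease": "Normal" if sample.startswith("healthy") else "T2D",
        "Proportion": 1 / len(cell_types),
    }
    for sample, cell_type in product(samples, cell_types)
]
print(len(data_table))  # prints 42

# Wide format: pivot the long table into sample -> {cell type -> proportion}
matrix = {}
for row in data_table:
    matrix.setdefault(row["Sample"], {})[row["Cell_type"]] = row["Proportion"]
print(len(matrix), "samples x", len(cell_types), "cell types")  # prints 7 samples x 6 cell types
```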

Hopefully, this has been illuminating! Now let’s try two other choices of reference and see whether they make a difference.

## Like4like: Deconvolution of healthy samples with a healthy reference and diseased samples with a diseased reference

Hands-on: Like4like Inference
1. MuSiC Compare Tool: toolshed.g2.bx.psu.edu/repos/bgruening/music_compare/music_compare/0.1.1+galaxy4 with the following parameters:
   • In “New scRNA Group”:
     • Click “Insert New scRNA Group”
       • “Name of scRNA Dataset”: scRNA_set:Normal
       • “scRNA Dataset”: ESet_object_sc_healthy.rdata (Input dataset)
       • “Cell Types Label from scRNA dataset”: Inferred cell type - author labels
       • “Samples Identifier from scRNA dataset”: Individual
       • “Comma list of cell types to use from scRNA dataset”: alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell
       • In “Bulk Datasets in scRNA Group”:
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:Normal
           • “Bulk RNA Dataset”: ESet_object_bulk_healthy.rdata (Input dataset)
           • “Factor Name”: Disease
     • Click “Insert New scRNA Group”
       • “Name of scRNA Dataset”: scRNA_set:T2D
       • “scRNA Dataset”: ESet_object_sc_T2D.rdata (Input dataset)
       • “Cell Types Label from scRNA dataset”: Inferred cell type - author labels
       • “Samples Identifier from scRNA dataset”: Individual
       • “Comma list of cell types to use from scRNA dataset”: alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell
       • In “Bulk Datasets in scRNA Group”:
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:T2D
           • “Bulk RNA Dataset”: ESet_object_bulk_T2D.rdata (Input dataset)
           • “Factor Name”: Disease
2. Add the #like4like tag to each of the outputs.
Question
1. How have the cell inferences changed, now that we have changed the scRNA references used?
1. Overall, our interpretation is that the differences are less pronounced. It’s interesting to speculate whether this is an artefact of the analysis, or whether the beta cells in the diseased samples are not only fewer, but also express fewer beta-cell-specific transcripts (reflecting impaired beta cell function). A T2D reference built from such cells would lower the bar for inferring a beta cell, leading to a higher proportion of inferred beta cells.
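The reasoning above can be made concrete with a deliberately simplified, single-gene picture. This is an illustration, not MuSiC's actual model: it ignores cell-size scaling, all other genes and the final normalisation. Suppose gene $g$ is expressed only in beta cells, with mean expression $s_{g,\beta}$ per beta cell in the reference and observed expression $b_g$ in a bulk sample. Then

$$
b_g \approx \hat{p}_\beta \, s_{g,\beta}
\quad\Longrightarrow\quad
\hat{p}_\beta \approx \frac{b_g}{s_{g,\beta}}
$$

so if the T2D reference's beta cells carry half as many beta-cell-specific transcripts ($s_{g,\beta}$ halved), the same bulk signal $b_g$ is explained by roughly twice as many beta cells ($\hat{p}_\beta$ doubled), before the proportions are rescaled to sum to one.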

Let’s try one more inference - this time, we’ll use only healthy cells as a reference, to (theoretically) make a more consistent analysis across the two phenotypes.

## healthyscref: Deconvolution using only healthy cells as a reference

Hands-on: Healthy sc reference only inference
1. MuSiC Compare Tool: toolshed.g2.bx.psu.edu/repos/bgruening/music_compare/music_compare/0.1.1+galaxy4 with the following parameters:
   • In “New scRNA Group”:
     • Click “Insert New scRNA Group”
       • “Name of scRNA Dataset”: scRNA_set:Normal
       • “scRNA Dataset”: ESet_object_sc_healthy.rdata (Input dataset)
       • “Cell Types Label from scRNA dataset”: Inferred cell type - author labels
       • “Samples Identifier from scRNA dataset”: Individual
       • “Comma list of cell types to use from scRNA dataset”: alpha cell,beta cell,delta cell,gamma cell,acinar cell,ductal cell
       • In “Bulk Datasets in scRNA Group”:
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:Normal
           • “Bulk RNA Dataset”: ESet_object_bulk_healthy.rdata (Input dataset)
           • “Factor Name”: Disease
         • Click “Insert Bulk Datasets in scRNA Group”
           • “Name of Bulk Dataset”: Bulk_set:T2D
           • “Bulk RNA Dataset”: ESet_object_bulk_T2D.rdata (Input dataset)
           • “Factor Name”: Disease
2. Add the #healthyscref tag to each of the outputs.
Question
1. How have the cell inferences changed this time?
1. If the like4like inference reduced the difference between the phenotypes, aligning both phenotypes to the same (healthy) reference exaggerated it: there are even fewer beta cells in the T2D samples in the output of this analysis.

Overall, it’s important to remember that the inference changes depending on the reference used. For example, a combined reference might contain mostly healthy or mostly diseased samples, which would affect the inferred cellular compositions.
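As a quick check of how balanced a reference is, the Python sketch below computes the share of each condition in a reference, first by subject and then by cell. The subject counts (6 healthy, 4 T2D) come from the Segerstolpe reference described in the Data section; the per-subject cell counts are made up, and `composition` is a hypothetical helper. The two views can disagree, because subjects contribute different numbers of cells to the reference.

```python
from collections import Counter


def composition(labels, weights=None):
    """Fraction of the total contributed by each label, optionally weighted."""
    if weights is None:
        weights = [1] * len(labels)
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    return {label: n / grand_total for label, n in totals.items()}


# Condition of each subject in the combined reference: 6 healthy, 4 T2D
subject_condition = ["Normal"] * 6 + ["T2D"] * 4
print(composition(subject_condition))  # prints {'Normal': 0.6, 'T2D': 0.4}

# Hypothetical number of cells contributed by each subject (same order)
cells_per_subject = [100, 120, 90, 110, 80, 100, 300, 250, 280, 270]
by_cell = composition(subject_condition, cells_per_subject)
print({k: round(v, 3) for k, v in by_cell.items()})  # prints {'Normal': 0.353, 'T2D': 0.647}
```

The made-up cell counts are chosen so that a reference that is majority-healthy by subject becomes majority-T2D by cell, which is exactly the kind of imbalance worth checking before interpreting a combined-reference result.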

# Conclusion

Congrats! You’ve made it to the end of this suite of deconvolution tutorials! You’ve learned how to find quality data for reference and for analysis, how to reformat it for deconvolution using MuSiC, and how to compare cellular inferences using multiple kinds of reference datasets. You can find the workflow for this tutorial and an example history.

We hope this helps you in your research!

This tutorial is part of the https://singlecell.usegalaxy.eu portal (Tekman et al. 2020).

Key points
• Deconvolution can be used to compare cell type distributions from bulk RNA-seq datasets

Have questions about this tutorial? Check out the tutorial FAQ page or the FAQ page for the Single Cell topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum

# Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

# References

1. Segerstolpe, Å., A. Palasantza, P. Eliasson, E.-M. Andersson, A.-C. Andréasson et al., 2016 Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metabolism 24: 593–607. 10.1016/j.cmet.2016.08.020
2. Wang, X., J. Park, K. Susztak, N. R. Zhang, and M. Li, 2019 Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nature Communications 10: 1–9. 10.1038/s41467-018-08023-x
3. Tekman, M., B. Batut, A. Ostrovsky, C. Antoniewski, D. Clements et al., 2020 A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. GigaScience 9: giaa102. 10.1093/gigascience/giaa102


# Citing this Tutorial

1. Wendi Bacon, Mehmet Tekman, Comparing inferred cell compositions using MuSiC deconvolution (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.html Online; accessed TODAY
2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012



@misc{single-cell-bulk-music-4-compare,
author = "Wendi Bacon and Mehmet Tekman",
title = "Comparing inferred cell compositions using MuSiC deconvolution (Galaxy Training Materials)",
year = "",
month = "",
day = "",
url = "\url{https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
doi = {10.1371/journal.pcbi.1010752},
url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
year = 2023,
month = {jan},
publisher = {Public Library of Science ({PLoS})},
volume = {19},
number = {1},
pages = {e1010752},
author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
editor = {Francis Ouellette},
title = {Galaxy Training: A powerful framework for teaching!},
journal = {{PLoS} Computational Biology}
}
