End-to-End Tissue Microarray Image Analysis with Galaxy-ME

Overview
Creative Commons License: CC-BY Questions:
  • What tools are available for pre-processing multiplex tissue images in Galaxy?

  • What tools are available for downstream analysis of multiplex tissue images in Galaxy?

  • How do I pre-process and analyze Tissue Microarray data?

  • How can I visualize multiplex tissue images and associated data?

  • How can I assign phenotypes to cells in an MTI dataset?

Objectives:
  • Understand the tools available in Galaxy for multiplex tissue imaging analysis

  • Analyze and visualize publicly available TMA data using Galaxy

Requirements:
Time estimation: 3 hours
Supporting Materials:
Published: Feb 14, 2023
Last modification: Nov 3, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
purl PURL: https://gxy.io/GTN:T00334
version Revision: 15

Multiplex tissue images are large, multi-channel images that contain intensity data for numerous biomarkers. The methods for generating multiplex tissue images are diverse, and each method can require specialized knowledge for downstream processing and analysis. The MCMICRO (Schapiro et al. 2021) pipeline was developed to process multiplex images into single-cell data, and to have the range of tools to accomodate for different imaging methods. The tools used in the MCMICRO pipeline, in addition to tools for single-cell analysis, spatial analysis, and interactive visualization are available in Galaxy to facilitate comprehensive and accessible analyses of multiplex tissue images. The MCMICRO tools available in Galaxy are capable of processing Whole Slide Images (WSI) and Tissue Microarrays (TMA). WSIs are images in which a tissue section from a single sample occupies the entire microscope slide; whereas, TMAs multiplex smaller cores from multiple samples onto a single slide. This tutorial will demonstrate how to use the Galaxy multiplex imaging tools to process and analyze publicly available TMA test data provided by MCMICRO (Figure 1.).

Find a full example history

Aviator screenshot, described in figure caption. Open image in new tab

Figure 1: Fully registered image of the MCMICRO Exemplar-002 Tissue microarray. Exemplar-002 consists of four cores, each with a distinct tissue organization and expression of biomarkers. In the image, there are six biomarkers shown: DNA (white), CD163 (yellow), CD3D (blue), CD31 (red), VDAC1 (green), and Keratin (orange). This image is being viewed using Avivator, an interactive tool that allows the user to selectively view channels and adjust channel intensities.

Agenda

In this tutorial, we will cover:

  1. Get data
  2. Tile illumination correction with BaSiC Illumination
  3. Stitching and registration with ASHLAR
  4. TMA dearray with UNetCoreograph
  5. Nuclear segmentation with Mesmer
  6. Calculate single-cell features with Quantification
  7. Convert McMicro Output to Anndata
  8. Scimap: Single Cell Phenotyping
  9. Interactive visualization of multiplex tissue images
    1. Converting UNetCoreograph images to OME-TIFF using the Convert image tool
    2. Rename OME-TIFF Channels
    3. Initial visualization with Avivator
    4. Generating an interactive visualization dashboard with Vitessce
  10. Next steps: Compositional and spatial analyses
  11. Conclusion

Get data

Multiplex tissue images come in a variety of forms and file-types depending on the modality or platform used. For this tutorial, the Exemplar-002 data was imaged using Cyclic Immunofluorescence (CycIF) with a RareCyte slide scanner. Many of the steps in this workflow have platform-specific parameters, and the hands-on sections will show the best parameters for CycIF RareCyte images; however, notes will be made where critical differences may occur depending on the modality or platform throughout the tutorial.

Hands-on: Data import to history
  1. Create a new history for this tutorial

    Click the new-history icon at the top of the history panel:

    UI for creating new history

  2. Import the files from Zenodo or from the shared data library (GTN - Material -> imaging -> End-to-End Tissue Microarray Image Analysis with Galaxy-ME):
    https://zenodo.org/record/7622545/files/markers.csv
    https://zenodo.org/record/7622545/files/exemplar_002_phenotypes.csv
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-01.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-02.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-03.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-04.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-05.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-06.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-07.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-08.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-09.ome.tiff
    https://zenodo.org/record/7622545/files/exemplar-002-cycle-10.ome.tiff
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Group the datasets into collections. Make a collection of the OME-TIFF, ordering the files by cycle number.
Warning: **Imaging platform differences**

The Exemplar-002 raw images are in ome.tiff format; however, commonly seen raw file-types are ome.tiff, tiff, czi, and svs. If your input images are not ome.tiff or tiff, you may have to edit the dataset attributes in Galaxy to allow tools to recognize them as viable inputs.

The raw files for each round (10 in total) of the exemplar-002 data are available on cancer.usegalaxy.org under Data Libraries (Figure 2.). Import the raw files into a new history as a list collection.

Screenshot of the Galaxy data libraries on cancer.usegalaxy.org, highlighting the path to the dataset, Libraries, Exemplar 002, raw. The UI shows all datasets in that folder selected before using the Export to History button to import them as a Collection.Open image in new tab

Figure 2: Finding the Exemplar-002 data on cancer.usegalaxy.org Data Libraries.

Tile illumination correction with BaSiC Illumination

Commonly, raw MTI data will consist of one image per round of imaging. These individual round images are frequently captured in tiles, and there can be slight variations in how each tile was illuminated across the course of imaging. Prior to tile stitching and image registration, the tiles have to undergo illumination correction with BaSiC Illumination (Peng et al. 2017) to account for this. Unlike many of the other tools in this workflow, BaSiC has no extra parameters to think about: Just input the collection of raw images and press go!

Two new list collections will appear in the history upon completion:

  • BaSiC Illumination on Collection X: FFP (flat-field)
  • BaSiC Illumination on Collection X: DFP (deep-field)
Hands-on: Illumination correction
  1. BaSiC Illumination ( Galaxy version 1.0.3+galaxy1) with the following parameters:

    • param-collection “Raw Cycle Images: “: List collection of raw images

Stitching and registration with ASHLAR

After illumination is corrected across round tiles, the tiles must be stitched together, and subsequently, each round mosaic must be registered together into a single pyramidal OME-TIFF file. ASHLAR (Muhlich et al. 2022) from MCMICRO provides both of these functions.

Comment: Important detail: Marker File

ASHLAR optionally reads a marker metadata file to name the channels in the output OME-TIFF image. This marker file will also be used in later steps. Make sure that the marker file is comma-separated and has the marker_names as the third column (Figure 3.).

screenshot of the markers table. Open image in new tab

Figure 3: Markers file, used both in ASHLAR and downstream steps. Critically, the marker_names are in the third column.
Hands-on: : Image stitching and registration
  1. ASHLAR ( Galaxy version 1.14.0+galaxy1) with the following parameters:

    • param-collection “Raw Images”: List collection of raw images
    • param-collection “Deep Field Profile Images”: List collection of DFP images produced by BaSiC Illumination
    • param-collection “Flat Field Profile Images”: List collection of FFP images produced by BaSiC Illumination
    • param-file “Markers File (optional)”: Comma-separated markers file with marker_names in third column

    • In “Advanced Options”:
      • “Write output as a single pyramidal TIFF”: Yes
Warning: **Imaging platform differences**

ASHLAR, among other tools in the MCMICRO and Galaxy-ME pre-processing tools have some parameters that are specific to the imaging patform used. By default, ASHLAR is oriented to work with images from RareCyte scanners. AxioScan scanners render images in a different orientation. Because of this, when using ASHLAR on AxioScan images, it is important to select the Flip Y-Axis parameter to Yes

ASHLAR will work for most imaging modalities; however, certain modalities require different tools to be registered. For example, multiplex immunohistochemistry (mIHC) images must use an aligner that registers each moving image to a reference Hematoxylin image. For this, Galaxy-ME includes the alternative registration tool PALOM .

TMA dearray with UNetCoreograph

Many downstream processing and analysis steps require each individual core from the TMA to be in a separate image file. To accomplish this from our registered ome.tiff image, we can use UNetCoreograph to detect and crop each core into separate files.

UNetCoreograph will output images (used for downstream steps), masks, and a preview image (Figure 4.).

Image of four cores (numbered 1-4) on a slide.Open image in new tab

Figure 4: Preview image from UNetCoreograph, outlines show detection of each individual core in the TMA.
Hands-on: TMA dearray
  1. UNetCoreograph ( Galaxy version 2.2.8+galaxy1) with the following parameters:

    • param-file “Registered TIFF”: The output of ASHLAR (registered, pyramidal OME-TIFF file)
    Comment: What about Whole Slide Images?

    Whole slide images do not need to be dearrayed, so in most cases, this step can be skipped; however, UNetCoreograph has the “Tissue” option, which when selected, can act to separate the whole tissue from the background in a whole slide image which can be useful. In this case, it is important to toggle the “Downsample factor” as this often needs to be higher when extracting whole tissues.

Nuclear segmentation with Mesmer

Cell segmentation is the basis for all downstream single-cell analyses. Different segmentation tools work highly variably depending on the imaging modality or platform used. Because of this, Galaxy-ME has incorporated several cell segmentation tools so users may find the tool that works optimally for their data.

Available segmentation tools in Galaxy-ME:

In this tutorial, we use Mesmer because it tends to perform generally well on a diverse range of image types, and has a limited number of parameters to understand.

Comment: Important detail: Running images in batches

Now that each image has been split into individual core images, downstream tools must be run on the images separately. Luckily, Galaxy makes this easy by including the option to run each tool in batch across a collection of inputs. Next to the input for the tool, select param-collection (Dataset collection) as the input type, and pass the collection output by UNetCoreograph as input.

Hands-on: Nuclear segmentation
  1. Mesmer ( Galaxy version 0.12.3+galaxy2) with the following parameters:
    • param-collection “Image containing the nuclear marker(s) “: Collection output of UNetCoreograph (images)
    • “Resolution of the image in microns-per-pixel”: 0.65
    • “Compartment for segmentation prediction:”: Nuclear
    Comment: np.squeeze

    The np.squeeze parameter is very important to select as Yes to make the output compatible with next steps

Warning: Imaging platform differences: Image resolution**

A crucial parameter for Mesmer and other segmentation tools is the Image resolution. This is reported in microns/pixel, and can vary depending on the imaging platform used and the settings at image acquisition. Mesmer accepts the resolution in microns/pixel; however, if using UnMICST, the resolution must be reported as a ratio of the resolution of UnMICST’s training images (0.65). For example, when using UnMICST, if your images were captured at a resolution of 0.65, then the UnMICST value would be 1, but if your images were captured at 0.325 microns/pixel, then the value you would enter for UnMICST would be 0.5.

Calculate single-cell features with Quantification

After generating a segmentation mask, the mask and the original registered image can be used to extract mean intensities for each marker in the panel, spatial coordinates, and morphological features for every cell. This step is performed by MCMICRO’s Quantification module.

Once again, as this is a TMA, we will be running this in batch mode for every core image and its segmentation mask.

The quantification step will produce a CSV cell feature table for every image in the batch.

Hands-on: Quantification
  1. Quantification ( Galaxy version 1.5.3+galaxy1) with the following parameters:

    • param-collection “Registered TIFF “: Collection output of UNetCoreograph (images)
    • param-collection “Primary Cell Mask “: Collection output of Mesmer (or other segmentation tool)
    • param-collection “Additional Cell Masks “: Nothing Selected (Other tools may produce multiple mask types)
    • param-file “Marker channels”: Comma-separated markers file with marker_names in third column
    Comment: Mask metrics and Intensity metrics

    Leaving the “mask metrics” and “intensity metrics” blank will by default run all available metrics

Convert McMicro Output to Anndata

Anndata (Virshup et al. 2021) is a Python package and file format schema for working with annotated data matrices that has gained popularity in the single-cell analysis community. Many downstream analysis tools, including Scimap from MCMICRO, Scanpy (Wolf et al. 2018), and Squidpy (Palla et al. 2022) are built around anndata format files (h5ad). This tool splits the marker intensity data into a separate dataframe (X), and places all observational data (spatial coordinates, morphological features, etc.) in the cell feature table into a separate dataframe (obs) that shares the same indices as X. In downstream analyses, new categorical variables, such as phenotype assignments for each cell, are stored in the obs dataframe.

Learn more about this file format at the anndata documentation.

Hands-on: Conver to Anndata
  1. Convert McMicro Output to Anndata ( Galaxy version 0.17.7+galaxy0) with the following parameters:

    • param-collection “Select the input image or images”: Collection output of Quantification (cellMaskQuant)
    • In “Advanced Options”:
      • “Whether to remove the DNA channels from the final output”: No
      • “Whether to use unique name for cells/rows”: No
    Warning: Important parameter: Unique names for cells/rows

    Setting “Whether to use unique name for cells/rows” to No to ensures that downstream interactive visualizations will be able to map observational features to the mask CellIDs.

Scimap: Single Cell Phenotyping

There are several ways to classify cells available in Galaxy-ME. Unsupervised approaches, such as Leiden clustering, can be performed on all cells and phenotypes can be manually annotated based on marker expression patterns observed by the user. This approach is time consuming, so here we will demonstrate automated phenotyping based on thresholds of specific lineage markers using MCMICRO’s Scimap. Scimap phenotyping can either be provided a table of manual gate values for each marker of interest (which can be determined using the GateFinder tool in Galaxy-ME), or by default, Scimap will fit a Gaussian Mixture Model (GMM) to the log(intensity) data for each marker to determine positive and negative populations for that marker. The marker intensity values are rescaled between (0,1) with 0.5 being the cut-off between negative and positive populations. Scimap uses a ‘Phenotype workflow’ to guide the classification of cells (Figure 5.). For more on how to construct a Scimap workflow, see the Scimap documentation.

Screenshot of the phenotypes table. Open image in new tab

Figure 5: Example of a phenotype workflow compatible with Scimap. 'Pos' means that the marker must be positive to be classified as the respective phenotype. 'Anypos' means any, but not necessarily all, of the listed markers can be positive to call the respective phenotype.
Hands-on: Single Cell Phenotyping with Scimap
  1. Single Cell Phenotyping ( Galaxy version 0.17.7+galaxy0) with the following parameters:

    • param-collection “Select the input anndata”: Output of Convert MCMICRO output to Anndata
    • param-file “Select the dataset containing manual gate information”: (Optional) manually determined gates in CSV format. Gates will be determined automatically using a GMM for each marker if this file is not provided
    • param-file “Select the dataset containing gating workflow”: exemplar_002_phenotypes.csv, CSV phenotype workflow (Figure 5.)
    • “Save the GMM gates plots If True”: Yes
    Comment: Limitations of GMM automated phenotyping

    When manual gates are not provided, Scimap fits a GMM to determine a threshold between positive and negative cells. This automated gating works well when markers are highly abundant within the tissue, and the data shows a bimodal distribution (Figure 6A.). GMM gating can lead to spurious thresholds, however, when the data does not appear to be bimodal (Figure 6B.). This tends to happen when the marker is not highly abundant in the tissue, so there isn’t a large positive population. Markers that have a highly continuous range of intensity, like certain functional markers, can also be problematic with GMM gating. It is recommended to always look at the GMM plots output by Scimap, and validate any potentially spurious gates manually.

    Two bar plots with overlain curves. Left in A shows a bimodal distribution of CD3D, right in B shows a unimodal distribution in CD11B.Open image in new tab

    Figure 6: Scimap automatic gating GMMs for two markers. (A) An example of a marker with a bimodal distribution and a reasonable looking gate. (B) An example of a marker with a unimodal distribution that is not ideal for fitting with a GMM, and would be a candidate for manual validation and gating.

Interactive visualization of multiplex tissue images

Visual analysis is an important part of multiplex tissue imaging workflows. Galaxy-ME has several tools that make interactive visualization easy, and can be used at various stages of analysis.

Converting UNetCoreograph images to OME-TIFF using the Convert image tool

UNetCoreograph outputs each individual core image in tiff format. Interactive visualization tools, such as Vitessce and Avivator require the images to be in OME-TIFF format to be viewed. Galaxy-ME includes a conversion tool that can accomodate this, along with many other useful conversion functions.

Hands-on: Convert image
  1. Convert image ( Galaxy version 6.7.0+galaxy0) with the following parameters:
    • param-collection “Input Image”: UNetCoreograph Images
    • “Output data type”: OME TIFF
    • “Tile image”: Tile image
    • “Pyramid image”: Generate Pyramid

Rename OME-TIFF Channels

Some tools can cause the channel names in an OME-TIFF image to be lost. To fix this, or to change the channel names to whatever the user prefers, the Rename OME-TIFF Channels tool can be invoked using a markers file similar to the one used in previous steps.

Hands-on: Rename channels
  1. Rename OME-TIFF Channels ( Galaxy version 0.0.1+galaxy1) with the following parameters:

    • param-file “Input image in OME-tiff format”: Convert image
    • “Format of input image”: ome.tiff
    • param-file “Channel metadata CSV”: markers.csv, Comma-separated markers file with marker_names in third column

Initial visualization with Avivator

For any OME-TIFF image in a Galaxy-ME history, there will be an option to view the image using Avivator. This is a great way to perform an initial inspection of an image for QC purposes before continuing with downstream steps. The Avivator window can be launched by expanding the dataset information in the history panel and clicking the link (Figure 7.).

Screenshot shows a galaxy dataset expanded, and then the 'display at aviator' link expanded into a screenshot of Aviator showing a multicoloured histology slide.Open image in new tab

Figure 7: The highlighted link automatically appears for any OME-TIFF image (left) and, when clicked, launches an Avivator window to explore the image (right).
Hands-on: View Images with Avivator
  1. Expand the dataset ASHLAR: OME-TIFF image to be viewed
  2. Click on “display at Avivator”

Generating an interactive visualization dashboard with Vitessce

Vitessce is a powerful visualization tool that creates interactive dashboards (Figure 8.) to look at a multiplex OME-TIFF images in conjunction with data generated during analysis and stored in an anndata file. The segmentation mask can be overlaid onto the image to qualitatively assess the segmentation performance. The mask can then be colored with associated observational data (Figure 9A.), such as phenotype, with the same colors appearing in barplots (Figure 9B.), UMAP representations, heatmaps, and marker intensity violin plots for comrehensive data exploration.

Screenshot of the vitessce dashboard.Open image in new tab

Figure 8: A Full view of a vitesse dashboard for one core from Exemplar-002.
screenshot of vitessce interface, on the left in A is a picture of a cell highlighted. On the right in B is a bar chart comparing cell size vs lineages.Open image in new tab

Figure 9: Each window in the dashboard can be resized to view the components in more detail. A closer look at the phenotype-labeled mask overlaid on the actual image (A), and the phenotype barplot (B).
Hands-on: Vitessce visualization
  1. Vitessce Visualization ( Galaxy version 1.0.4+galaxy0) with the following parameters:

    • param-collection “Select the OME Tiff image”: OME-TIFF image to be viewed (or collection of files to run in batch)
    • param-collection “Select masks for the OME Tiff image (Optional)”: Output of Mesmer (or other segmentation tool)
    • “Whether to do phenotyping”: Yes
      • “Select the anndata contaning phenotyping info”: Single Cell Phenotyping, an anndata file that includes includes cell phenotype annotations.
      • “Select an embedding algorithm for scatterplot”: UMAP
      • “Input phenotyping keys”: Multiple choices
        • “Select the key(s)”: phenotype

Next steps: Compositional and spatial analyses

Galaxy-ME includes additional tools from Scimap and tools from the Squidpy package (Palla et al. 2022) that can be used to perform a variety of downstream analyses. For example, once phenotypes have been assigned to individual cells, Squidpy has several methods for understanding the spatial organization of the tissue. Using Squidpy, a spatial neighborhood graph is first generated, from which the organization of specific phenotype groups and their interactions can be quantified.

Hands-on: Spatial analysis with Squidpy
  1. Squidpy Graph and Plotting generate a spatial neighborhood graph with the following parameters:

    • param-collection “Select the input anndata”: Anndata file containing phenotype information (or other variable of interest)
    • “Select an analysis”: Spatial neighbors -- Create a graph from spatial coordinates
  2. Squidpy Graph and Plotting compute and plot a neighborhood enrichment analysis with the following parameters:

    • param-collection “Select the input anndata”: Output of step 1 (anndata file with spatial neighborhood graph)
    • “Select an analysis”: nhood_enrichment -- Compute neighborhood enrichment by permutation test
    • “Key in anndata.AnnData.obs where clustering is stored”: phenotype
    Comment: Neighborhood enrichment plot

    Squidpy was used to calculate neighborhood enrichments for each phenotype in core 2 of exemplar 2 (Figure 10.). This shows which phenotypes co-locate most frequently within the tissue.

    Heatmap showing phenotype vs neighbourhood enrichment. Most of the heatmap is blue/green (low) but one cell under epithelial is bright yellow (high). Open image in new tab

    Figure 10: The output of Squidpy's neighborhood enrichment on core 2 from Exemplar-002.
  3. Squidpy Graph and Plotting calculate Ripley’s L curves for each phenotype with the following parameters:

    • param-collection “Select the input anndata”: Output of step 1 (anndata file with spatial neighborhood graph)
    • “Select an analysis”: nhood_enrichment -- Compute neighborhood enrichment by permutation test
    • “Key in anndata.AnnData.obs where clustering is stored”: phenotype
    • In “Advanced Graph Options”:
      • “Which Ripley’s statistic to compute”: L
    • In “Plotting Options”:
      • “Ripley’s statistic to be plotted”: L
    Comment: Ripley's L plot

    Squidpy was used to calculate Ripley’s L curves for each phenotype in core 2 of exemplar 2 (Figure 11.). This shows the overall organization of each phenotype in the tissue. If the curve for a given phenotype lies above the light grey null line (Example: Epithelial cells in Figure 11.), the phenotype is statistically significantly clustered. If the curve lies on the null line (Example: Myeloid lineage in Figure 11.), it’s spatial distribution within the tissue is random. If the curve is underneath the null line (Example: T cells in Figure 11.), it’s spatial distribution is statistically significantly dispersed.

    Graph of Ripley's L. Value is plotted against bins, all of which show cursves starting at 0 and increasing as bins increase. Epithelial is the highest curve.Open image in new tab

    Figure 11: The output of Squidpy's Ripley's L curve on core 2 from Exemplar-002.

Conclusion

In this tutorial, we demonstrated a complete multiplex tissue imaging analysis workflow performed entirely in a web browser using Galaxy-ME. Using an example tissue microarray imaged with cylic immunofluoresence provided by MCMICRO, we…

  • Corrected illumination between imaging tiles
  • Stitched and registered input images to produce a single, pyramidal OME-TIFF image that is viewable in multiple built-in interactive viewing tools (Avivator, Vitessce)
  • Split the TMA into separate images for each core
  • Processed each core in parallel, beginning with nuclear segmentation
  • Quantified the mean marker intensities, morphological features, and spatial coordinates of each cell in each core
  • Converted the resulting tabular data to anndata format for convenient downstream anaylses and visualizations
  • Performed marker-based, automatically gated, phenotyping of cells
  • Prepared the dearrayed images and viewed them interactively in a dashboard combined with observational data
Image of a galaxy workflow with all of the tools from this tutorial.Open image in new tab

Figure 12: The entire workflow used in this tutorial.