Clinical-MP-5-Data Interpretation

Overview
Questions:
  • Why do we need to interpret the data?

  • Can we visualize the data?

Objectives:
  • Perform group comparison analysis.

  • Analyze significant proteins

  • Look at the taxonomic distribution of the quantified peptides

Time estimation: 3 hours
Published: Feb 6, 2024
Last modification: Feb 12, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00417
Revision: 2

The final workflow in the series of clinical metaproteomics tutorials is the data interpretation workflow. Interpreting MaxQuant data with MSstats applies a rigorous statistical framework to quantitative proteomic datasets. The MaxQuant output is first explored to understand data distribution and variability, and subsequent normalization helps account for systematic variation. MSstats lets the user define the experimental design, including sample groups and conditions, and then performs the statistical analysis. The output reports differential protein expression across conditions, with estimated fold changes and associated p-values, which aids in identifying biologically significant proteins. MSstats also supports quality control and data visualization. Additional material on using MaxQuant and MSstatsTMT for TMT data analysis can be found in the tutorial “MaxQuant and MSstats for the analysis of TMT data”.

Data-Interpretation-workflow.

Agenda

In this tutorial, we will cover:

  1. Get data
  2. Taxonomic analysis with Unipept
  3. Extraction of Microbial Sequences
  4. MSstats TMT
    1. Statistical Analysis of Microbial proteins with MSstatsTMT
    2. Statistical Analysis of Human proteins with MSstatsTMT
  5. Conclusion

Get data

Hands-on: Data Upload
  1. Create a new history for this tutorial
  2. Import the files from Zenodo or from the shared data library (GTN - Material -> proteomics -> Clinical-MP-5-Data Interpretation):

    https://zenodo.org/records/10105821/files/Annotation.tabular
    https://zenodo.org/records/10105821/files/Comparison_Matrix.tabular
    https://zenodo.org/records/10105821/files/MaxQuant_Evidence.tabular
    https://zenodo.org/records/10105821/files/MaxQuant_Protein_Groups.tabular
    https://zenodo.org/records/10105821/files/Quantified-Peptides.tabular
       
    
    • Copy the link location
    • Click Upload Data at the top of the tool panel

    • Select Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    1. Go into Shared data (top panel) then Data libraries
    2. Navigate to the correct folder as indicated by your instructor.
      • On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
    3. Select the desired files
    4. Click on Add to History near the top and select as Datasets from the dropdown menu
    5. In the pop-up window, choose

      • “Select history”: the history you want to import the data to (or create a new one)
    6. Click on Import

  3. Rename the datasets
  4. Check the datatype of each dataset

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, click the Datatypes tab on the top
    • In Assign Datatype, select the datatype from the “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

  5. Add to each dataset a tag corresponding to …

    1. Click on the dataset to expand it
    2. Click on Add Tags
    3. Add a tag starting with #
      • Tags starting with # will be automatically propagated to the outputs of tools using this dataset.
    4. Press Enter
    5. Check that the tag appears below the dataset name

Taxonomic analysis with Unipept

Unipept serves as a vital bioinformatics platform for the analysis of mass spectrometry-based shotgun proteomics data, especially in the study of microbial communities. Its primary utility lies in taxonomic and functional analyses, enabling researchers to identify and quantify microorganisms within diverse environments. The platform facilitates comparative studies across samples, conditions, or time points, shedding light on the dynamic responses of microbial communities to environmental changes. Unipept integrates with public databases like UniProt, ensuring access to comprehensive and updated information for annotations. Being community-driven and open source, Unipept fosters collaboration and transparency, with a user-friendly web interface that accommodates researchers of varying bioinformatics expertise. In essence, Unipept is an invaluable resource, offering tools for the exploration of metaproteomic data and contributing to advancements in our understanding of microbial ecology.

Hands-on: Unipept 5.0
  1. Unipept (Galaxy version 4.5.1) with the following parameters:
    • “Unipept application”: peptinfo: Tryptic peptides and associated EC and GO terms and lowest common ancestor taxonomy
    • “Peptides input format”: tabular
      • “Tabular Input Containing Peptide column”: output (Input dataset)
      • “Select column with peptides”: c1
    • “Match input peptides by”: Match to the full input peptide
    • “Choose outputs”: Tabular with one line per peptide, JSON Taxomony Tree, Peptide GO terms in normalized tabular, Peptide InterPro entries in normalized tabular, Peptide EC terms in normalized tabular, JSON EC Coverage Tree

Data-Interpretation with Unipept.
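
The peptinfo application reports a lowest common ancestor (LCA) for each peptide. As a rough illustration of the idea only, and not of Unipept's actual implementation, the following Python sketch treats each protein match as a root-first list of taxon names and returns the deepest taxon they all share. The function name and the abbreviated lineages are assumptions made for this example.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> str:
    """Return the deepest taxon shared by every lineage (root-first lists)."""
    if not lineages:
        raise ValueError("at least one lineage is required")
    shared: List[str] = []
    # zip stops at the shortest lineage, so we never read past its end.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break
    # An empty prefix means the lineages only agree at the unnamed root.
    return shared[-1] if shared else "root"


# Abbreviated, illustrative lineages for the proteins one peptide matches.
streptococci = [
    ["Bacteria", "Streptococcus", "Streptococcus pneumoniae"],
    ["Bacteria", "Streptococcus", "Streptococcus mitis"],
]
mixed = streptococci + [["Bacteria", "Escherichia", "Escherichia coli"]]

print(lowest_common_ancestor(streptococci))  # Streptococcus
print(lowest_common_ancestor(mixed))         # Bacteria
```

A peptide shared by two Streptococcus species is assigned to the genus, while a peptide that also occurs in Escherichia coli can only be assigned to Bacteria. This is why many peptides are reported at higher taxonomic ranks rather than at the species level.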

Extraction of Microbial Sequences

Hands-on: Extract Microbial sequences with Select
  1. Select with the following parameters:
    • “Select lines from”: output (Input dataset)
    • “that”: NOT Matching
    • “the pattern”: (HUMAN)|(REV)|(CON)|(con)
    • “Keep header line”: Yes

Hands-on: Select sequences matching "HUMAN"
  1. Select with the following parameters:
    • “Select lines from”: output (Input dataset)
    • “that”: Matching
    • “the pattern”: (HUMAN)
    • “Keep header line”: Yes

Hands-on: Filter out reverse and contaminant sequences
  1. Select with the following parameters:
    • “Select lines from”: out_file1 (output of the Select tool in the previous step)
    • “that”: NOT Matching
    • “the pattern”: (REV)|(con)
    • “Keep header line”: Yes
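
The effect of the three Select steps can be sketched in Python with the same regular expressions. This is a minimal illustration and not the Galaxy tool itself. The helper select_lines and the protein identifiers are hypothetical and only mimic the general shape of MaxQuant protein IDs.

```python
import re
from typing import List


def select_lines(lines: List[str], pattern: str, keep_matching: bool,
                 keep_header: bool = True) -> List[str]:
    """Keep lines that match (or do not match) a regular expression."""
    regex = re.compile(pattern)
    # With keep_header, the first line is passed through without testing it.
    header = lines[:1] if keep_header else []
    body = lines[1:] if keep_header else lines
    kept = [ln for ln in body if bool(regex.search(ln)) == keep_matching]
    return header + kept


# Hypothetical protein-group rows; the accessions are made up.
protein_groups = [
    "Protein IDs\tScore",
    "sp|X00001|PROTA_HUMAN\t10",
    "REV__sp|X00002|PROTB_HUMAN\t5",
    "CON__X00003\t7",
    "sp|X00004|PROTC_ECOLI\t3",
    "sp|X00005|PROTD_STRPN\t12",
]

# Step 1: microbial rows are those NOT matching human, reverse, or contaminant tags.
microbial = select_lines(protein_groups, r"(HUMAN)|(REV)|(CON)|(con)", keep_matching=False)

# Steps 2 and 3: keep human rows, then drop reverse and contaminant entries.
human_all = select_lines(protein_groups, r"(HUMAN)", keep_matching=True)
human = select_lines(human_all, r"(REV)|(con)", keep_matching=False)

print(microbial)
print(human)
```

Matching in this sketch is case-sensitive, so the pattern (REV)|(con) removes a row only if it contains REV or lower-case con. It is worth checking how contaminants are labelled in your own protein groups file before relying on the pattern.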

MSstats TMT

MSstatsTMT is a computational tool for the statistical analysis of mass spectrometry-based quantitative proteomics data acquired with Tandem Mass Tag (TMT) labeling. TMT is a widely used method for multiplexed quantitative proteomics that enables simultaneous identification and quantification of proteins across multiple samples. MSstatsTMT provides a statistical framework for analyzing TMT data, facilitating accurate and reliable protein abundance measurements. Its features include quality control, normalization, and statistical modeling, which allow researchers to identify differentially expressed proteins with confidence. MSstatsTMT is particularly valuable in large-scale studies where quantifying protein expression across multiple conditions is essential for understanding complex biological processes.

Statistical Analysis of Microbial proteins with MSstatsTMT

Hands-on: MSstatsTMT
  1. MSstatsTMT (Galaxy version 2.0.0+galaxy1) with the following parameters:
    • “Input Source”: MaxQuant
      • “evidence.txt - feature-level data”: output (Input dataset)
      • “proteinGroups.txt”: out_file1 (output of Select Microbial tool)
      • “annotation.txt”: output (Input dataset)
    • In “Plot Output Options”:
      • “Select protein IDs to draw plots”: generate all plots for each protein
    • “Compare Groups”: Yes
      • “Use comparison matrix?”: Yes
        • “Comparison Matrix”: output (Input dataset)
      • “Select outputs”: Group Comparison, MSstats Volcano Plot, MSstats Comparison Plot
      • In “Comparison Plot Options”:
        • “Select protein IDs to draw plots”: generate all plots for each protein
        • “Select comparisons to draw plots”: Generate all plots for each comparison
      • “Select outputs”: MSstatsTMT summarization log, MSstatsTMT summarization, MSstats Protein abundance
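
A comparison matrix tells the tool which groups to contrast. The Python sketch below illustrates the general idea, assuming a layout in which each row is one comparison and each column one condition, with +1 and -1 marking the two groups being compared. The condition names and abundance values are hypothetical and do not come from Comparison_Matrix.tabular. MSstatsTMT estimates these contrasts from a fitted statistical model rather than from simple group means, so this shows only how the weights define a comparison.

```python
from typing import Dict, List

# Hypothetical condition names and mean log2 protein abundances per condition.
conditions: List[str] = ["Control", "DiseaseA", "DiseaseB"]
mean_log2_abundance: Dict[str, float] = {
    "Control": 20.0,
    "DiseaseA": 21.5,
    "DiseaseB": 19.0,
}

# One row per comparison; +1 marks the numerator group, -1 the denominator.
comparison_matrix: Dict[str, List[int]] = {
    "DiseaseA-Control": [-1, 1, 0],
    "DiseaseB-Control": [-1, 0, 1],
    "DiseaseA-DiseaseB": [0, 1, -1],
}


def contrast_estimates(matrix: Dict[str, List[int]],
                       column_names: List[str],
                       means: Dict[str, float]) -> Dict[str, float]:
    """Apply each row of contrast weights to the per-condition means."""
    estimates: Dict[str, float] = {}
    for label, weights in matrix.items():
        if len(weights) != len(column_names):
            raise ValueError(f"{label}: expected {len(column_names)} weights")
        if sum(weights) != 0:
            raise ValueError(f"{label}: contrast weights must sum to zero")
        estimates[label] = sum(w * means[c] for w, c in zip(weights, column_names))
    return estimates


log2_fold_changes = contrast_estimates(comparison_matrix, conditions, mean_log2_abundance)
for label, value in log2_fold_changes.items():
    print(label, value)
```

Because the abundances are on a log2 scale, the difference produced by each row is a log2 fold change: a value of 1.5 for DiseaseA-Control corresponds to roughly a 2.8-fold higher abundance in DiseaseA.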

Statistical Analysis of Human proteins with MSstatsTMT

Hands-on: MSstatsTMT
  1. MSstatsTMT (Galaxy version 2.0.0+galaxy1) with the following parameters:
    • “Input Source”: MaxQuant
      • “evidence.txt - feature-level data”: output (Input dataset)
      • “proteinGroups.txt”: out_file1 (output of Select HUMAN tool)
      • “annotation.txt”: output (Input dataset)
    • In “Plot Output Options”:
      • “Select protein IDs to draw plots”: generate all plots for each protein
    • “Compare Groups”: Yes
      • “Use comparison matrix?”: Yes
        • “Comparison Matrix”: output (Input dataset)
      • “Select outputs”: Group Comparison, MSstats Volcano Plot, MSstats Comparison Plot
      • In “Comparison Plot Options”:
        • “Select protein IDs to draw plots”: generate all plots for each protein
        • “Select comparisons to draw plots”: Generate all plots for each comparison
      • “Select outputs”: MSstatsTMT summarization log, MSstatsTMT summarization, MSstats Protein abundance

The MSstats output typically includes essential information such as estimated fold changes, p-values, and other statistical measures that help identify differentially expressed proteins across experimental conditions or sample groups. It provides a clear picture of the variations in protein expression levels, aiding in the prioritization of biologically relevant targets. MSstats output also often includes visualizations and quality control metrics, making it a valuable resource for researchers in their quest to extract meaningful insights from complex proteomic datasets and understand the underlying biology of their experiments. Example of our data interpretation:

Data-Interpretation results with MSstats.
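
Once the Group Comparison table is downloaded, candidate proteins can be shortlisted with a few lines of Python. The sketch below assumes a tab-separated table with columns named Protein, Label, log2FC and adj.pvalue; check the header of your own output, since these names are an assumption here. The example rows are invented, and the thresholds (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are illustrative choices rather than values prescribed by this tutorial.

```python
import csv
import io
import math
from typing import List, Tuple


def significant_proteins(table_text: str,
                         max_adj_p: float = 0.05,
                         min_abs_log2fc: float = 1.0) -> List[Tuple[str, str]]:
    """Return (protein, comparison) pairs passing both thresholds."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits: List[Tuple[str, str]] = []
    for row in reader:
        try:
            log2fc = float(row["log2FC"])
            adj_p = float(row["adj.pvalue"])
        except (TypeError, ValueError):
            continue  # skip NA or otherwise non-numeric entries
        # Infinite or NaN values cannot be ranked, so they are left for manual review.
        if not (math.isfinite(log2fc) and math.isfinite(adj_p)):
            continue
        if adj_p < max_adj_p and abs(log2fc) >= min_abs_log2fc:
            hits.append((row["Protein"], row["Label"]))
    return hits


# Invented rows that only mimic the assumed column layout.
example_table = (
    "Protein\tLabel\tlog2FC\tadj.pvalue\n"
    "PROT_A\tDiseaseA-Control\t1.8\t0.003\n"
    "PROT_B\tDiseaseA-Control\t0.2\t0.001\n"
    "PROT_C\tDiseaseA-Control\t-2.1\t0.04\n"
    "PROT_D\tDiseaseA-Control\t3.0\t0.20\n"
    "PROT_E\tDiseaseA-Control\tNA\tNA\n"
)

hits = significant_proteins(example_table)
print(hits)
```

These two criteria correspond to the regions usually highlighted in a volcano plot: proteins far from zero on the fold-change axis and high on the significance axis.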

Conclusion

By completing this tutorial, you have finished the clinical metaproteomics tutorial series.

The clinical metaproteomics tutorials are a starting point for applying advanced proteomic techniques in clinical research. They guide researchers and clinicians through metaproteomic workflows, from data analysis to interpretation, and provide the practical knowledge needed to explore the diversity of the microbiome and its impact on health, disease, and the environment. As metaproteomic techniques continue to evolve and integrate with clinical practice, we hope these tutorials will be instrumental in shaping clinical research.