EWAS data analysis of 450k data
What is a epigenome-wide association studies?
Why is a EWAS analysis useful?
Why 450k data?
Learn how to perform reproducible EWAS analysis
- Introduction to Galaxy Analyses
- Sequence analysis
time Time estimation: 1 hour
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have provide training based on Galaxy tools EWAS suite that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data.
This exercise uses datasets from the Cell publication by Hugo, Willy, et al., 2015 Hugo et al. 2015. with the goal being to identify differentially methylated regions and positions associated with treatment resistant melanomas. Datasets include the Illumina 450K Methylation array performed in melanoma tumors in a sample of patients pre and post MAPKi and BRAFi treatment with different outcomes (sensitive and resistant). For each sample there is raw green and red colour arrays containing the summarized bead information generated by the Illumina 450k scanner, i.e. there are 8 files in total:
The Infinium Methylation Assay 450k uses two different bead types to detect changes in DNA methylation levels. In the figure we can see M - methylated and U - unmethylated bead types. In our study unmethylated and methylated bead signals are reported as green and red colors respectively.
450K Array Data Upload
The first step of EWAS data analysis is the raw methylation data loading (intensity information files for each two color micro array)
hands_on Hands-on: Data Loading
Create a new history for this tutorial and give it a proper name
tip Tip: Creating a new history
Click the new-history icon at the top of the history panel
If the new-history is missing:
- Click on the galaxy-gear icon (History options) on the top of the history panel
- Select the option Create New from the menu
- Import the following IDAT files from Zenodo or from the data library (ask your instructor)
details List of Zenodo URLs
https://zenodo.org/record/1251211/files/GSM1588704_8795207135_R01C02_Red.idat https://zenodo.org/record/1251211/files/GSM1588706_8795207135_R02C02_Red.idat https://zenodo.org/record/1251211/files/GSM1588705_8795207119_R05C02_Red.idat https://zenodo.org/record/1251211/files/GSM1588707_8795207119_R06C02_Red.idat https://zenodo.org/record/1251211/files/GSM1588704_8795207135_R01C02_Grn.idat https://zenodo.org/record/1251211/files/GSM1588706_8795207135_R02C02_Grn.idat https://zenodo.org/record/1251211/files/GSM1588705_8795207119_R05C02_Grn.idat https://zenodo.org/record/1251211/files/GSM1588707_8795207119_R06C02_Grn.idat
tip Tip: Importing data via links
- Copy the link location
Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)
- Select Paste/Fetch Data
Paste the link into the text field
- Close the window
By default, Galaxy uses the URL as the name, so rename the files with a more useful name.
tip Tip: Importing data from a data library
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
Go into Shared data (top panel) then Data libraries
Find the correct folder (ask your instructor)
- Select the desired files
- Click on the To History button near the top and select as Datasets from the dropdown menu
- In the pop-up window, select the history you want to import the files to (or create a new one)
- Click on Import
- Run minfi read450k tool with the following parameters:
- param-files “red channel files”: all files ending in
- param-files “green channel files”: all files ending in
- Inspect the generated set of data
How are the Green and Red signals stored?
Green and Red micro arrays are built up into
Preprocessing and Quality Assessment
Preprocessing and data quality assurance is an important step in Infinium Methylation Assay analysis.
RGChannelSet represents two colour data with a green and a red channel and can be converted into methylated and unmethylated signals assigned to
MethylSet or into Beta values build in
RatioSet. Users can convert from
MethylSet using the minfi mset tool or compute Beta values using minfi set tool. The minfi qc tool tool extracts and plots the quality control data frame with two columns mMed and uMed which are the medians of
MethylSet signals (Meth and Unmeth). Comparing them against one another allows users to detect and remove low-quality samples.
hands_on Hands-on: Preprocessing
- Run minfi mset tool on the output from minfi_read450K to create
- Run minfi qc tool on the ‘MethylSet’ object created in the previous step to estimate sample-specific quality control. Once it is done click on the galaxy-eye (eye) icon next to the QC filename in your history to view QC image.
- Once you have inspected the quality control run minfi rset tool on the ‘MethylSet’output from minfi_read450K to convert methylation data from the
MethylSetto a ratio of the methylated probe intensity and the overall intensity (sum of methylated and unmethylated probe intensities)
- Run minfi maptogenome tool on the output from the previous step to map ratio data to the genomic positions
comment Preprocess and Normalize data
If your files require normalisation, you might prefer to use one of the other preprocessing tools provided in EWAS suite i.e. minfi ppfun tool or minfi ppquantile tool look for recommendation at (ref).
Removing probes affected by genetic variation
Incomplete annotation of genetic variations such as single nucleotide polymorphism (SNP) may affect DNA measurements and interfere results from downstream analysis.
Hansen, J. P. 2014 highly recommended to remove the probes that contain either a SNP at the methylated loci interrogation or at the single nucleotide extension
hands_on Hands-on: Removing probes affected by genetic variation
- Run minfi dropsnp tool on the output of minfi_maptogenome to remove the probes that contain either a SNP at the methylated loci interrogation or at the single nucleotide extension
DMPs and DMRs Identification
The main goal of the EWAS suite is to simplify the way differentially methylated loci sites are detected. The EWAS suite contains minfi dmp tool tool detecting differentially methylated positions (DMPs) with respect to a phenotype covariate, and more complex minfi dmr tool solution for finding differentially methylated regions (DMRs). Genomic regions that are differentially methylated between two conditions can be tracked using a bumphunting algorithm. The algorithm first implements a t-statistic at each methylated loci location, with optional smoothing, then groups probe into clusters with a maximum location gap and a cutoff size to refer the lowest possible value of genomic profile hunted by our tool.
From the next step we will need an additional file that annotates illumina450K with genomic features, which we can obtain from UCSC.
hands_on Hands-on: Data upload from UCSC
UCSC Mainin the tool search bar (top left)
UCSC Maintool. You will be taken to the UCSC table browser
- Set the following options:
Feb. 2009 (GRCh37/hg19)
- “output format”:
GTF - gene transfer (limited)
- “Send output to”:
- Click on the get output button at the bottom of the screen
- On the next page, click on the Send Query to Galaxy button
- Wait for the upload to finish
hands_on Hands-on: DMPs and DMRs Identification
phenotypeTable.txtfrom Zenodo or data library:
- Run minfi dmp tool on the output of the minfi dropsnp tool adjusting the following parameters
- param-file “Input set”:
- param-file “Phenotype Table”:
- “Phenotype Type”:
- “qCutoff Size”:
0.5(DMPs with an FDR q-value greater than this will not be returned)
- “Variance Shrinkage”:
TRUE(This is recommended when sample sizes are small <10}
- “Genome Table”:
- Run minfi dmr tool on the output of minfi_dmp adjusting the following parameters
- param-file “Input set”:
- param-file “Phenotype Table”:
- “maxGap Size”:
- “coef Size”:
- “Cutoff Size”:
- Click on the minfi dmr tool output in your history to expand it
- Set the database build of your dataset to
Human Feb. 2009 (GRCh37/hg19) (hg19)
tip Tip: Changing Database/Build (dbkey)
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, change the Database/Build field
- Select your desired database key from the dropdown list:
- Click the Save button
- Click on
display at UCSCtowards the bottom of the history item. This will launch UCSC Genome Browser with your Custom Track
comment Phenotype table
Phenotype table can be in different sizes with different arguments, however the second column is required to contain phenotype covariate information for each sample.
How do we define phenotype covariate?
Phenotype covariate is the set of observable characteristics of an individual resulting from the gene-environment interactions
Annotation and Visualization
In addition to downstream analysis users can annotate the differentially methylated loci at the promoter regions of genes with gene function description, and relationships between these concepts.
hands_on Hands-on: Annotate Differentially Methylated Position
- Run chipeakanno annopeaks toolon the output of minfi_dmp with the following parameters
- param-file “Differentialy methylated data”: output of minfi dmp tool
“Additional Column of Score”:
Position of column of score optional value if it is required
- Cut tool on the previous output adjusting the following parameters to cut “gene_name” column from table of annotated peaks and then get a list of geenes
- “Cut columns”:
- “Delimited by”:
- param-file “From”: output of chipeakanno annopeaks tool
- Remove beginning tool of
Gene Listwith the following parameters
- “Remove first”:
- param-file “from”: output of Cut tool
- Run clusterProfiler bitr tool on the previous output adjusting the following parameters to convert the list of genes to list of entrez ID
- “Input Type Gene ID”:
- “Output Type Gene ID”:
- Use the output of the clusterProfiler bitr tool to run a GO Enrichment Analysis using clusterProfiler go
|GO:0048732||gland development||1.38E-58||4.23E-55||PTGS2 / KCNC1 / FZD1 /SLC22A18 /SLC22A3 (…)||372|
|GO:1901652||response to peptide||3.99E-57||8.13E-54||SULF1/ LAMA5/ MED1 /CFLAR/ MSX2 (…)||359|
|GO:0048545||response to steroid hormone||1.38EE-54||2.11E-51||HDAC9/ RAB10/ CFLAR/ WDTC1 (…)||394|
Epigenetic aberrations which involve DNA modifications give researchers an interest in identifying novel non-genetic factors responsible for complex human phenotypes such as height, weight, and disease. To identify methylation changes researchers need to perform complicated and time consuming computational analysis. Here, the EWAS suite becomes a solution for this inconvenience and provides a simplified downstream analysis available as a ready to run pipline in supplementary materials.
keypoints Key points
EWAS analyse changes in DNA methylation in cohorts of different individuals effected by diseases
DNA methylation changes can be triggered by environment (i.e. sun exposure, diet) and effect the way how our cells works and in long term cause cancer
- Hugo, W., H. Shi, L. Sun, M. Piva, C. Song et al., 2015 Non-genomic and Immune Evolution of Melanoma Acquiring MAPKi Resistance. Cell 162: 1271–1285. 10.1016/j.cell.2015.07.061
congratulations Congratulations on successfully completing this tutorial!
Did you use this material as an instructor? Feel free to give us feedback on how it went.
feedback Give us even more feedback on this content!
To give us more detailed feedback about these materials, please take a moment to fill in the extended Feedback Form.