name: inverse
layout: true
class: center, middle, inverse
---

# Introduction to Transcriptomics

---

## Requirements

Before diving into this slide deck, we recommend that you have a look at:

- [Introduction to Galaxy Analyses](/archive/2019-11-01/topics/introduction)
- [Sequence analysis](/archive/2019-11-01/topics/sequence-analysis)
    - Quality Control: [slides](/archive/2019-11-01/topics/sequence-analysis/tutorials/quality-control/slides.html) - [hands-on](/archive/2019-11-01/topics/sequence-analysis/tutorials/quality-control/tutorial.html)
    - Mapping: [slides](/archive/2019-11-01/topics/sequence-analysis/tutorials/mapping/slides.html) - [hands-on](/archive/2019-11-01/topics/sequence-analysis/tutorials/mapping/tutorial.html)

.footnote[Tip: press `P` to view the presenter notes]

???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

---

# What is RNA sequencing?

---

### RNA

![RNA structure](/archive/2019-11-01/topics/transcriptomics/images/Gene_structure_eukaryote_2_annotated.svg)

- Transcribed form of the DNA
- Active state of the DNA

.footnote[[Credit: Thomas Shafee](https://en.wikipedia.org/wiki/File:Gene_structure_eukaryote_2_annotated.svg)]

---

## RNA sequencing

![Summary of RNA sequencing](/archive/2019-11-01/topics/transcriptomics/images/Summary_of_RNA-Seq.svg)

- RNA quantification at single-base resolution
- Cost-efficient, high-throughput analysis of the whole transcriptome

.footnote[[Credit: Thomas Shafee (adapted)](https://commons.wikimedia.org/wiki/File:Summary_of_RNA-Seq.svg)]

---

### Where does my data come from?

![Where does my data come from?](/archive/2019-11-01/topics/transcriptomics/images/ecker_2012.jpg)

.footnote[[*Zang and Mortazavi, Nature, 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138050/)]

---

### Principle of RNA sequencing

![Principle of RNA sequencing](/archive/2019-11-01/topics/transcriptomics/images/korf_2013.jpg)

.footnote[[*Korf, Nat Met, 2013*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4461013/)]

---

### Challenges of RNA sequencing

- Different origin for the sample RNA and the reference genome
- Presence of incompletely processed RNAs or transcriptional background noise
- Sequencing biases (*e.g.* PCR library preparation)

---

### Benefits of RNA sequencing

![Benefits of RNA sequencing](/archive/2019-11-01/topics/transcriptomics/images/rna_quantification.png)

---

### 2 main research applications for RNA-Seq

- Transcript discovery

    > *Which RNA molecules are in my sample?*

    > Novel isoforms and alternative splicing, non-coding RNAs, single nucleotide variations, fusion genes

- RNA quantification

    > *What is the concentration of RNAs?*

    > Absolute gene expression (within sample), differential expression (between biological samples)

---

## How to analyze RNA-Seq data for RNA quantification?
---

### RNA quantification

![RNA quantification](/archive/2019-11-01/topics/transcriptomics/images/pepke_2009.jpg)

.footnote[[*Pepke et al, Nat Met, 2009*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4077321/)]

---

### Overview of the Data Processing

![Data processing](/archive/2019-11-01/topics/transcriptomics/images/rna_seq_workflow.png)

- No standardized workflow available
- Multiple possible best practices for every dataset

---

### Data Pre-processing

1. Adapter clipping to trim the sequencing adapters
2. Quality trimming to remove wrongly called and low-quality bases

.footnote[See [NGS Quality control](../../NGS-QC/slides/index.html)]

---

### Annotation of RNA-Seq reads

Simple mapping on a reference genome? More challenging

![Annotation of RNA-Seq reads](/archive/2019-11-01/topics/transcriptomics/images/RNA-Seq-alignment.png)

.footnote[[Credit: Rgocs](https://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png)]

---

### Annotation of RNA-Seq reads

3 main strategies for annotation

- Transcriptome mapping
- Genome mapping
- *De novo* transcriptome assembly and annotation

---

### Transcriptome mapping

![Transcriptome mapping](/archive/2019-11-01/topics/transcriptomics/images/transcriptome_mapping.png)

*See [NGS Mapping](../../NGS-mapping/slides/index.html)*

- Needs reliable gene models
- No detection of novel genes

.footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012]

---

### Genome mapping

Splice-aware read alignment

![Genome mapping](/archive/2019-11-01/topics/transcriptomics/images/genome_mapping.png)

Detection of novel genes and isoforms

.footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012]

---

### Transcriptome and Genome mapping

Needed

- Reference genome/transcriptome in FASTA
- Annotations of known genes, ... in GTF

Where to find?

- Joint projects to produce and maintain annotations on selected organisms: EMBL-EBI, UCSC, RefSeq, Ensembl, ...

---

### *De novo* transcriptome assembly

No need for a reference genome ...

1. Assemble reads into transcripts
2. Map reads back to the assembled transcripts

---

### Quantification

*What is the expression level of the genomic features?*

- Counting the number of reads per feature: Easy!!
- Challenges
    - How to handle multi-mapped reads (*i.e.* reads with multiple alignments)?
    - How to distinguish between different isoforms?
        - At gene level?
        - At transcript level?
        - At exon level?
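---

### Quantification: a counting sketch

The counting step above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the gene coordinates, the read names, and the `count_reads` helper are all hypothetical, and the policy shown (skip multi-mapped reads and reads that overlap more than one gene) is only one possible answer to the challenges listed.

```python
# Hypothetical toy annotation: gene name -> (start, end), 1-based inclusive
genes = {"geneA": (100, 200), "geneB": (300, 400)}

# Hypothetical toy alignments: read name -> list of (start, end) positions
alignments = {
    "read1": [(110, 160)],
    "read2": [(150, 199)],
    "read3": [(310, 360)],
    "read4": [(120, 170), (320, 370)],  # multi-mapped read
    "read5": [(500, 550)],              # overlaps no annotated gene
}


def count_reads(genes, alignments):
    """Count uniquely mapped reads that overlap exactly one gene."""
    counts = {gene: 0 for gene in genes}
    for hits in alignments.values():
        if len(hits) != 1:
            continue  # assumed policy: discard multi-mapped reads
        start, end = hits[0]
        overlapping = [
            gene for gene, (g_start, g_end) in genes.items()
            if start <= g_end and end >= g_start
        ]
        if len(overlapping) == 1:
            counts[overlapping[0]] += 1  # unambiguous assignment only
    return counts


counts = count_reads(genes, alignments)
print(counts)
```

Under this policy, `read4` (two alignments) and `read5` (no overlapping gene) do not contribute to any count.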
---

### Differential Expression Analysis

.image-75[![Differential expression analysis](/archive/2019-11-01/topics/transcriptomics/images/negative_binomial.png)]

Account for variability of expression across biological replicates<br>with the help of counts

---

### Differential Expression Analysis: Normalization

*Make the expression levels comparable*

- Across features: genes, isoforms
- Across samples
- Methods
    - [*FPKM/RPKM*](https://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1226.html) (Cufflinks/Cuffdiff)
    - [*TMM*](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25) (edgeR)
    - [*DESeq2*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) (DESeq2)

.footnote[*"Only the DESeq and TMM normalization methods are robust to the presence of different library sizes and widely different library compositions..."* - Dillies et al., Brief Bioinf, 2013]

---

### Impact of sequencing depth and number of replicates

.image-75[![Impact of sequencing depth and number of replicates](/archive/2019-11-01/topics/transcriptomics/images/rnaseq_depth_vs_reps.png)]

.footnote[[*Conesa et al, Genome Biol, 2016*](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8)]

**Recommendation: At least 3 biological replicates**

???

- The number of replicates has a greater effect on DE detection accuracy than sequencing depth (more replicates = increased statistical power)
- DE detection of lowly expressed genes is very sensitive to the number of reads and to replication
- DE detection of highly expressed genes is already possible at low sequencing depth

---

### Visualization

- Integrative Genomics Viewer ([*IGV*](https://bib.oxfordjournals.org/content/14/2/178.full?keytype=ref&%2520ijkey=qTgjFwbRBAzRZWC)) or Trackster

    Visualization of the aligned BAM files

- [*Sashimi plots*](https://bioinformatics.oxfordjournals.org/content/early/2015/01/21/bioinformatics.btv034)

    Quantitative visualization of read coverage along exons and splice junctions

- [*CummeRbund*](http://compbio.mit.edu/cummeRbund/manual_2_0.html)

    Visualization package for Cufflinks high-throughput sequencing data

---

## Related tutorials

---

## Thank you!

This material is the result of a collaborative work.
Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors!

Bérénice Batut, Anika Erxleben, Markus Wolfien