Introduction to Transcriptomics


Last modification: Dec 20, 2021

What is RNA sequencing?


The structure of a eukaryotic protein-coding gene

.footnote[Credit: Thomas Shafee]

RNA sequencing

Summary of the RNA-Seq principle. In vivo, transcription produces a pre-mRNA, and intron splicing then yields a mature mRNA. In vitro, this mRNA is fragmented, reverse transcribed into double-stranded cDNA, and then sequenced.

.footnote[Credit: Thomas Shafee (adapted)]

Where does my data come from?

At the top, a cell population is selected and total RNA is extracted. For small RNA, molecules are size-selected by PAGE or a kit, an adapter is ligated, and the RNA is converted to cDNA. For mRNA, poly(A) selection or ribosomal RNA depletion (ribo-minus) is applied, and the selected RNA is fragmented and converted to cDNA. In both cases the cDNA becomes a library for sequencing.

.footnote[Zang and Mortazavi, Nature, 2012]

Principle of RNA sequencing

Cartoon showing a magazine stand labelled transcriptome, and a person saying "I'll take all of them". These are run through a shredder, before hundreds of people attempt to re-assemble, and the person hands the professor a poorly assembled magazine.

.footnote[Korf, Nat Met, 2013]

Challenges of RNA sequencing

Benefits of RNA sequencing

A word cloud highlighting words like benefits, novel, sensitivity.

2 main research applications for RNA-Seq

How to analyze RNA seq data for RNA quantification?

RNA quantification

The RNA fraction of interest is selected (poly(A), ribo-minus, and others), then fragmented and reverse transcribed before sequencing, mapping onto the genome, and quantification.

.footnote[Pepke et al, Nat Met, 2009]

Overview of the Data Processing

Control and treatment files go through QC, annotation, and read counting to produce a set of count tables. Differential expression analysis is then computed from these tables.
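The read counting step can be illustrated with a minimal sketch. It assumes reads are already mapped and given as genomic intervals, and it uses a deliberately simple rule (a read counts only if it overlaps exactly one gene). The gene names, coordinates, and the `__no_feature` / `__ambiguous` labels are hypothetical and chosen for illustration; real counting tools handle strands, exons, and multi-mapping reads.

```python
from collections import Counter


def count_reads(genes, reads):
    """Build a simple count table.

    genes: dict mapping gene name -> (start, end), 0-based, end exclusive.
    reads: list of (start, end) intervals of mapped reads, same convention.
    """
    counts = Counter({name: 0 for name in genes})
    for r_start, r_end in reads:
        # A read hits a gene if the two half-open intervals overlap.
        hits = [
            name
            for name, (g_start, g_end) in genes.items()
            if r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return dict(counts)


# Hypothetical annotation with two overlapping genes.
genes = {"geneA": (100, 200), "geneB": (180, 300)}
reads = [(110, 160), (190, 240), (250, 290), (400, 450)]
table = count_reads(genes, reads)
print(table)
```

Running the same function on each control and treatment sample would give one column per sample, which is the shape of table that differential expression analysis starts from.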

Data Pre-processing

  1. Adapter clipping to trim the sequencing adapters
  2. Quality trimming to remove wrongly called and low quality bases

.footnote[See NGS Quality control]
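The two pre-processing steps can be sketched in a few lines. This is a simplified illustration, not how a production trimmer works: it assumes exact adapter matches (no mismatches), Phred+33 quality encoding, and a plain 3' end cutoff. The adapter sequence, the read, and the threshold of 20 are example values.

```python
def clip_adapter(seq, qual, adapter, min_overlap=3):
    """Remove an adapter, or a 3' partial copy of it, from a read."""
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Case 2: the read ends with the first k bases of the adapter.
    # Try the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def quality_trim(seq, qual, threshold=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Example read: 12 genomic bases followed by the first 7 bases of the adapter.
read_seq = "ACGTACGTACGTAGATCGG"
read_qual = "IIIIIIIIII##IIIIIII"  # 'I' = Q40, '#' = Q2 in Phred+33
adapter = "AGATCGGAAGAGC"

seq, qual = clip_adapter(read_seq, read_qual, adapter)
seq, qual = quality_trim(seq, qual)
print(seq, qual)
```

Clipping runs first here so that the low-quality bases hidden in front of the adapter become the new 3' end and can be removed by the quality trimming step.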

Annotation of RNA-Seq reads

Simple mapping on a reference genome? More challenging

A cartoon of a pre-mRNA with introns and exons. The exons are spliced into an mRNA, and short reads are shown piled up against the mRNA. A short read that spans an exon junction is split by the intron when aligned to a reference genome.

.footnote[Credit: Rgocs]

Annotation of RNA-Seq reads

3 main strategies for annotations

Transcriptome mapping

Cartoon of multiple exons collapsed, and paired end reads being shown as easy to align.

See NGS Mapping

.footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012]

Genome mapping

Splice-aware read alignment

The same cartoon again, but now it is shown split up by introns, and one of the paired end reads is split across three exons, so it is hard to align.

Detection of novel genes and isoforms

.footnote[Figures by Ernest Turro, EMBO Practical Course on Analysis of HTS Data, 2012]
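To make the difference between transcriptome and genome mapping concrete, the sketch below projects a read that aligns contiguously to a spliced transcript back onto genome coordinates. It assumes a forward-strand transcript with hypothetical exon coordinates, and reports introns as `N` operations in a CIGAR-like string. It only illustrates the coordinate problem that a splice-aware aligner has to solve; it is not an aligner.

```python
def transcript_to_genome(exons, t_start, read_len):
    """Project a transcript-aligned read onto the genome.

    exons: sorted (start, end) genome intervals, 0-based, end exclusive.
    t_start: 0-based read start within the concatenated exons.
    Returns (genome_start, cigar); skipped introns appear as N operations.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    blocks = []
    offset = 0  # transcript coordinate of the current exon's first base
    pos, remaining = t_start, read_len
    for start, end in exons:
        exon_len = end - start
        if remaining > 0 and pos < offset + exon_len:
            g_start = start + (pos - offset)
            take = min(remaining, end - g_start)
            blocks.append((g_start, g_start + take))
            remaining -= take
            pos += take
        offset += exon_len
    if remaining > 0:
        raise ValueError("read extends past the end of the transcript")
    cigar = []
    for i, (b_start, b_end) in enumerate(blocks):
        if i > 0:
            # Gap between consecutive aligned blocks = skipped intron.
            cigar.append(f"{b_start - blocks[i - 1][1]}N")
        cigar.append(f"{b_end - b_start}M")
    return blocks[0][0], "".join(cigar)


# Hypothetical gene with three exons; a 50 bp read starting near the end of exon 1.
exons = [(100, 150), (200, 230), (300, 400)]
genome_start, cigar = transcript_to_genome(exons, t_start=40, read_len=50)
print(genome_start, cigar)
```

On the transcript the read is one contiguous 50 bp match. On the genome it becomes three short blocks separated by introns, which is why the read split across three exons in the cartoon is hard to align.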

Transcriptome and Genome mapping


Where to find?

De novo transcriptome assembly

No need for a reference genome …

  1. Assembly into transcripts
  2. Map reads back


What is the expression level of the genomic features?

Differential Expression Analysis

.image-75[Three conditions are created, and multiple transcriptomes from each are sequenced into reads, mapped, and compared.]

Account for variability of expression across biological replicates
with the help of counts

Differential Expression Analysis: Normalization

Make the expression levels comparable across

.footnote[“Only the DESeq and TMM normalization methods are robust to the presence of different library sizes and widely different library compositions…” - Dillies et al., Brief Bioinf, 2013]
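As an illustration of what such a normalization computes, here is a minimal sketch of the median-of-ratios idea behind DESeq size factors. It is a simplified teaching version, not the DESeq implementation. The sample names and counts are made up, and genes with a zero count in any sample are skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios idea.

    counts: dict mapping sample name -> list of raw counts, with genes in
    the same order in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)  # gene skipped: zero in some sample
    if all(lg is None for lg in log_geo_means):
        raise ValueError("every gene has a zero count in at least one sample")
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lg
            for g, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Made-up example: "treated" was sequenced twice as deeply as "control".
counts = {
    "control": [100, 200, 50, 0],
    "treated": [200, 400, 100, 10],
}
factors = size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
print(factors)
print(normalized)
```

After dividing by the size factors, the first three genes have equal values in both samples, so the twofold difference in library size no longer looks like differential expression. Using the median makes the estimate robust to a minority of strongly changing genes.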

Impact of sequencing depth and number of replicates

.image-75[Image of a table from a paper. The recommendation is at least three biological replicates to accurately detect changes. Three replicates give an 87% chance of detecting a 2-fold change, but only a 17% chance of detecting a 1.25-fold change.]

.footnote[Conesa et al, Genome Biol, 2016]

Recommendation: At least 3 biological replicates

