
Introduction to ATAC-Seq data analysis




Published: Apr 8, 2021
Last Updated: Jul 26, 2021

Where does my data come from?

Low-resolution image from a paper with two subfigures, A and B. In A, closed and open chromatin are shown; Tn5 transposase attaches to open chromatin regions, and the resulting fragments are amplifiable, so they are amplified and sequenced. In B, open chromatin is shown as two strands in black, with a common end in grey at both ends of both strands. Adapters 1 and 2 are attached to the 5' ends. The fragment is extended at 72 °C for 5 minutes and amplified by PCR using barcoded primers. The final sequencing product consists of the open chromatin region flanked by the common ends and then the adapters, with a barcode on adapter 2.

Buenrostro et al. 2013 Nat Methods


Characteristics of ATAC-Seq design


How to analyze ATAC-Seq data?


Check the Insert Size

.image-50[ A bar chart of frequency against insert size in base pairs. The data start at insert size 20. A large, sharp peak at around 50 bp reaches a frequency of about 2000 and then rapidly drops to about 200. There are additional small, broad peaks at insert sizes 200, 400, and 600, but overall the graph decreases steadily. ]
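A histogram like the one described above can be tallied from the signed template length (the TLEN field) of paired-end alignments. The following is a minimal sketch, not the tutorial's own tool: the function names, the `max_size` cutoff, and the toy values are assumptions made for illustration.

```python
from collections import Counter


def insert_size_histogram(template_lengths, max_size=1000):
    """Count fragments per insert size in bp.

    Only positive TLEN values are kept so that each pair is counted once
    (the rightmost mate of a pair carries the negative value).
    max_size is a hypothetical cutoff chosen for this sketch.
    """
    counts = Counter()
    for tlen in template_lengths:
        if 0 < tlen <= max_size:
            counts[tlen] += 1
    return counts


def fraction_in_range(counts, low, high):
    """Fraction of counted fragments with low <= insert size <= high."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(n for size, n in counts.items() if low <= size <= high)
    return in_range / total


# Toy TLEN values (assumed, not real data)
tlens = [52, -52, 48, 60, 210, -210, 400, 0, 1500]
hist = insert_size_histogram(tlens)
short_fraction = fraction_in_range(hist, 1, 100)
```

In this toy input, the fraction of short fragments gives a quick numeric counterpart to the sharp first peak in the plot.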


Do not worry about a nucleotide bias

FastQC plot showing sequence content as the percentage of each of the four bases (A, C, T, G) as a function of position in the read. Before position 17, the lines are extremely jagged, showing a nucleotide distribution that varies widely from base to base. After position 17, the composition stabilises at ~22% each for C and G and ~28% each for A and T.
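The plot described above is a per-position tally of base percentages. As a rough illustration of how such a tally is computed (this is not FastQC's implementation; the function name and toy reads are assumptions):

```python
from collections import Counter


def per_position_composition(reads):
    """Return, for each read position, the percentage of A, C, G and T.

    Bases other than A, C, G, T (for example N) are ignored in the totals.
    """
    if not reads:
        return []
    length = max(len(r) for r in reads)
    composition = []
    for pos in range(length):
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts[b] for b in "ACGT")
        composition.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return composition


# Toy reads (assumed, not real data)
reads = ["ACGT", "ACGA", "TCGA", "AGGA"]
comp = per_position_composition(reads)
```

Plotting each base's percentage against position would reproduce the kind of curve the slide shows.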


Filtering Reads


Peak Calling

.image-50[ Cartoon of nucleosomes shown as large spheres with DNA wrapped around them and connecting them like lights on a string. Tn5 is binding to the DNA connecting the nucleosomes, and a transcription factor is bound to the Tn5 molecules. Reads are shown as arrows pointing towards each other, below the Tn5 molecules. ]



Schematic of the ATAC-Seq workflow. Datasets R1 and R2 are run through FastQC, which points to a cloud labelled "QC Measures". The two datasets also go through cutadapt to produce trimmed R1 and R2. These go to bowtie2, which produces alignments and points to another cloud labelled "QC Measures". The alignments go to bamtools filter to produce filtered alignments, which then go to mark duplicates to produce a file of filtered alignments without duplicates. This file is sent to MACS2, which produces coverage and peaks outputs. Those outputs, in combination with annotations for the genome, are sent to deepTools for heatmaps summarizing multiple regions and to pyGenomeTracks for tracks displaying a specific region.
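The dependencies in the schematic above can be written down as a small graph and ordered automatically. This is only a sketch of the step ordering, assuming Python 3.9 or later for `graphlib`; the dictionary layout and the `run_order` helper are illustrative and do not invoke any of the tools.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose output it consumes.
# FastQC and cutadapt both start from the raw R1/R2 datasets.
WORKFLOW = {
    "FastQC": [],
    "cutadapt": [],
    "bowtie2": ["cutadapt"],
    "bamtools filter": ["bowtie2"],
    "mark duplicates": ["bamtools filter"],
    "MACS2": ["mark duplicates"],
    "deepTools": ["MACS2"],
    "pyGenomeTracks": ["MACS2"],
}


def run_order(workflow):
    """Return one valid execution order in which every step follows its inputs."""
    return list(TopologicalSorter(workflow).static_order())


order = run_order(WORKFLOW)
for step in order:
    print(step)
```

Any order returned this way places trimming before mapping, mapping before filtering, and peak calling before the two visualisation steps.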


Thank you!

This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network tutorial content is licensed under the Creative Commons Attribution 4.0 International License.