ATAC-seq
ATAC-seq (Assay for Transposase-Accessible Chromatin using high-throughput sequencing) is a method to investigate chromatin accessibility across the genome. It's based on a modified hyperactive Tn5 transposase that binds open chromatin regions, inserts DNA sequences corresponding to sequencing adapters and fragments the DNA. Fragments are then used for library preparation and then for sequencing.
See the References section for the original protocol paper (Buenrostro et al., 2013), an adaptation of the protocol (Abcam), and additional info and references (Illumina).
ENCODE Guidelines:
​
- Sequence at least 2 biological replicates per experiment
- Sequencing may be paired- or single-ended, but paired-ended is preferred
- Each replicate should have 25 million non-mitochondrial mapped fragments (50 million paired-ended reads)
- The detailed ENCODE ATAC-seq analysis pipeline is published by the ENCODE project
- To obtain the required number of reads, libraries can be multiplexed (i.e. pooled and sequenced simultaneously during a single run). The number of reads obtained will depend on (1) the number of samples you multiplex and (2) the number of reads you get per run, which varies across sequencing platforms
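The multiplexing arithmetic in the last guideline can be sketched in a few lines of Python. This is only an illustration: the run yield of 400 million read pairs and the `usable_fraction` parameter (the share of fragments expected to survive filtering) are assumptions, not figures from ENCODE or from any particular sequencing platform, and the sketch assumes libraries are pooled evenly.

```python
import math

# Target from the ENCODE guideline above: 25 million fragments per replicate.
ENCODE_FRAGMENTS_PER_REPLICATE = 25_000_000


def max_samples_per_run(read_pairs_per_run: int, usable_fraction: float = 1.0) -> int:
    """Largest number of evenly pooled libraries that still reach the target.

    usable_fraction is a hypothetical knob for the share of fragments expected
    to remain after removing mitochondrial and low-quality alignments.
    """
    usable = read_pairs_per_run * usable_fraction
    return math.floor(usable / ENCODE_FRAGMENTS_PER_REPLICATE)


def fragments_per_sample(read_pairs_per_run: int, n_samples: int) -> float:
    """Expected fragments per library when n_samples share one run evenly."""
    return read_pairs_per_run / n_samples


# Hypothetical run yielding 400 million read pairs.
print(max_samples_per_run(400_000_000))                       # 16
print(max_samples_per_run(400_000_000, usable_fraction=0.5))  # 8
print(fragments_per_sample(400_000_000, 8))                   # 50000000.0
```

In practice it is worth leaving some headroom, because pooling is rarely perfectly even.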
​
Pipeline for Data Analysis
STEP 1 : Sequencing QC
Downloading raw data
Raw reads are stored in FASTQ format. File extensions include .fastq or .fq, and .fastq.gz (gzip compressed). FASTQ data can also be archived by the Sequence Read Archive and distributed as SRA (.sra) files, which is common in public repositories such as GEO. SRA files can be converted into FASTQ using fastq-dump from the sratoolkit.
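As a rough sketch of the conversion step, the snippet below only builds a fastq-dump command line; it does not run it. The accession `SRR0000000` and the output directory are placeholders, and the choice of `--split-files` (one file per mate) and `--gzip` is an assumption suited to paired-end data.

```python
import shlex


def fastq_dump_command(accession: str, outdir: str = "fastq") -> list[str]:
    """Build (but do not run) a fastq-dump call for a paired-end SRA run."""
    return [
        "fastq-dump",
        "--split-files",   # write mate 1 and mate 2 to separate files
        "--gzip",          # compress the FASTQ output
        "--outdir", outdir,
        accession,
    ]


cmd = fastq_dump_command("SRR0000000")  # placeholder accession
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the sratoolkit is installed.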
File type: SRA -> Tool: sratoolkit -> File type: FASTQ
QC on FASTQ reads
It is recommended to perform some quality control checks on the FASTQ data using FastQC. The first step is to check the presence of the Nextera adapters and the quality of the reads. If needed, adapters, short reads and low quality bases can be trimmed using Cutadapt. In this case, it is recommended to trim reads to a fixed length for all samples prior to alignment.
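To make these checks concrete, here is a minimal sketch of what a quality report looks at: mean base quality and the fraction of reads carrying adapter sequence. It is not a replacement for FastQC. The adapter string is the commonly cited Nextera transposase sequence and should be treated as an assumption to confirm against your library kit; the Phred+33 offset and the 12-base match length are assumptions too.

```python
import io
from statistics import mean

# Assumed Nextera transposase adapter; confirm against your kit documentation.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality of one read, assuming Phred+33 encoding."""
    return mean(ord(c) - offset for c in qual)


def summarise(handle, min_adapter_match: int = 12) -> dict:
    """Count reads, adapter-containing reads and the average read quality."""
    probe = NEXTERA_ADAPTER[:min_adapter_match]
    n = with_adapter = 0
    quals = []
    for _, seq, qual in read_fastq(handle):
        n += 1
        with_adapter += probe in seq
        quals.append(mean_phred(qual))
    if n == 0:
        raise ValueError("no reads found")
    return {"reads": n, "adapter_fraction": with_adapter / n, "mean_quality": mean(quals)}


# Two toy reads: the second one runs into the adapter.
records = [("r1", "ACGTACGT"), ("r2", "ACGT" + NEXTERA_ADAPTER)]
text = "".join(f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in records)
report = summarise(io.StringIO(text))
print(report)
```

A high adapter fraction or a low mean quality in such a report is the cue to trim with Cutadapt before alignment.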
STEP 2 : Alignment
Mapping reads
After FASTQ files have undergone the necessary QC, they have to be mapped to a reference genome. It is important that all samples being compared are mapped to the same version of the genome (genome assembly). The first step is to download an index for the genome of choice. The alignment step tends to be the most time consuming and the files generated are very large (several GB). A good mapping tool is Bowtie2.
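A paired-end Bowtie2 call might be assembled as in the sketch below. The file names and index prefix are hypothetical, and the settings `--very-sensitive` and `-X 2000` (maximum fragment length) are commonly used for ATAC-seq but are assumptions here rather than recommendations from this guide.

```python
import shlex


def bowtie2_command(index_prefix: str, fastq_r1: str, fastq_r2: str,
                    sam_out: str, threads: int = 4) -> list[str]:
    """Build (but do not run) a paired-end bowtie2 alignment command."""
    return [
        "bowtie2",
        "--very-sensitive",    # slower, more thorough preset (assumed choice)
        "-X", "2000",          # allow fragments up to 2 kb (assumed choice)
        "-p", str(threads),    # number of alignment threads
        "-x", index_prefix,    # prefix of the genome index
        "-1", fastq_r1,
        "-2", fastq_r2,
        "-S", sam_out,         # SAM output file
    ]


# Hypothetical file names for one replicate.
align_cmd = bowtie2_command("hg38", "rep1_R1.fastq.gz", "rep1_R2.fastq.gz", "rep1.sam")
print(shlex.join(align_cmd))
```

Using the same `index_prefix` for every sample is what guarantees that all samples are mapped to the same assembly.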
File type: FASTQ -> File type: SAM -> File type: BAM
QC Tools:
Remove mitochondrial and low mapping quality reads -> samtools view
Remove PCR duplicates -> samtools rmdup
Plot fragment size -> ATACseqQC
File formats
Some aligners (e.g. STAR) can write their output directly in BAM format, whereas bowtie2 produces output in SAM format. SAM needs to be converted into BAM for several downstream applications, although some peak calling tools require SAM instead. The two formats can be interconverted using samtools view. Mapped reads can also be stored in CRAM format (a compressed, smaller alternative to SAM/BAM; see cramtools for more), which can likewise be converted to BAM using samtools view. Aligned reads in BAM can also be converted back into FASTQ using samtools fastq.
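The conversions described above map onto a handful of samtools calls. The helpers below are a sketch that only builds the argument lists; the function names and file names are hypothetical.

```python
def sam_to_bam(sam: str, bam: str) -> list[str]:
    """SAM -> BAM: -b selects BAM output."""
    return ["samtools", "view", "-b", "-o", bam, sam]


def bam_to_sam(bam: str, sam: str) -> list[str]:
    """BAM -> SAM: -h keeps the header in the text output."""
    return ["samtools", "view", "-h", "-o", sam, bam]


def cram_to_bam(cram: str, reference_fasta: str, bam: str) -> list[str]:
    """CRAM -> BAM: -T names the reference FASTA used to decode the CRAM."""
    return ["samtools", "view", "-b", "-T", reference_fasta, "-o", bam, cram]


def bam_to_fastq(bam: str, fastq_r1: str, fastq_r2: str) -> list[str]:
    """BAM -> paired FASTQ; mates should be adjacent (name-collated) in the input."""
    return ["samtools", "fastq", "-1", fastq_r1, "-2", fastq_r2, bam]


for cmd in (
    sam_to_bam("rep1.sam", "rep1.bam"),
    bam_to_sam("rep1.bam", "rep1.sam"),
    cram_to_bam("rep1.cram", "genome.fa", "rep1.bam"),
    bam_to_fastq("rep1.bam", "rep1_R1.fastq", "rep1_R2.fastq"),
):
    print(" ".join(cmd))
```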
QC on aligned reads
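BAM files can be filtered to remove mitochondrial reads and reads with low mapping quality using samtools view or bamtools. PCR duplicates can be removed with samtools rmdup (marked obsolete in current samtools releases, which provide samtools markdup instead). It is also important to plot the insert size distribution (the sizes of the sequenced DNA fragments), which can be done with ATACseqQC. The distribution should show a periodicity of roughly 150-200 bp, reflecting the length of DNA protected by a nucleosome.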
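The logic behind these filters can be illustrated on plain SAM text. The sketch below is not how samtools or ATACseqQC are implemented; it shows which SAM columns are involved (reference name, MAPQ and template length). The MAPQ cut-off of 30, the mitochondrial contig names and the `sam` helper that fabricates toy records are all assumptions for the example.

```python
from collections import Counter


def keep_record(sam_line: str, min_mapq: int = 30, mito_names=("chrM", "MT")) -> bool:
    """True if the alignment is non-mitochondrial and passes the MAPQ cut-off."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, mapq = fields[2], int(fields[4])   # SAM columns 3 (RNAME) and 5 (MAPQ)
    return rname not in mito_names and mapq >= min_mapq


def fragment_sizes(sam_lines, **filter_kwargs) -> Counter:
    """Histogram of fragment lengths taken from the TLEN column (SAM column 9)."""
    sizes = Counter()
    for line in sam_lines:
        if line.startswith("@") or not keep_record(line, **filter_kwargs):
            continue  # skip header lines and filtered alignments
        tlen = int(line.split("\t")[8])
        if tlen > 0:  # count each pair once, via the mate with positive TLEN
            sizes[tlen] += 1
    return sizes


def sam(qname, rname, mapq, tlen):
    """Hypothetical helper: build a minimal 11-column SAM record for the demo."""
    return "\t".join([qname, "99", rname, "100", str(mapq), "50M", "=", "300",
                      str(tlen), "*", "*"])


lines = [
    "@HD\tVN:1.6",
    sam("r1", "chr1", 42, 180),
    sam("r1", "chr1", 42, -180),   # mate of r1, negative TLEN, not counted again
    sam("r2", "chrM", 42, 150),    # mitochondrial, removed
    sam("r3", "chr2", 5, 90),      # low mapping quality, removed
]
sizes = fragment_sizes(lines)
print(sizes)
```

Plotting such a histogram for a real library is what reveals the nucleosomal periodicity.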
​
Merge / Sort / Index
When the same library has been sequenced across different lanes, it will be necessary to merge the different BAM files using samtools merge. Some downstream applications require sorted and indexed BAM files; this can be done using the samtools sort and samtools index commands.
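One possible ordering of these commands is sketched below: sort each lane, merge the sorted files, then index the result. The function and the file naming scheme are hypothetical, and the sketch only builds the command lists.

```python
def merge_sort_index(lane_bams: list[str], sample: str) -> list[list[str]]:
    """Build samtools commands to combine per-lane BAMs into one indexed BAM."""
    commands = []
    sorted_lanes = []
    for bam in lane_bams:
        sorted_bam = bam.removesuffix(".bam") + ".sorted.bam"
        # samtools merge expects coordinate-sorted inputs, so sort each lane first
        commands.append(["samtools", "sort", "-o", sorted_bam, bam])
        sorted_lanes.append(sorted_bam)
    merged = f"{sample}.merged.bam"
    commands.append(["samtools", "merge", merged, *sorted_lanes])
    commands.append(["samtools", "index", merged])
    return commands


plan = merge_sort_index(["rep1_L001.bam", "rep1_L002.bam"], "rep1")
for cmd in plan:
    print(" ".join(cmd))
```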
STEP 3 : Peak Calling
Calling peaks
The peak calling step identifies the regions of the genome with accessible chromatin. ATAC-seq generally produces both "narrow" and "broad" peaks, so the parameters need fine-tuning for each experiment. MACS2 is a good tool to use, with different settings depending on whether you want to locate the 'cutting sites' or detect single nucleosomes (see the --shift and --extsize parameters in the MACS2 documentation for more info).
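To see what the --shift and --extsize settings do, the sketch below computes the genomic window that a single read contributes once its 5' end is shifted and extended in the read's own direction. The value pairs used (-100/200 for cutting sites, 37/73 for nucleosomes) are the examples I recall from the MACS2 documentation and should be checked there; the function is a simplified model of the behaviour, not MACS2 code, and ignores coordinate-system details.

```python
def macs2_window(five_prime: int, strand: str, shift: int, extsize: int) -> tuple[int, int]:
    """Window (start, end) covered by a read after --shift and --extsize.

    The 5' end is moved by `shift` along the read direction (negative values
    move it backwards), then extended by `extsize` towards the 3' end.
    """
    if strand == "+":
        start = five_prime + shift
        return (start, start + extsize)
    if strand == "-":
        end = five_prime - shift
        return (end - extsize, end)
    raise ValueError("strand must be '+' or '-'")


# Cutting-site mode: a 200 bp window centred on the Tn5 cut, whatever the strand.
print(macs2_window(1000, "+", shift=-100, extsize=200))  # (900, 1100)
print(macs2_window(1000, "-", shift=-100, extsize=200))  # (900, 1100)

# Nucleosome-style settings: the window moves downstream of the 5' end.
print(macs2_window(1000, "+", shift=37, extsize=73))     # (1037, 1110)
print(macs2_window(1000, "-", shift=37, extsize=73))     # (890, 963)
```

With a negative shift equal to half of extsize, reads from both strands pile up symmetrically around the cut position, which is why that combination is used to look for cutting sites.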
File type: BAM -> Tool: MACS2 -> File type: BED
Reproducibility and Differential Enrichment
Apart from basic overlap of peaks, several tools are available to assess the reproducibility of peaks between biological replicates. This allows the identification of only statistically significant reproducible peaks with reduced false positives. ENCODE recommends IDR, the Irreproducible Discovery Rate.
Identifying peaks differentially enriched between different samples or treatments is also possible using the R package DiffBind.
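The "basic overlap of peaks" mentioned above can be sketched as follows. Note that this is a plain interval intersection, not IDR: it does not rank peaks or estimate any reproducibility statistic. Peaks are assumed to be BED-style (chromosome, start, end) tuples with half-open coordinates, and the `min_overlap` threshold is a hypothetical parameter.

```python
def overlapping_peaks(rep1, rep2, min_overlap: int = 1):
    """Return the peaks of rep1 that overlap any rep2 peak by >= min_overlap bp."""
    by_chrom = {}
    for chrom, start, end in rep2:
        by_chrom.setdefault(chrom, []).append((start, end))

    shared = []
    for chrom, start, end in rep1:
        for start2, end2 in by_chrom.get(chrom, []):
            # length of the intersection of two half-open intervals
            if min(end, end2) - max(start, start2) >= min_overlap:
                shared.append((chrom, start, end))
                break
    return shared


rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(overlapping_peaks(rep1, rep2))                  # [('chr1', 100, 200)]
print(overlapping_peaks(rep1, rep2, min_overlap=60))  # []
```

For real peak sets, bedtools intersect does the same job far more efficiently.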
STEP 4 : Visualisation
File type:
BAM
QC on coverage reproducibility
Reproducibility can be checked by plotting the correlation of the read coverage among biological replicates using deeptools plotCorrelation. Other deeptools features, such as checking GC bias or enrichment strength, are also available.
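The quantity behind such a correlation plot is simple to compute by hand. The sketch below calculates a Pearson correlation between the binned coverage of two replicates; the bin counts are made-up numbers, and deeptools itself works on genome-wide bins rather than on a short list like this.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length coverage vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant coverage")
    return cov / math.sqrt(var_x * var_y)


# Made-up read counts in the same five genomic bins for two replicates.
rep1_bins = [10, 200, 35, 0, 80]
rep2_bins = [12, 180, 40, 1, 95]
r = pearson(rep1_bins, rep2_bins)
print(round(r, 3))
```

Values close to 1 between replicates, and lower values between conditions, are what one hopes to see.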
File type:
BEDGRAPH
File type:
BIGWIG
Genome coverage
This step generates a BIGWIG (.bw) file containing the read coverage over every chromosome. It also allows for normalization, which makes it possible to compare different samples/treatments in the same experiment. An intermediary format is .bedgraph, which can also be visualised in certain browsers.
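As an illustration of what normalization means here, the sketch below rescales bedgraph-style coverage values to counts per million mapped reads (CPM). CPM is only one of several schemes offered by coverage tools, and choosing it for this example is an assumption; the rows and read totals are invented.

```python
def cpm_normalise(bedgraph_rows, total_mapped_reads: int):
    """Scale coverage values so that samples of different depth are comparable.

    bedgraph_rows: iterable of (chrom, start, end, value) tuples.
    """
    scale = 1_000_000 / total_mapped_reads
    return [(chrom, start, end, value * scale)
            for chrom, start, end, value in bedgraph_rows]


# The same region in two samples sequenced to different depths.
deep = cpm_normalise([("chr1", 0, 50, 50.0)], total_mapped_reads=25_000_000)
shallow = cpm_normalise([("chr1", 0, 50, 20.0)], total_mapped_reads=10_000_000)
print(deep, shallow)
```

After scaling, both samples report the same signal for the region even though the raw counts differ.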
Enrichment profile
BIGWIG files can be uploaded directly to a genome browser for inspection. BED files can also be uploaded, allowing you to visualise both the coverage profile and the result of the peak calling at the genes of interest. At a global level, the overall enrichment across a set of regions or peaks can be plotted (e.g. around the TSS) and clustered using tools such as deeptools plotHeatmap, Genomation or ChIPseeker. BIGWIG and BED files are relatively small and can easily be used locally.
STEP 5 : Functional Analysis & Motif Discovery
Annotation and Gene Ontology
In order to explore the functional relevance of the peaks identified, these can be annotated (e.g. to the nearest gene or TSS) or plotted relative to their genome location (e.g. % peaks in promoters, intergenic, etc). Peaks can be analysed for ontologies associated with the corresponding genes or regions. The tools listed here are examples of the many that perform these and several additional functions.
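Annotating a peak to its nearest TSS boils down to a sorted lookup, sketched below for a single chromosome. The gene names and coordinates are invented, strand is ignored, and dedicated packages such as ChIPseeker handle many more cases (promoter windows, gene bodies, multiple transcripts).

```python
import bisect


def nearest_tss(peak_start: int, peak_end: int, tss_sorted):
    """Return (gene, signed distance from peak centre to TSS) for the closest TSS.

    tss_sorted: non-empty list of (position, gene) for one chromosome,
    sorted by position.
    """
    centre = (peak_start + peak_end) // 2
    positions = [pos for pos, _ in tss_sorted]
    i = bisect.bisect_left(positions, centre)
    # Only the TSS just before and just after the centre can be the closest.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda item: abs(item[0] - centre))
    return gene, pos - centre


# Hypothetical TSS annotation for one chromosome.
tss = [(1000, "GENE_A"), (5000, "GENE_B"), (9000, "GENE_C")]
print(nearest_tss(4600, 4800, tss))   # ('GENE_B', 300)
print(nearest_tss(100, 300, tss))     # ('GENE_A', 800)
print(nearest_tss(9400, 9600, tss))   # ('GENE_C', -500)
```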
File type:
BAM
File type:
BED
File type:
BIGWIG
Finding Motifs and foot-printing analysis
Motif discovery consists of finding DNA sequences that are significantly more frequent in a set of peaks than would be expected by chance (i.e. compared against a background). The available tools offer a wide range of options, including de novo motif discovery, motif enrichment and motif scanning (MEME, Homer).
In addition, TF occupancy can be investigated with ATAC-seq through foot-printing analysis. The DNA corresponding to a bound motif is selectively protected from cutting by Tn5, which leaves a “footprint” where a TF is bound at a specific site in the genome (FLR, Wellington).
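The idea of "more frequent than expected by chance" can be made concrete with a hypergeometric test, sketched below. This is a deliberately simplified stand-in: it looks for exact matches of one motif string on the forward strand only, whereas tools such as MEME and Homer use position weight matrices and more careful background models. The sequences and the motif are toy examples.

```python
from math import comb


def hypergeom_enrichment_p(total: int, total_with_motif: int,
                           drawn: int, drawn_with_motif: int) -> float:
    """P(X >= drawn_with_motif) when `drawn` sequences are sampled at random."""
    upper = min(drawn, total_with_motif)
    favourable = sum(
        comb(total_with_motif, k) * comb(total - total_with_motif, drawn - k)
        for k in range(drawn_with_motif, upper + 1)
    )
    return favourable / comb(total, drawn)


def motif_enrichment(peak_seqs, background_seqs, motif: str) -> float:
    """Enrichment p-value of an exact motif match in peaks versus background."""
    motif = motif.upper()
    peak_hits = sum(motif in seq.upper() for seq in peak_seqs)
    background_hits = sum(motif in seq.upper() for seq in background_seqs)
    return hypergeom_enrichment_p(
        total=len(peak_seqs) + len(background_seqs),
        total_with_motif=peak_hits + background_hits,
        drawn=len(peak_seqs),
        drawn_with_motif=peak_hits,
    )


# Toy data: both peak sequences contain the motif, no background sequence does.
peaks = ["TTCACGTGAA", "GGCACGTGTT"]
background = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGTTTTGG"]
p_value = motif_enrichment(peaks, background, "CACGTG")
print(p_value)
```

With realistic numbers of peaks the same calculation gives very small p-values for genuinely enriched motifs, which then need correcting for the number of motifs tested.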
Where to run these tools:
Terminal (local or cluster): sratoolkit, FastQC, cutadapt, trimmomatic, FastX, bowtie2, samtools, bamtools, MACS2, Homer, IDR, bedtools, UCSC application binaries, deeptools, ChromHMM
Galaxy (web interface): sratoolkit, FastQC, cutadapt, trimmomatic, bowtie2, samtools, bamtools, MACS2, IDR, bedtools, deeptools, MEME
RStudio: DiffBind, ChIPseeker, ChIPpeakAnno, clusterProfiler, genomation
Download software: Seqmonk, IGV
Web-interface: UCSC browser, GREAT, MEME suite
References:
Buenrostro, J., Giresi, P., Zaba, L. et al. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10, 1213–1218 (2013). https://doi.org/10.1038/nmeth.2688
Abcam: https://www.abcam.com/epigenetics/epigenetics-application-spotlight-atac-seq
Illumina: https://emea.illumina.com/techniques/popular-applications/epigenetics/atac-seq-chromatin-accessibility.html