Why should I use DFilter?


There are already many peak-calling tools for next-generation sequencing tag profiles, so why use DFilter? The answer is in the first figure of the manuscript (Kumar et al., Nature Biotechnology, 2013): when no input control is provided, the most popular programs fail to report genuinely enriched sites (peaks). In many projects, generating input controls is too costly. For example, when running a large number of ChIP-seq experiments, it is too expensive to produce one input control for each, especially for valuable clinical samples from large cohorts.

Some assays, such as DNase-seq and FAIRE-seq, do not use an input control at all, so there is a risk of detecting false peaks (for example, in duplicated genomic regions). Thanks to its zero-mean and matched filtering, DFilter performs well even without a control and can avoid artifacts caused by duplicated regions and other biases.

DFilter also provides a uniform platform for analyzing next-generation sequencing tag profiles from all kinds of assays.


I am not able to run DFilter on my PC or server. What should I do?


Just email the author (kumarv1(at) gis.a-star.edu.sg). It would be wise to create a temporary guest account on your PC or server and let the authors install DFilter for you. DFilter will also be available on BaseSpace, so researchers who keep their data on BaseSpace will not have to install it on their own computers.



What should the P-value threshold for peaks be?


This is an open question, and other peak-calling tools struggle with the same issue. However, we have worked out typical P-value thresholds for different kinds of assays and DFilter parameters. They are listed below.


a) For focal histone ChIP-seq such as H3K27ac, H3K4me3, etc. (using the zero-mean filter, which is the default): a P-value below 1E-6 (or 1E-7) would be good. In other words, the 6th column in the peak file should be above 6 or 7.


b) For transcription factor ChIP-seq (using option -nonzero): a P-value below 1E-6 would be good. In other words, the 6th column in the peak file should be above 6 or 7.


c) For wide-peak ChIP-seq such as H3K36me3, H3K27me3, etc. (using option -nonzero): a P-value below 1E-3 would be good. In other words, the 6th column in the peak file should be above 2.5 or 3.


d) For open chromatin assays such as DNase-seq, FAIRE-seq, etc.: a P-value below 1E-3 would be good. In other words, the 6th column in the peak file should be above 3.
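
The P-value cutoffs and the 6th-column cutoffs in the list above are related by a negative base-10 logarithm. The sketch below shows that conversion, assuming (as the listed thresholds imply) that the 6th column holds -log10 of the P-value. The helper name neglog10 is made up for this illustration and is not part of DFilter.

```shell
# Convert a P-value cutoff to the matching 6th-column cutoff (-log10 P).
# awk's log() is the natural logarithm, so divide by log(10).
# "neglog10" is a hypothetical helper name, not a DFilter command.
neglog10() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", -log(p) / log(10) }'
}

for p in 1e-6 1e-7 1e-3; do
    echo "P < $p  <=>  column 6 > $(neglog10 "$p")"
done
```

Going the other way, a 6th-column cutoff of 2.5 corresponds to a P-value of about 3.2E-3, slightly more permissive than 1E-3.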


However, it is advisable to visualize the peaks in the UCSC browser, after making a custom track of your data (using option -wig), to make sense of the P-values.

Hence, first run DFilter with low stringency (-std=2) to get a large number of peaks, then use an awk command to keep the peaks above a chosen threshold. For example, with a threshold of 6: awk '$6 > 6' peaks.bed > new-peaks.bed
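
As a concrete sketch of the filtering step, the snippet below builds a tiny made-up peak file and keeps only the rows whose 6th column exceeds a cutoff. The file names, the rows, and every column other than the 6th are placeholders invented for illustration, not DFilter's actual output layout. The cutoff is passed with awk's -v option so it does not have to be hard-coded in the awk program.

```shell
# Toy peak file (tab-separated). Only column 6 matters here; the other
# columns are invented placeholders, not DFilter's real output layout.
printf 'chr1\t100\t400\t250\t12\t8.2\n'  >  demo-peaks.bed
printf 'chr1\t900\t1300\t1100\t5\t2.7\n' >> demo-peaks.bed
printf 'chr2\t500\t800\t640\t9\t6.5\n'   >> demo-peaks.bed

# Keep peaks whose 6th column is above the chosen cutoff.
threshold=6
awk -v t="$threshold" '$6 > t' demo-peaks.bed > demo-new-peaks.bed

cat demo-new-peaks.bed
```

Because the low-stringency run is kept on disk, different cutoffs can be tried by changing threshold and rerunning only the awk line, without rerunning DFilter.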



What is the benefit of the normalized tag count for peaks (Norm_tag_count)?

The normalized tag count is the mean tag count in a peak after normalization for sequencing depth and input control signal. It can give an initial idea of which peaks differ between libraries. Sometimes it is also wise to select peaks by normalized tag count, especially if you are not sure whether the parameters you passed to DFilter were suitable for your assay (although this is rare).