The required physical coverage depends on the aim of the study. The main question is: TE insertions of which population frequency do I want to recover? The required physical coverage (pc) can then be computed with the following equation:
pc = minimum_count / population_frequency
Here, minimum_count is the value provided during identification of TE signatures with PoPoolationTE2. To guard against chimeric reads leading to false positive TE insertions, we recommend using a minimum count of at least 2. The population_frequency is the minimum population frequency of TE insertions that should be recovered.
As an example, when using a minimum count of 2 with the aim of recovering all TE insertions with a frequency of 1% or higher, a physical coverage of pc = 2/0.01 = 200 is needed.
As personal advice: add a safety margin. This could be helpful when subsampling the physical coverage.
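The calculation above can be sketched in a few lines of Python. This is only an illustration and not part of PoPoolationTE2: the function name required_physical_coverage and the safety_margin parameter (a multiplier of 1.0 or more) are assumptions made for this example.

```python
def required_physical_coverage(minimum_count, population_frequency, safety_margin=1.0):
    """Return pc = minimum_count / population_frequency, scaled by a safety margin.

    safety_margin is a hypothetical multiplier (>= 1.0) introduced for this
    sketch; it is not a PoPoolationTE2 parameter.
    """
    if minimum_count < 1:
        raise ValueError("minimum_count must be at least 1")
    if not 0 < population_frequency <= 1:
        raise ValueError("population_frequency must be in (0, 1]")
    if safety_margin < 1:
        raise ValueError("safety_margin must be at least 1.0")
    return minimum_count / population_frequency * safety_margin


# Example from the text: minimum count of 2, insertions at 1% or higher
pc = required_physical_coverage(2, 0.01)
print(f"required physical coverage: {pc:.0f}")
```

Passing, for instance, safety_margin=1.2 would raise the target by 20%; how large a margin is sensible depends on the study and is not prescribed here.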
The number of required paired ends can be computed from the targeted physical coverage (pc), the genome size (genome_size) and the inner distance between the paired ends (id = fragment_length - 2 * read_length):
paired_ends = pc * genome_size / id
For example, with D. melanogaster (genome size = 180 Mbp), a targeted physical coverage of 200, and an inner distance of 150 bp, 240 million paired ends are required (= 200 * 180 Mbp / 150 bp).
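The same arithmetic as a short Python sketch. The function names are hypothetical and not part of PoPoolationTE2, and the fragment length of 350 bp and read length of 100 bp are illustrative assumptions chosen only because they give the inner distance of 150 bp used in the example.

```python
def inner_distance(fragment_length, read_length):
    """Inner distance between paired ends: fragment_length - 2 * read_length."""
    return fragment_length - 2 * read_length


def required_paired_ends(pc, genome_size, inner_dist):
    """Return paired_ends = pc * genome_size / id."""
    if inner_dist <= 0:
        raise ValueError("inner distance must be positive for this formula")
    return pc * genome_size / inner_dist


# Assumed library layout for illustration: 350 bp fragments, 100 bp reads
dist = inner_distance(fragment_length=350, read_length=100)

# D. melanogaster example from the text: pc = 200, genome size = 180 Mbp
pairs = required_paired_ends(pc=200, genome_size=180_000_000, inner_dist=dist)
print(f"inner distance: {dist} bp, paired ends needed: {pairs / 1e6:.0f} million")
```

Note that the formula is only meaningful when the fragments are longer than twice the read length, which is why the sketch rejects a non-positive inner distance.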
The number of reads depends on the service provider, but 240 million reads are usually achieved with one Illumina paired-end lane.
[refers to java -jar popte2.jar pairupSignatures] When testing the performance of PoPoolationTE2 with simulated data, we found that the default values, --min-distance -200 and --max-distance 300, yield reliable results for many sets of parameters, i.e. many true positives, few false positives and many TE insertions where both signatures could be identified.
However, it is possible to optimize these parameters by running the pairing step multiple times with different min and max distances. When plotting the distance against the number of identified insertions, an optimal distance may be found where the number of identified insertions levels off. In the following example we used the real data published by Kofler et al. (2012) http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002487
In this example we would recommend --min-distance -100 and --max-distance 200, as with these values the maximum number of signatures is paired (resulting in the fewest TE insertions) and the parameters are most conservative (little pairing of unrelated signatures).
PoPoolationTE2 is a command line tool and thus lacks a user-friendly GUI. However, I would still consider PoPoolationTE2 user-friendly because: