
FAQ

Robert Kofler

Which physical coverage do I need for my sample?

The required physical coverage depends on the aim of the study. The main question is: which is the lowest population frequency of TE insertions that I want to recover? The required physical coverage (pc) can then be computed with the following equation

pc = minimum_count / population_frequency

Where minimum_count is the value provided during identification of TE signatures with PoPoolationTE2. To guard against chimeric reads leading to false positive TE insertions we recommend using a minimum count of at least 2. The population_frequency is the minimum population frequency of TE insertions that should be recovered.

As an example, when using a minimum count of 2 with the aim of recovering all TE insertions with a frequency of 1% or higher, a physical coverage of pc = 2/0.01 = 200 is needed.
Personal advice: add a safety margin; this can be helpful when subsampling the physical coverage.
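
The calculation above can be sketched in a few lines of Python. This is only an illustration of the equation, not part of PoPoolationTE2; the function name and the safety_factor parameter are assumptions made for this example (the FAQ recommends a safety margin but does not specify its size).

```python
def required_physical_coverage(minimum_count, population_frequency, safety_factor=1.0):
    """Return pc = minimum_count / population_frequency, scaled by a safety margin.

    safety_factor is a hypothetical parameter (1.0 = no margin); it is not
    a PoPoolationTE2 option.
    """
    if minimum_count < 1:
        raise ValueError("minimum_count must be at least 1")
    if not 0 < population_frequency <= 1:
        raise ValueError("population_frequency must be in (0, 1]")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return minimum_count / population_frequency * safety_factor


# Example from the text: minimum count of 2, frequency of 1% or higher
print(required_physical_coverage(2, 0.01))
```

Passing, for instance, safety_factor=1.5 would scale the result by 50%; how large a margin is sensible depends on the planned subsampling.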

How many reads do I need to achieve a physical coverage of, for example, 100?

The number of required paired ends can be computed from the targeted physical coverage (pc), the genome size (genome_size) and the inner distance between paired ends (id = fragment_length - 2 * read_length).

paired_ends = pc * genome_size / id

For example, with D. melanogaster (genome size = 180 Mbp), a targeted physical coverage of 200, and an inner distance of 150 bp, 240 million paired ends are required (= 200 * 180/150).
The number of reads depends on the service provider, but 240 million reads are usually achieved with one Illumina paired-end lane.
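
As a rough sketch, the same computation in Python. The function names are made up for this example, and the fragment length of 350 and read length of 100 are hypothetical values chosen only because they give the inner distance of 150 used above.

```python
def inner_distance(fragment_length, read_length):
    """Inner distance between paired ends: fragment_length - 2 * read_length."""
    return fragment_length - 2 * read_length


def required_paired_ends(pc, genome_size, inner_dist):
    """Return paired_ends = pc * genome_size / id."""
    if inner_dist <= 0:
        raise ValueError("inner distance must be positive")
    return pc * genome_size / inner_dist


# Hypothetical library: 350 bp fragments sequenced with 2 x 100 bp reads
inner = inner_distance(350, 100)

# D. melanogaster example from the text: 180 Mbp genome, pc = 200
print(required_paired_ends(200, 180_000_000, inner))
```

Note that the formula breaks down when the reads overlap (inner distance of zero or less), which is why the sketch rejects such input.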

During pairing-up of signatures, how do I find the optimal min and max distance?

[refers to java -jar popte2.jar pairupSignatures] While testing the performance of PoPoolationTE2 with simulated data we found that the default values, --min-distance -200 and --max-distance 300, yield reliable results for many sets of parameters, i.e. many true positives, few false positives and many TE insertions where both signatures could be identified.

However, it is possible to optimize these parameters by running the pairing-up multiple times with different min and max distances. When plotting the distance against the number of identified insertions, an optimal distance may be found where the number of identified insertions levels off. In the following example we used the real data published by Kofler et al. (2012) http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002487

In this example we would recommend --min-distance -100 and --max-distance 200, because here the maximum number of signatures is paired (resulting in the fewest TE insertions) while the parameters remain the most conservative (little pairing of unrelated signatures).
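
Running the pairing-up over a grid of distances could be scripted along the following lines. This is a minimal sketch, not one of the driver scripts shipped with PoPoolationTE2: the function name and the extra_args parameter are assumptions for this example, and only --min-distance and --max-distance are taken from the text above. All other options of pairupSignatures (input and output files and so on) have to be supplied by the user as described in the Manual.

```python
def pairup_commands(min_distances, max_distances, extra_args=()):
    """Build one pairupSignatures command line per (min, max) combination.

    extra_args is a placeholder for the remaining options of the command;
    they are deliberately not spelled out here (see the Manual).
    """
    commands = []
    for min_d in min_distances:
        for max_d in max_distances:
            if min_d >= max_d:
                continue  # skip combinations that describe an empty window
            cmd = [
                "java", "-jar", "popte2.jar", "pairupSignatures",
                "--min-distance", str(min_d),
                "--max-distance", str(max_d),
            ]
            cmd.extend(extra_args)
            commands.append(cmd)
    return commands


# Print the commands for a small grid around the default values
for cmd in pairup_commands([-200, -100], [200, 300]):
    print(" ".join(cmd))
```

Each command could then be passed to subprocess.run. In practice every run needs its own output file, so the output option would have to vary per combination; the number of insertions in each output can afterwards be plotted against the distance to find where it levels off.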

How user-friendly is PoPoolationTE2?

PoPoolationTE2 is a command line tool and thus lacks a user-friendly GUI. However, I would still consider PoPoolationTE2 user-friendly because:

  • it is provided as a single jar file (Java archive) and does not require installing any other third-party tools (apart from a mapper to align reads to the TE-merged-reference). This can be quite an advantage, considering that pipelines that rely heavily on third-party tools may quickly break, for example if parameters or input/output formats change. For example, I have used pipelines that only work with certain versions of samtools, and this was not even documented. Because PoPoolationTE2 does not rely on third-party tools it is quite robust.
  • we provide driver scripts for automating the analysis with PoPoolationTE2
  • we provide a detailed illustrated Manual [Manual]
  • we provide a detailed Walkthrough using real data [Walkthrough]
  • PoPoolationTE2 provides detailed error messages that aim to capture wrong input files/parameters early in the analysis
  • PoPoolationTE2 provides detailed log messages which make it possible to track the current status and to trace problems during the analysis

Related

Wiki: Home
Wiki: Manual
Wiki: Walkthrough
