Recent changes to FAQ

FAQ modified by Robert Kofler

Robert Kofler — Wed, 18 May 2016 16:55:20 -0000

--- v18
+++ v19
@@ -38,7 +38,7 @@
 ## How user-friendly is PoPoolationTE2?##
 PoPoolationTE2 is a command line tool and thus it certainly lacks a user-friendly GUI. However, I would still consider PoPoolationTE2 user-friendly because:

- * it is provided as a single jar file (Java archive) and does not require installing any other third-party tools (apart from a mapper to align reads to the TE-merged-reference). This can be quite an advantage considering that pipeline that rely heavily on third-party tools may quickly break, for example if parameters or input/output formats change. For example I have used tools that only work with certain versions of samtools and this was not documented. Thus because PoPoolationTE2 does not rely on third-party tools it is quite robust
+ * it is provided as a single jar file (Java archive) and does not require installing any other third-party tools (apart from a mapper to align reads to the TE-merged-reference). This can be quite an advantage considering that pipelines that rely heavily on third-party tools may quickly break, for example if parameters or input/output formats change. For example I have used pipelines that only work with certain versions of samtools and this was not even documented. Because PoPoolationTE2 does not rely on third-party tools it is quite robust
  * we provide driver scripts for automating the analysis with PoPoolationTE2
  * we provide a detailed illustrated Manual [Manual]
  * we provide a detailed Walkthrough using real data [Walkthrough]

FAQ modified by Robert Kofler

Robert Kofler — Wed, 04 May 2016 13:50:34 -0000

--- v17
+++ v18
@@ -35,4 +35,15 @@

 [[img src=min-maxdist.png]]

+## How user-friendly is PoPoolationTE2?##
+PoPoolationTE2 is a command line tool and thus it certainly lacks a user-friendly GUI. However, I would still consider PoPoolationTE2 user-friendly because:

+ * it is provided as a single jar file (Java archive) and does not require installing any other third-party tools (apart from a mapper to align reads to the TE-merged-reference). This can be quite an advantage considering that pipeline that rely heavily on third-party tools may quickly break, for example if parameters or input/output formats change. For example I have used tools that only work with certain versions of samtools and this was not documented. Thus because PoPoolationTE2 does not rely on third-party tools it is quite robust
+ * we provide driver scripts for automating the analysis with PoPoolationTE2
+ * we provide a detailed illustrated Manual [Manual]
+ * we provide a detailed Walkthrough using real data [Walkthrough]
+ * PoPoolationTE2 provides detailed error messages that aim to catch wrong input files/parameters early in the analysis
+ * PoPoolationTE2 provides detailed log messages that make it possible to track the current status and to trace problems during the analysis
+
+
+

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 14:05:58 -0000

--- v16
+++ v17
@@ -31,7 +31,7 @@

 However it is possible to optimize these parameters by running pairing up multiple times using different min and max distances. When plotting the distance vs the number of identified insertions, an optimal distance may be found where the number of identified insertions levels off. In the following example we used the real data published by Kofler et al. (2012) http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002487

-In this example we would recommend a *--min-distance -100* and *--max-distance 200* as here the maximum number of signatures are paired and the parameters are most conservative (little pairing of unrelated signatures).
+In this example we would recommend a *--min-distance -100* and *--max-distance 200* as here the maximum number of signatures are paired (resulting in the fewest TE insertions) and the parameters are most conservative (little pairing of unrelated signatures).

 [[img src=min-maxdist.png]]

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 14:05:14 -0000

--- v15
+++ v16
@@ -31,7 +31,7 @@

 However it is possible to optimize these parameters by running pairing up multiple times using different min and max distances. When plotting the distance vs the number of identified insertions, an optimal distance may be found where the number of identified insertions levels off. In the following example we used the real data published by Kofler et al. (2012) http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002487

-In this example we would recommend a *--min-distance -100* and *--max-distance 200*
+In this example we would recommend a *--min-distance -100* and *--max-distance 200* as here the maximum number of signatures are paired and the parameters are most conservative (little pairing of unrelated signatures).

 [[img src=min-maxdist.png]]

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 14:04:09 -0000

--- v14
+++ v15
@@ -27,9 +27,11 @@
 The number of reads depends on the service provider, but 240 million reads are usually achieved with one Illumina paired-end lane.

 ## During pairing-up signatures, how to find the optimal min and max distance##
-[java -jar popte2.jar pairupSignatures] During testing the performance of PoPoolationTE2 using simulated data we found that the default values, *--min-distance -200* and *--max-distance 300*,  yield reliable results for many sets of parameters, i.e. many true positives, few false positives and many TE insertions where both signatures could be identified.
+\[refers to java -jar popte2.jar pairupSignatures\] During testing the performance of PoPoolationTE2 using simulated data we found that the default values, *--min-distance -200* and *--max-distance 300*,  yield reliable results for many sets of parameters, i.e. many true positives, few false positives and many TE insertions where both signatures could be identified.

-However it is  possible to optimize these parameters by running pairing up multiple times using different min and max distance. When plotting the distance on the X-axis vs the number of identified insertions on the Y-axis, the optimal distance may be found where the number of identified insertions levels off.
+However it is possible to optimize these parameters by running pairing up multiple times using different min and max distances. When plotting the distance vs the number of identified insertions, an optimal distance may be found where the number of identified insertions levels off. In the following example we used the real data published by Kofler et al. (2012) http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002487
+
+In this example we would recommend a *--min-distance -100* and *--max-distance 200*

 [[img src=min-maxdist.png]]
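
The tuning procedure described in this hunk (run pairing up several times with different distances, then look for the point where the number of identified insertions levels off) can be sketched in Python. This is only an illustration under stated assumptions: `pairup_command`, `first_plateau`, the `extra_args` parameter and the 1% tolerance are hypothetical and not part of PoPoolationTE2, and the counts in the usage lines are made-up toy numbers, not results from Kofler et al. (2012). Only `pairupSignatures`, `--min-distance` and `--max-distance` are taken from the FAQ; all other options have to be supplied from the Manual.

```python
from typing import List, Sequence, Tuple

Setting = Tuple[int, int]  # (min_distance, max_distance)


def pairup_command(setting: Setting, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for one pairupSignatures run.

    extra_args is a placeholder for the remaining options (input, output, ...),
    which are not listed in this FAQ and must be taken from the Manual.
    """
    min_dist, max_dist = setting
    return [
        "java", "-jar", "popte2.jar", "pairupSignatures",
        "--min-distance", str(min_dist),
        "--max-distance", str(max_dist),
        *extra_args,
    ]


def first_plateau(counts: Sequence[int], tolerance: float = 0.01) -> int:
    """Return the index of the first setting after which the insertion count
    changes by less than `tolerance` (relative change). This is an assumed
    heuristic for "levels off", not the authors' method."""
    for i in range(len(counts) - 1):
        relative_change = abs(counts[i + 1] - counts[i]) / max(counts[i], 1)
        if relative_change < tolerance:
            return i
    return len(counts) - 1  # no plateau found: fall back to the last setting


# Settings ordered from narrow to wide; counts are toy values for illustration only.
settings = [(0, 50), (-50, 100), (-100, 200), (-200, 300)]
toy_counts = [1000, 800, 700, 698]

best = settings[first_plateau(toy_counts)]
print(" ".join(pairup_command(best)))
```

Each argument list could be passed to `subprocess.run` once the remaining options are filled in; the plateau index is best confirmed by eye on the plot rather than trusted blindly.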

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 14:01:29 -0000

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 14:01:08 -0000

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 13:57:20 -0000

--- v11
+++ v12
@@ -31,4 +31,6 @@

 However it is  possible to optimize these parameters by running pairing up multiple times using different min and max distance. When plotting the distance on the X-axis vs the number of identified insertions on the Y-axis, the optimal distance may be found where the number of identified insertions levels off.

+[[img src=min-maxdist.png]]

+

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 13:55:51 -0000

FAQ modified by Robert Kofler

Robert Kofler — Fri, 29 Apr 2016 11:26:22 -0000

--- v9
+++ v10
@@ -26,5 +26,9 @@
 For example, with *D. melanogaster* (genome size = 180 Mbp), a targeted physical coverage of 200, and an insert size of 150 bp, 240 million paired ends are required (= 200 \* 180/150).
 The number of reads depends on the service provider, but 240 million reads are usually achieved with one Illumina paired-end lane.

+## During pairing-up signatures, how to find the optimal min and max distance##
+[java -jar popte2.jar pairupSignatures] During testing the performance of PoPoolationTE2 using simulated data we found that the default values, *--min-distance -200* and *--max-distance 300*,  yield reliable results for many sets of parameters, i.e. many true positives, few false positives and many TE insertions where both signatures could be identified.
+
+However it is  possible to optimize these parameters by running pairing up multiple times using different min and max distance. When plotting the distance on the X-axis vs the number of identified insertions on the Y-axis, the optimal distance may be found where the number of identified insertions levels off.
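
The read-number example in the context lines of this hunk (200 \* 180/150 = 240 million paired ends) can be reproduced with a few lines of Python. This is a minimal sketch: the function name `required_read_pairs` is hypothetical, and it assumes that genome size and insert size are both given in base pairs.

```python
import math


def required_read_pairs(genome_size_bp: int, physical_coverage: float,
                        insert_size_bp: float) -> int:
    """Number of read pairs needed so that the inserts cover the genome
    `physical_coverage` times: coverage * genome size / insert size."""
    if insert_size_bp <= 0:
        raise ValueError("insert size must be positive")
    return math.ceil(physical_coverage * genome_size_bp / insert_size_bp)


# Values from the FAQ example: D. melanogaster, 180 Mbp genome,
# targeted physical coverage 200, insert size 150.
pairs = required_read_pairs(180_000_000, 200, 150)
print(f"{pairs / 1e6:.0f} million read pairs")  # prints "240 million read pairs"
```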