Originally created by: lhmon... (code.google.com)@gmail.com
I have used Popoolation in the past and cited it in a publication, and I love the software. I am sequencing viral genomes and am interested in very low frequency viral variants in the population. As such, we tend to get very high coverage on our genomes (1,000-10,000x on a single gene).
What steps will reproduce the problem?
1. When I use the variance-at-position script to calculate pi or D on my samples without enabling the corrections, the script runs beautifully.
2. However, when I run the script with the corrections enabled, it runs for 3 days on our server and still does not finish. I suspect that this is due to the high coverage of my assemblies. However, I would really like to be able to set the minimum SNP count to 3 to filter out potential sequencing errors.
I am wondering if there is any way to enable the corrections without having the program run for so long.
What version of the product are you using? On what operating system?
I am using Popoolation version 1.2.2 on Mac OS X version 10.6.8.
Originally posted by: RoKof... (code.google.com)@gmail.com
You are perfectly right: the correction factors take excruciatingly long to calculate when the coverages are >500. The good news is that once they have been calculated for all the different coverages, the script will run as fast as before (it internally stores the correction factor for every coverage).
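To illustrate why the run time drops once every coverage has been seen, here is a minimal Python sketch of per-coverage caching. It is not PoPoolation's code: `correction_factor` is a hypothetical stand-in, and the arithmetic inside it is only a placeholder whose cost grows with coverage, not the real correction formula.

```python
from functools import lru_cache

calls = 0  # counts how often the slow computation actually runs


@lru_cache(maxsize=None)
def correction_factor(coverage: int, min_count: int) -> float:
    """Hypothetical stand-in for a slow, coverage-dependent correction."""
    global calls
    calls += 1
    # Placeholder arithmetic only; NOT the formula PoPoolation uses.
    return sum(1.0 / k for k in range(min_count, coverage))


# Positions with the same coverage reuse the stored value.
coverages = [1000, 1000, 2500, 1000, 2500]
factors = [correction_factor(c, 3) for c in coverages]
```

With five positions but only two distinct coverages, the slow path runs twice. The catch with data at 1,000-10,000x is that almost every position can have its own coverage, so there are few repeats to benefit from.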
Maybe you can subsample to a fixed coverage, e.g. 500?
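As a rough sketch of what subsampling to a fixed coverage means for a single position, assuming the reads at that position are summarised as allele counts: the function name `subsample_counts` and the example counts are made up for illustration, and this is not how any PoPoolation script is implemented.

```python
import random


def subsample_counts(counts, target, seed=None):
    """Draw `target` reads without replacement from a dict of allele counts."""
    total = sum(counts.values())
    if total < target:
        raise ValueError("coverage is below the target coverage")
    # Expand the counts into one entry per read, then sample without replacement.
    reads = [allele for allele, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(reads, target)
    return {allele: picked.count(allele) for allele in counts}


# Hypothetical position with 5,000x coverage.
column = {"A": 4700, "C": 0, "G": 290, "T": 10}
sub = subsample_counts(column, 500, seed=1)
```

Every position then has the same coverage, so only one correction factor is needed. Keep in mind that at a coverage of 500 with a minimum count of 3, an allele has to be seen in at least 3 of the 500 sampled reads (0.6%) to be counted, so very rare variants can drop out.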
cheers ro