Hi!
I have an 800 Mb genome to assemble, with 13x coverage in PacBio reads and around 180x coverage in Illumina (PE, single-end, and MP). I have a cluster with 32 nodes and 512 GB of RAM, and I was wondering what the best parametrization would be for running PBcR with the Illumina reads doing the correction. It also seems that meryl is failing due to memory problems. I set it up to use the total memory I have available, like this:
ovlMemory = 512
ovlStoreMemory= 512000
merylMemory = 512000
merylThreads = 32
I don't know if this is correct, though.
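For what it's worth, these settings usually live in a spec file passed to PBcR with -s. A sketch using the values above (the comments are my reading, not official documentation; also note that these limits generally apply per job or process, so if 512 GB is the cluster total rather than per node, a single node may not be able to honor them, which could explain a meryl failure):

```
# PBcR/CA spec file sketch -- values from this thread; verify each
# parameter's units against the CA documentation before use
# (this thread itself uses both 512 and 512000 for ovlMemory).
ovlMemory      = 512
ovlStoreMemory = 512000
merylMemory    = 512000
merylThreads   = 32
```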
Thanks a lot for the help!
I'm attaching my meryl.err file here; it really does look like a memory error, right? Can you help me adapt each parameter so that PBcR completes? My specifications are as follows:
ovlMemory = 512000
ovlStoreMemory= 512000
merylMemory = 512000
merylThreads = 32
coverageCutoff = 60
-genomeSize=800000000
That doesn't look like a memory error, but I'm not sure why it failed. It's just trying to read a sequence from the disk data store. Dropping memory limits by 10% won't hurt performance, and will be nice to the machine.
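Concretely, a 10% cut on the limits quoted above would look like this (straight arithmetic on this thread's numbers; merylThreads stays as-is):

```
ovlMemory      = 460800   # 512000 * 0.9
ovlStoreMemory = 460800   # 512000 * 0.9
merylMemory    = 460800   # 512000 * 0.9
merylThreads   = 32
```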
Before you get too far into this process, stop. I can't recommend using this algorithm for correction with Illumina data. That aspect hasn't been maintained for several years, and has trouble with large complex genomes. Look into ECtools or proovread instead. ECtools assembles the illumina reads and uses that for correction. Proovread sounds similar, but I haven't looked into it.
Once you get corrected reads I'd suggest assembling with CA's replacement, canu (https://github.com/marbl/canu).
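For later reference, a minimal canu invocation is sketched below; the output prefix, directory, and read file names are placeholders, and -pacbio-corrected is the option for reads that have already been error-corrected (use -pacbio-raw for uncorrected reads):

```shell
# Sketch only -- adjust the names and genomeSize to your project.
canu -p asm -d asm-canu \
     genomeSize=800m \
     -pacbio-corrected corrected_reads.fasta
```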
Hi Brian! Thanks a lot for your answer. OK, I understand your point. I have, in fact, been correcting the PacBio data with proovread. But I'm also running PBcR for self-correction, although I only have 13x genome coverage in PacBio data. I would like your advice on something else, if I may: this is 21 Gb of data, around 1.5 million subreads with an average size of 6 kb. I have PBcR running on a 24-core, 72 GB RAM machine now, and it's been running for 7 days.
1) Could you send me any information about the temporary files it creates?
2) I know it's hard, but do you have any estimate of how long it's going to take to run? It's running on a shared cluster and I gave it a total of 25 days: I don't want it to reach that limit without finishing!
Right now it's running "runPartition.sh 150" and creating *.tmp.m5, *.tmp.cns.fasta, and *.tmp.aln.fasta files.
Thank you so much for your help!!
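Regarding progress: one rough trick (my own suggestion, not an official PBcR feature) is to count finished per-partition outputs, such as the *.tmp.cns.fasta files mentioned above, against the total partition count. A self-contained sketch that simulates 3 of 150 partitions finished:

```shell
# Simulate a work directory with 3 of 150 partitions done, then count.
# In a real run, point this at the PBcR temporary directory instead.
workdir=$(mktemp -d)
for i in 0001 0002 0003; do
    touch "$workdir/$i.tmp.cns.fasta"   # stand-ins for real consensus output
done
done_count=$(ls "$workdir"/*.tmp.cns.fasta | wc -l | tr -d ' ')
echo "partitions finished: $done_count / 150"
rm -r "$workdir"
```

Dividing elapsed time by the number of finished partitions gives a crude per-partition rate to extrapolate from.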
2015-12-07 18:56 GMT+01:00 Brian Walenz brianwalenz@users.sf.net:
--
MSc Marcela Uliano da Silva
PhD Student at Universidade Federal do Rio de Janeiro - Brazil
Visiting researcher at Berlin Center for Genomics in Biodiversity Research
(BeGenDiv)
Botanischer Garten und Botanisches Museum Berlin-Dahlem
Berlin - Germany
CV: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4261864A3
website: http://improvisocientifico.blogspot.com.br/
Hey Brian, my job finished! Thanks for your help!
So, as an estimate: 1 million PacBio reads (6 kb average size) took 7 days of PBcR self-correction on a 24-core, 72 GB RAM machine.
Thank you!
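For anyone planning a similar run, a crude linear extrapolation from this data point (an assumption on my part; correction time does not necessarily scale linearly with read count or coverage):

```shell
# ~1,000,000 reads in 7 days on 24 cores / 72 GB RAM
reads=1000000
days=7
awk -v r="$reads" -v d="$days" 'BEGIN { printf "~%d reads/day\n", r/d }'
```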