From: Christoph H. <chr...@gm...> - 2012-07-03 20:43:39
Dear Brian,

Thanks! My last attempt has apparently caused some serious problems on the node it was running on. So, I have to wait for the cluster admins' OK before I try again. I will then try to run it manually without the obtStore and keep you posted on the result. The dataset only contains Illumina PE and 454 SE.

Is there a way to get an idea about the memory requirements beforehand (I have to specify that on the cluster before I start the job, and the admin will not be happy if I kill the node again)? I guess not?

Thanks again for your help!!!

cheers,
Christoph

On 07/03/2012 10:24 PM, Walenz, Brian wrote:
> Good to know about the restart not working.
>
> You should be able to run manually without the obtStore by leaving out the
> -ovs option for it.
>
> To find duplicate mate pairs, it needs to save up overlaps until both of the
> reads in the mate have been seen. The bug in CVS was to not process mate
> pairs until ALL reads were seen. I've not seen this in CA7 but the same can
> happen if the mated reads are 'far away' in the input, for example, if all
> of the 'left' reads are loaded before the 'right' reads.
>
> If all else fails, you can skip deduplication. There is little gain in
> deduplicating Illumina PE and MP libraries -- PE duplicates don't really
> affect scaffolding, and MP duplicates aren't detectable from overlaps.
> Hopefully there aren't any 454 mates in this.
>
> b
>
>
> On 7/3/12 4:02 PM, "Christoph Hahn" <chr...@gm...> wrote:
>
>> Hi Brian,
>>
>> Thanks for your reply!
>>
>> I am using CA7. I am afraid updating is not really an option at the
>> moment - I am running it on a cluster and updating CVS might be
>> complicated because the cluster administrators are always very busy and
>> it would thus for sure take a while..
>>
>> Therefore, it would be great if you could give me a tip on how to handle
>> that in CA7 for now. In my latest attempt I used 64 GB RAM and it killed
>> the node after some 2 hours.
>> I ran the following:
>>
>> CA version 7.0 ($Id: deduplicate.C,v 1.15 2011/12/29 09:26:03 brianwalenz Exp $).
>>
>> Error Rates:
>> AS_OVL_ERROR_RATE 0.060000
>> AS_CNS_ERROR_RATE 0.100000
>> AS_CGW_ERROR_RATE 0.100000
>> AS_MAX_ERROR_RATE 0.250000
>>
>> Current Working Directory:
>> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim
>>
>> Command:
>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \
>> -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \
>> -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.obtStore \
>> -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \
>> -report /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log \
>> -summary /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summary
>>
>> Here are the first and last few lines of salaris.deduplicate.log (it has
>> 384855 lines, *.deduplicate.summary and *.deduplicate.err are empty):
>>
>> Delete 28 DUPof 3462651 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 76 DUPof 10667558 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 210 DUPof 8142147 a 0,70 b 0,70 hang 0,0 diff 0,0 error 0.000000
>> Delete 216 DUPof 9129559 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 228 DUPof 7781271 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.013200
>> Delete 297 DUPof 11757250 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 319 DUPof 11174680 a 0,73 b 0,73 hang 0,0 diff 0,0 error 0.000000
>> .
>> .
>> .
>> Delete 132295695 DUPof 211765973 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132296968 DUPof 181491499 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132297966 DUPof 159665067 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132304543 DUPof 155518568 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132307934 DUPof 134266938 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132309546 DUPof 179301753 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132313400 DUPof 153142824 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132319681 DUPof 132368976 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
>> Delete 132323752 DUPof 165992623 a 0,76 (this is exactly how it stopped..)
>>
>> Can I maybe run the deduplicate command manually and only make use of
>> the overlaps in the dupStore? When I tried to start CA again it
>> continued with finalTrim, so I removed the *.deduplicate.log, etc. files
>> before I restarted CA.
>> It would be great if you could help me out! Thanks!!
>>
>> cheers,
>> Christoph
>>
>>
>> On 07/03/2012 06:44 PM, Walenz, Brian wrote:
>>> Hi, Christoph-
>>>
>>> Are you using CA7 or CVS?
>>>
>>> This behavior was introduced to CVS on May 21, and fixed on the 29th. The
>>> bug was after an optimization in loading overlaps was made - only overlaps
>>> in the 'dupStore' are needed, the 'obtStore' can be ignored. This
>>> eliminated a huge amount of I/O and overhead from the dedupe compute.
>>>
>>> If updating CVS doesn't fix the problem, can you send some of the logging
>>> from deduplicate?
>>>
>>> b
>>>
>>>
>>> On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote:
>>>
>>>> Dear developers and users,
>>>>
>>>> I am encountering some problems in the deduplicate step. Unfortunately,
>>>> the memory usage is steadily increasing until the process dies because
>>>> of exceeding memory limit. So far, I used up to 32 GB.
>>>> I could of course
>>>> just further increase the available memory, but I was wondering if there
>>>> is a possibility to fix and/or predict the maximum memory usage for this
>>>> step (and maybe also for the next steps) beforehand.
>>>>
>>>> Thanks for your help!
>>>>
>>>> much obliged,
>>>> Christoph
>>>>
>>>> University of Oslo, Norway
>>>>
>>>> _______________________________________________
>>>> wgs-assembler-users mailing list
>>>> wgs...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
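
Brian's remark that memory can grow when mated reads are 'far away' in the input can be illustrated with a toy model. The sketch below is not the logic of deduplicate.C; it only assumes that a read's overlaps must be held until its mate has been seen, and counts how many reads are waiting at the worst moment. The read IDs, the `mate_of` table and the function name `peak_pending` are all made up for illustration.

```python
def peak_pending(read_order, mate_of):
    """Return the largest number of reads held at once while waiting for a mate.

    Toy model only (an assumption, not the Celera Assembler code): a read is
    buffered until its mate shows up, then the pair is processed and freed.
    """
    pending = set()
    peak = 0
    for read in read_order:
        mate = mate_of[read]
        if mate in pending:
            pending.remove(mate)   # both reads of the pair seen: release the buffer
        else:
            pending.add(read)      # mate not seen yet: keep this read buffered
        peak = max(peak, len(pending))
    return peak


n_pairs = 1000

# Hypothetical numbering: left read of pair i is 2*i, right read is 2*i + 1.
mate_of = {}
for i in range(n_pairs):
    mate_of[2 * i] = 2 * i + 1
    mate_of[2 * i + 1] = 2 * i

# Mates adjacent in the input versus all 'left' reads before all 'right' reads.
interleaved = list(range(2 * n_pairs))
lefts_then_rights = [2 * i for i in range(n_pairs)] + [2 * i + 1 for i in range(n_pairs)]

print("interleaved:      ", peak_pending(interleaved, mate_of))
print("lefts then rights:", peak_pending(lefts_then_rights, mate_of))
```

In this model the interleaved order never holds more than one read, while the lefts-then-rights order holds one read per pair before anything can be released. Real memory use also depends on how many overlaps each read has, so this says nothing about absolute requirements, only why input order matters.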