From: Walenz, B. <bw...@jc...> - 2012-07-03 20:24:53
Good to know about the restart not working. You should be able to run
manually without the obtStore by leaving out the -ovs option for it.

To find duplicate mate pairs, it needs to save up overlaps until both of the
reads in the mate have been seen. The bug in CVS was to not process mate
pairs until ALL reads were seen. I've not seen this in CA7 but the same can
happen if the mated reads are 'far away' in the input, for example, if all
of the 'left' reads are loaded before the 'right' reads.

If all else fails, you can skip deduplication. There is little gain in
deduplicating Illumina PE and MP libraries -- PE duplicates don't really
affect scaffolding, and MP duplicates aren't detectable from overlaps.
Hopefully there aren't any 454 mates in this.

b


On 7/3/12 4:02 PM, "Christoph Hahn" <chr...@gm...> wrote:

> Hi Brian,
>
> Thanks for your reply!
>
> I am using CA7. I am afraid updating is not really an option at the
> moment - I am running it on a cluster and updating CVS might be
> complicated because the cluster administrators are always very busy and
> it would thus for sure take a while..
>
> Therefore, it would be great if you could give me a tip on how to handle
> that in CA7 for now. In my latest attempt I used 64 GB RAM and it killed
> the node after some 2 hours. I ran the following:
>
> CA version 7.0 ($Id: deduplicate.C,v 1.15 2011/12/29 09:26:03
> brianwalenz Exp $).
>
> Error Rates:
> AS_OVL_ERROR_RATE 0.060000
> AS_CNS_ERROR_RATE 0.100000
> AS_CGW_ERROR_RATE 0.100000
> AS_MAX_ERROR_RATE 0.250000
>
> Current Working Directory:
> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim
>
> Command:
> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \
> -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \
> -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.obtStore \
> -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \
> -report /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log \
> -summary /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summary
>
> Here are the first and last few lines of salaris.deduplicate.log (it has
> 384855 lines, *.deduplicate.summary and *.deduplicate.err are empty):
>
> Delete 28 DUPof 3462651 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 76 DUPof 10667558 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 210 DUPof 8142147 a 0,70 b 0,70 hang 0,0 diff 0,0 error 0.000000
> Delete 216 DUPof 9129559 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 228 DUPof 7781271 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.013200
> Delete 297 DUPof 11757250 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 319 DUPof 11174680 a 0,73 b 0,73 hang 0,0 diff 0,0 error 0.000000
> .
> .
> .
> Delete 132295695 DUPof 211765973 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132296968 DUPof 181491499 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132297966 DUPof 159665067 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132304543 DUPof 155518568 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132307934 DUPof 134266938 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132309546 DUPof 179301753 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132313400 DUPof 153142824 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132319681 DUPof 132368976 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000
> Delete 132323752 DUPof 165992623 a 0,76 (this is exactly how it stopped..)
>
> Can I maybe run the deduplicate command manually and only make use of
> the overlaps in the dupStore? When I tried to start CA again it
> continued with finalTrim, so I removed the *.deduplicate.log, etc. files
> before I restarted CA.
> It would be great if you could help me out! Thanks!!
>
> cheers,
> Christoph
>
>
> On 07/03/2012 06:44 PM, Walenz, Brian wrote:
>> Hi, Christoph-
>>
>> Are you using CA7 or CVS?
>>
>> This behavior was introduced to CVS on May 21, and fixed on the 29th. The
>> bug was after an optimization in loading overlaps was made - only overlaps
>> in the 'dupStore' are needed, the 'obtStore' can be ignored. This
>> eliminated a huge amount of I/O and overhead from the dedupe compute.
>>
>> If updating CVS doesn't fix the problem, can you send some of the logging
>> from deduplicate?
>>
>> b
>>
>>
>> On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote:
>>
>>> Dear developers and users,
>>>
>>> I am encountering some problems in the deduplicate step. Unfortunately,
>>> the memory usage is steadily increasing until the process dies because
>>> of exceeding memory limit. So far, I used up to 32 GB.
>>> I could of course just further increase the available memory, but I was
>>> wondering if there is a possibility to fix and/or predict the maximum
>>> memory usage for this step (and maybe also for the next steps) beforehand.
>>>
>>> Thanks for your help!
>>>
>>> much obliged,
>>> Christoph
>>>
>>> University of Oslo, Norway
>>>
>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
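
Brian's explanation of the memory growth (overlaps are saved until both reads
of a mate pair have been seen) can be illustrated with a small toy model. This
is only a sketch in Python, not the deduplicate.C implementation: the function
name, the read IDs, and the simplification of counting one buffered entry per
waiting read are all assumptions made for illustration. It compares input
where mates are adjacent against input where all 'left' reads come before the
'right' reads.

```python
def peak_buffered_reads(read_order, mate_of):
    """Return the largest number of reads held in the buffer at once.

    read_order: read IDs in the order their overlaps arrive (hypothetical).
    mate_of:    dict mapping each mated read ID to its mate's ID.
    """
    pending = set()  # reads whose overlaps are saved while waiting for the mate
    peak = 0
    for read in read_order:
        mate = mate_of.get(read)
        if mate is None:
            continue  # unmated read: nothing to wait for in this toy model
        if mate in pending:
            # Both reads of the pair have been seen: process and free the buffer.
            pending.remove(mate)
        else:
            # Mate not seen yet: keep this read's overlaps around.
            pending.add(read)
            peak = max(peak, len(pending))
    return peak


# Hypothetical toy library: four mate pairs (1,2), (3,4), (5,6), (7,8).
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
mate_of = {}
for left, right in pairs:
    mate_of[left] = right
    mate_of[right] = left

interleaved = [1, 2, 3, 4, 5, 6, 7, 8]        # mates adjacent in the input
lefts_then_rights = [1, 3, 5, 7, 2, 4, 6, 8]  # mates 'far away' in the input

print(peak_buffered_reads(interleaved, mate_of))        # 1
print(peak_buffered_reads(lefts_then_rights, mate_of))  # 4
```

In this model the buffer stays at one entry when mates are adjacent, but grows
to the number of pairs when every 'left' read is loaded before any 'right'
read, which matches the 'far away' case described in the reply.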