From: Bob H. <han...@br...> - 2010-04-27 13:25:10
On 4/26/10 11:21 AM, Sendu Bala wrote:
> When I run Picard I normally give the JVM 5GB of memory to play with.

Assuming you mean a maximum heap size of 5GB, then I think what you need
to do is to request a somewhat higher memory limit from LSF. The Java
process will consume more memory/swap than the maximum heap size, in my
experience often by a couple of GB. Telling LSF you will use 7G or 8G
will probably let this run to completion.

-Bob

> I was just running MarkDuplicates and it said:
>
> [...]
> INFO 2010-04-26 14:48:58 MarkDuplicates Read 3435000000 records. Tracking 26040 as yet unmatched pairs. 15382 records in RAM. Last sequence index: 82
> INFO 2010-04-26 14:49:05 MarkDuplicates Read 3435702578 records. 0 pairs never matched.
> INFO 2010-04-26 14:52:32 MarkDuplicates After buildSortedReadEndLists freeMemory: 5096165648; totalMemory: 5127602176; maxMemory: 5127602176
> INFO 2010-04-26 14:52:32 MarkDuplicates Will retain up to 160237568 duplicate indices before spilling to disk.
> INFO 2010-04-26 14:52:34 MarkDuplicates Traversing read pair information and detecting duplicates.
>
> Then it output nothing until at 2010-04-26 15:20:15 my job got killed
> because it had used over 6GB.
>
> Is there anything I can do to avoid this on my end right now, short of
> specifying it might use all the memory on the system? Can its memory
> handling be improved at all?
>
>
> On 23/04/2010 17:55, Alec Wysoker wrote:
>
>> Hi Keiran,
>>
>> Yes, we typically use the default of 500,000 and run with 2GB RAM, so
>> multiplying both by 5 sounds plausible. Unfortunately I don't have a
>> good method for figuring out the right number given a particular JVM
>> size. Way back when I picked 500,000 as the default, it seemed
>> reasonable for the # of reads we were sorting at the time, and it's
>> worked well enough, so we haven't looked very hard at it.
>> The fundamental question of the memory footprint of a single SAMRecord
>> depends on a number of factors:
>>
>> * read length
>> * tag content. E.g. OQ and E2 tags can be large
>> * SAM input generally has larger memory footprint than BAM input,
>>   but if validation stringency is not silent, then BAM can actually
>>   be larger.
>> * Also, setting variable-length attributes onto a record read from a
>>   BAM file can expand its memory footprint even if validation
>>   stringency is silent.
>>
>> It might be possible to spill to disk when the sorter hits a configurable
>> RAM threshold rather than a # of records threshold, but figuring out the
>> memory footprint of a Java object is a bit of a challenge and this
>> hasn't been a big enough problem for us to feel motivated to change it.
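
A rough sketch of the sizing arithmetic from the reply at the top of this message: the scheduler request is the JVM maximum heap plus some headroom for memory the process uses outside the heap. The function names and the 2 GB default headroom are illustrative assumptions, not part of Picard or LSF. The headroom is only the "couple of GB" rule of thumb, so treat the result as a starting point.

```python
# Hypothetical helpers for sizing a scheduler memory request.
# Assumption: non-heap overhead is approximated by a fixed headroom in GB.

BYTES_PER_GIB = 1024 ** 3


def scheduler_request_gb(max_heap_gb, headroom_gb=2.0):
    """Return the memory (GB) to request for a JVM with the given max heap."""
    if max_heap_gb <= 0:
        raise ValueError("max_heap_gb must be positive")
    if headroom_gb < 0:
        raise ValueError("headroom_gb must not be negative")
    return float(max_heap_gb + headroom_gb)


def bytes_to_gib(n_bytes):
    """Convert a byte count, such as the maxMemory value in the log, to GiB."""
    return n_bytes / BYTES_PER_GIB
```

For a 5GB heap, `scheduler_request_gb(5)` gives 7.0 and `scheduler_request_gb(5, 3)` gives 8.0, matching the 7G or 8G suggested above. The `maxMemory: 5127602176` value in the quoted log converts to about 4.78 GiB with `bytes_to_gib`, so the heap itself stayed under 5GB even though the job was killed for using over 6GB.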