From: Christoph B. <Chr...@uk...> - 2011-05-28 15:10:18
Hi André,

thanks a lot for your answer, it did the trick! I don't really know why, but it worked. So great, thanks!

Christoph

Am 27.05.2011 12:04, schrieb André Götze:
> Hi Christoph,
>
> I had a similar problem with a 170 GB BAM lately. Strangely, I had to
> realize that Picard's MarkDuplicates actually runs more stably with
> less heap. So I would recommend using only -Xmx4g, not -Xmx120g. At
> least that's the value that works for me.
>
> André
>
> Am 27.05.2011 11:40, schrieb Christoph Bartenhagen:
>> Hello everyone,
>>
>> I have a quite large alignment of paired-end reads in BAM format (ca.
>> 120 GB, almost 2 billion reads of 90 bp length). The file is
>> coordinate sorted and was generated by merging the alignments of 8
>> single lanes with MergeSamFiles.
>> When I try to remove duplicates (really removing them, not just
>> marking them) from this huge file with MarkDuplicates, I run into
>> serious memory problems. Duplicate removal seemed to work (it said
>> "net.sf.picard.sam.MarkDuplicates done." after 12 hours). But when it
>> comes to sorting, Java says:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>     at net.sf.samtools.util.SortingLongCollection.<init>(SortingLongCollection.java:101)
>>     at net.sf.picard.sam.MarkDuplicates.generateDuplicateIndexes(MarkDuplicates.java:426)
>>     at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:111)
>>     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:165)
>>     at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
>>
>> I allowed up to 120 GB of heap space for Java when I started
>> MarkDuplicates (the machine has 128 GB of RAM):
>>
>> java -Xmx120g -jar MarkDuplicates.jar INPUT=[...] OUTPUT=[...]
>> METRICS_FILE=[...] REMOVE_DUPLICATES=true ASSUME_SORTED=true
>> VALIDATION_STRINGENCY=LENIENT TMP_DIR=[...]
>>
>> Well, I used the quite old Picard version 1.33. Is this a problem?
>> Did the memory requirements change in the newer versions?
>> If not, does someone have ideas or workarounds to get this thing
>> running (like some Java or Picard options I'm not aware of)? What is
>> the usual practice for such large datasets?
>> I also tried to reduce the amount of data by first removing the
>> duplicates on every single lane, merging the duplicate-free
>> alignments, and then removing the duplicates again on this ca. 25%
>> smaller file. But I got the same error.
>> It would make me very happy if someone could help me out here. Thanks
>> in advance!
>>
>> Cheers,
>> Christoph
>>
>> _______________________________________________
>> Samtools-help mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-help

--
Christoph Bartenhagen
Institute of Medical Informatics
University of Münster
Albert-Schweitzer-Campus 1, Building A11
48149 Münster, Germany
phone: +49 (0)251/83-58367
mail: Chr...@uk...
web: http://imi.uni-muenster.de
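André's fix — capping the heap rather than maximizing it — can be sketched as a small dry-run wrapper. A plausible explanation, not stated in the thread, is that MarkDuplicates sizes SortingLongCollection's in-memory long[] buffer from the available heap, and Java arrays are limited to roughly 2^31 elements, so an enormous -Xmx can make it request an array above the VM limit, while a modest -Xmx keeps the buffer within bounds. The file names, the TMP_DIR path, and the 4g heap value below are placeholders following André's suggestion, not values verified for any particular dataset:

```shell
#!/bin/sh
# Sketch: run Picard MarkDuplicates with a deliberately modest heap.
# All paths are placeholders; adjust for your own data.
HEAP=4g                 # small heap, per André's observation
INPUT=input.bam
OUTPUT=dedup.bam
METRICS=dup_metrics.txt
TMPDIR=/tmp/picard      # put spill files on a disk with plenty of space

# Build the command string so it can be inspected before running.
CMD="java -Xmx${HEAP} -jar MarkDuplicates.jar"
CMD="$CMD INPUT=${INPUT} OUTPUT=${OUTPUT} METRICS_FILE=${METRICS}"
CMD="$CMD REMOVE_DUPLICATES=true ASSUME_SORTED=true"
CMD="$CMD VALIDATION_STRINGENCY=LENIENT TMP_DIR=${TMPDIR}"

# Dry run: print the command instead of executing it.
# Replace the echo with `eval "$CMD"` (or call java directly) to run it.
echo "$CMD"
```

With a small heap, MarkDuplicates spills more intermediate data to TMP_DIR instead of holding it in RAM, so make sure that directory sits on a filesystem with ample free space.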