From: Alec W. <al...@br...> - 2011-06-27 17:41:57
Hi Jessica,

You appear to be trying all the right things. There was a problem with an older version of snappy-java in which it would initially allocate large buffers for reading (and we create lots of them), and would grow them but never shrink them. However, the version of snappy-java you are using should not have that problem. Note that if you are running on a Linux box then you shouldn't need to provide the snappy-java jar on the classpath. Some things you might try:

* Make sure you are using Picard 1.48.
* If running on Linux, omit putting snappy-java on the classpath, as it is included in MarkDuplicates.jar.
* If not running on Linux, grab the latest snappy-java from the Picard repository: http://picard.svn.sourceforge.net/viewvc/picard/trunk/lib/snappy-java-1.0.3-rc3.jar?revision=878 (this shouldn't matter, but I'm grasping at straws).
* Send me the stack trace in case it gives me some other idea.

-Alec

On 6/27/11 12:56 PM, Jessica Maia wrote:
> Hi there,
>
> I'm using Picard 1.48 to remove duplicates using the snappy library. I'm encountering memory issues similar to those I saw with an earlier version of Picard:
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded.
>
> We routinely use Picard to remove duplicates from our whole-genome sequencing samples, which have about 30-40x coverage. Our alignments are generated with bwa-0.5.5. Samtools 'rmdup' is able to remove duplicates for the 4 samples in question, and the BAM files before and after applying Samtools rmdup differ in size by less than 10%, so it seems unlikely that duplication is rampant.
>
> This is how I'm running Picard:
>
> java -jar -Xmx8g -Dsnappy.loader.verbosity=true -classpath snappy-java-1.0.3-rc3-20110610.011644-1.jar $picard_dir/MarkDuplicates.jar TMP_DIR=$out_dir VALIDATION_STRINGENCY=SILENT INPUT=$bam_file OUTPUT=$combined_rmdup_file METRICS_FILE=$duplicate_metrics REMOVE_DUPLICATES=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=5000000 MAX_RECORDS_IN_RAM=5000000 VERBOSITY=WARNING ASSUME_SORTED=true SORTING_COLLECTION_SIZE_RATIO=0.005
>
> Log file:
>
> [Fri Jun 24 13:42:01 EDT 2011] net.sf.picard.sam.MarkDuplicates INPUT=bam OUTPUT=rmdup.bam METRICS_FILE=duplicate_metrics REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=5000000 SORTING_COLLECTION_SIZE_RATIO=0.0050 TMP_DIR=picard_148 VERBOSITY=WARNING VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=5000000 MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
> Snappy stream classes loaded.
> [Fri Jun 24 14:05:56 EDT 2011] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 24.55 minutes.
> Runtime.totalMemory()=3295150080
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded.
>
> I've tried changing SORTING_COLLECTION_SIZE_RATIO and the -Xmx parameter:
>
> -Xmx  MAX_FILE_HANDLES_FOR_READ_ENDS_MAP  MAX_RECORDS_IN_RAM  SORTING_COLLECTION_SIZE_RATIO
> 12g   5 million                           5 million           0.1
> 14g   5 million                           5 million           0.1
> 14g   5 million                           5 million           0.15
> 14g   5 million                           5 million           0.05
> 14g   5 million                           5 million           0.01
> 14g   5 million                           5 million           0.005
> 8g    5 million                           5 million           0.005
>
> Picard has failed to remove duplicates in all instances. Are there any other suggestions to solve this issue?
>
> Thanks,
>
> Jessica
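
One detail not raised in the thread: when java is started with -jar, the JVM ignores the -classpath option entirely and takes the classpath from the named jar alone, so the snappy-java jar in Jessica's invocation never actually reaches the classpath (the "Snappy stream classes loaded." log line indicates the copy bundled in MarkDuplicates.jar was picked up instead). Below is a minimal sketch of an invocation that does put the external snappy-java jar on the classpath, assuming the main class net.sf.picard.sam.MarkDuplicates shown in the log can be run directly; jar names, shell variables, and tool options are carried over unchanged from the original command:

    java -Xmx8g -Dsnappy.loader.verbosity=true \
        -classpath snappy-java-1.0.3-rc3-20110610.011644-1.jar:$picard_dir/MarkDuplicates.jar \
        net.sf.picard.sam.MarkDuplicates \
        TMP_DIR=$out_dir VALIDATION_STRINGENCY=SILENT INPUT=$bam_file \
        OUTPUT=$combined_rmdup_file METRICS_FILE=$duplicate_metrics \
        REMOVE_DUPLICATES=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=5000000 \
        MAX_RECORDS_IN_RAM=5000000 VERBOSITY=WARNING ASSUME_SORTED=true \
        SORTING_COLLECTION_SIZE_RATIO=0.005

On Linux this should be unnecessary per Alec's note above, since MarkDuplicates.jar bundles snappy-java; on other platforms it ensures the snappy-java version placed first on the classpath is the one that gets loaded.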