From: Michael J. C. <mj...@uc...> - 2010-05-16 23:43:19
Hi All,

You may remember that a few days ago I had problems getting MarkDuplicates to work on a particular genomic dataset. Unfortunately nothing I tried solved the problem, so I decided to split the file by chromosome, extract the interchromosomal joins into a separate file, and run MarkDuplicates on each piece independently. On some of these files it ran without a hitch, but on a couple of them I get this error:

/usr/java/latest/bin/java -Xmx6G -jar /share/apps/picard-tools-1.19/MarkDuplicates.jar I=1102T.LMP.chrY.bam O=1102T.LMP.rmdup.chrY.bam M=1102T.LMP.rmdup.chrY.metrics TMP_DIR=tmp.files/ REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=8000000

[Sun May 16 16:32:46 PDT 2010] net.sf.picard.sam.MarkDuplicates INPUT=1102T.LMP.chrY.bam OUTPUT=1102T.LMP.rmdup.chrY.bam METRICS_FILE=1102T.LMP.rmdup.chrY.metrics REMOVE_DUPLICATES=true TMP_DIR=tmp.files VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=8000000 ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5
INFO 2010-05-16 16:32:46 MarkDuplicates Start of doWork freeMemory: 8658104; totalMemory: 9109504; maxMemory: 5726666752
INFO 2010-05-16 16:32:46 MarkDuplicates Reading input file and constructing read end information.
INFO 2010-05-16 16:32:46 MarkDuplicates Will retain up to 22724868 data points before spilling to disk.
INFO 2010-05-16 16:33:07 MarkDuplicates Read 1000000 records. Tracking 2526 as yet unmatched pairs. 2526 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:17 MarkDuplicates Read 2000000 records. Tracking 6482 as yet unmatched pairs. 6482 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:27 MarkDuplicates Read 3000000 records. Tracking 10160 as yet unmatched pairs. 10160 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:36 MarkDuplicates Read 4000000 records. Tracking 68510 as yet unmatched pairs. 68510 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:46 MarkDuplicates Read 5000000 records. Tracking 75782 as yet unmatched pairs. 75782 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:55 MarkDuplicates Read 6000000 records. Tracking 97889 as yet unmatched pairs. 97889 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:03 MarkDuplicates Read 7000000 records. Tracking 60997 as yet unmatched pairs. 60997 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:13 MarkDuplicates Read 8000000 records. Tracking 61125 as yet unmatched pairs. 61125 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:22 MarkDuplicates Read 9000000 records. Tracking 60597 as yet unmatched pairs. 60597 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:33 MarkDuplicates Read 10000000 records. Tracking 58306 as yet unmatched pairs. 58306 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:41 MarkDuplicates Read 11000000 records. Tracking 55412 as yet unmatched pairs. 55412 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:53 MarkDuplicates Read 12000000 records. Tracking 51336 as yet unmatched pairs. 51336 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:35:02 MarkDuplicates Read 13000000 records. Tracking 13051 as yet unmatched pairs. 13051 records in RAM. Last sequence index: 23
[Sun May 16 16:35:04 PDT 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=2987982848
Exception in thread "main" net.sf.samtools.FileTruncatedException: Premature end of file
    at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:290)
    at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:100)
    at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:169)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:394)
    at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
    at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
    at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:182)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:397)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:373)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:363)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:330)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:261)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:112)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:96)

I haven't seen this "Premature end of file" error before. Any ideas how to get around it?

(By the way, I realize that since this file doesn't include interchromosomal joins, I can probably use samtools rmdup on it instead. But I'm afraid I'll get the same error on my interchromosomal joins file, and in that case I'll need a real solution.)

Thanks!

Sincerely,
Michael
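The exception is thrown from BlockCompressedInputStream.readBlock, i.e. while reading the BGZF-compressed blocks of the input BAM. That suggests, though it does not prove, that the split chrY file itself may be truncated, for example if the step that wrote it was interrupted or ran out of disk space. One quick check is whether each split file ends with the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The sketch below is plain Java with no Picard dependency; the class and method names (BgzfEofCheck, hasBgzfEof) are hypothetical. A missing marker points strongly toward truncation, but some older BAM writers may not have emitted it, so its absence alone is not conclusive.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks whether a BAM file ends with the BGZF EOF block.
public class BgzfEofCheck {
    // The 28-byte empty BGZF block that should terminate a complete BAM file
    // (per the SAM/BAM format specification).
    static final byte[] BGZF_EOF = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
        0x00, (byte) 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
        0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    };

    /** Returns true if the last 28 bytes of the file equal the BGZF EOF block. */
    static boolean hasBgzfEof(Path bam) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(bam.toFile(), "r")) {
            long len = f.length();
            if (len < BGZF_EOF.length) {
                return false; // too short to even hold the marker
            }
            byte[] tail = new byte[BGZF_EOF.length];
            f.seek(len - tail.length); // jump to the final 28 bytes
            f.readFully(tail);
            return Arrays.equals(tail, BGZF_EOF);
        }
    }

    public static void main(String[] args) throws IOException {
        // Report each file named on the command line.
        for (String name : args) {
            boolean ok = hasBgzfEof(Paths.get(name));
            System.out.println(name + ": "
                + (ok ? "EOF marker present" : "EOF marker MISSING (possibly truncated)"));
        }
    }
}
```

Usage, after javac BgzfEofCheck.java: java BgzfEofCheck 1102T.LMP.chrY.bam (or every split file at once). The check reads only the last 28 bytes, so it tells you whether a file was written to completion, not whether its contents are otherwise intact.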