From: Alec W. <al...@br...> - 2010-05-17 11:31:30
Hi Michael,

How did you split your original file? It appears that the BAM that produced the error is defective. Each gzip block has a field containing the size of the block, and the file apparently has fewer bytes in it than the last gzip block indicates it should.

-Alec

Michael James Clark wrote:
> Quick update:
> I decided to try to narrow in on the problem by extracting specific
> regions from one of the BAMs that isn't working. When I tried to index,
> samtools threw this:
>
> -bash- samtools index 1102T.LMP.chr3_random.bam
> [bam_header_read] EOF marker is absent.
> [bam_index_core] truncated file? Continue anyway. (-4)
>
> When I ran samtools index on one of the ones that worked, I received no
> such error. Any ideas? Thanks!
>
> Michael
>
> On 5/16/2010 4:42 PM, Michael James Clark wrote:
>
>> Hi All,
>> You may remember a few days ago I had problems with getting
>> MarkDuplicates to work on a particular genomic dataset. Nothing I did
>> solved the problem, unfortunately, so I decided to try splitting the
>> file by chromosome and extracting interchromosomal joins into a separate
>> file and running each independently.
>> Running MarkDuplicates on some of these worked without a hitch, but on a
>> couple of them I get this error:
>>
>> /usr/java/latest/bin/java -Xmx6G -jar
>> /share/apps/picard-tools-1.19/MarkDuplicates.jar I=1102T.LMP.chrY.bam
>> O=1102T.LMP.rmdup.chrY.bam M=1102T.LMP.rmdup.chrY.metrics
>> TMP_DIR=tmp.files/ REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=SILENT
>> MAX_RECORDS_IN_RAM=8000000
>> [Sun May 16 16:32:46 PDT 2010] net.sf.picard.sam.MarkDuplicates
>> INPUT=1102T.LMP.chrY.bam OUTPUT=1102T.LMP.rmdup.chrY.bam
>> METRICS_FILE=1102T.LMP.rmdup.chrY.metrics REMOVE_DUPLICATES=true
>> TMP_DIR=tmp.files VALIDATION_STRINGENCY=SILENT
>> MAX_RECORDS_IN_RAM=8000000 ASSUME_SORTED=false
>> MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
>> READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
>> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
>> COMPRESSION_LEVEL=5
>> INFO 2010-05-16 16:32:46 MarkDuplicates Start of doWork freeMemory: 8658104; totalMemory: 9109504; maxMemory: 5726666752
>> INFO 2010-05-16 16:32:46 MarkDuplicates Reading input file and constructing read end information.
>> INFO 2010-05-16 16:32:46 MarkDuplicates Will retain up to 22724868 data points before spilling to disk.
>> INFO 2010-05-16 16:33:07 MarkDuplicates Read 1000000 records. Tracking 2526 as yet unmatched pairs. 2526 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:33:17 MarkDuplicates Read 2000000 records. Tracking 6482 as yet unmatched pairs. 6482 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:33:27 MarkDuplicates Read 3000000 records. Tracking 10160 as yet unmatched pairs. 10160 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:33:36 MarkDuplicates Read 4000000 records. Tracking 68510 as yet unmatched pairs. 68510 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:33:46 MarkDuplicates Read 5000000 records. Tracking 75782 as yet unmatched pairs. 75782 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:33:55 MarkDuplicates Read 6000000 records. Tracking 97889 as yet unmatched pairs. 97889 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:03 MarkDuplicates Read 7000000 records. Tracking 60997 as yet unmatched pairs. 60997 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:13 MarkDuplicates Read 8000000 records. Tracking 61125 as yet unmatched pairs. 61125 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:22 MarkDuplicates Read 9000000 records. Tracking 60597 as yet unmatched pairs. 60597 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:33 MarkDuplicates Read 10000000 records. Tracking 58306 as yet unmatched pairs. 58306 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:41 MarkDuplicates Read 11000000 records. Tracking 55412 as yet unmatched pairs. 55412 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:34:53 MarkDuplicates Read 12000000 records. Tracking 51336 as yet unmatched pairs. 51336 records in RAM. Last sequence index: 23
>> INFO 2010-05-16 16:35:02 MarkDuplicates Read 13000000 records. Tracking 13051 as yet unmatched pairs. 13051 records in RAM. Last sequence index: 23
>> [Sun May 16 16:35:04 PDT 2010] net.sf.picard.sam.MarkDuplicates done.
>> Runtime.totalMemory()=2987982848
>> Exception in thread "main" net.sf.samtools.FileTruncatedException: Premature end of file
>>     at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:290)
>>     at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:100)
>>     at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:169)
>>     at java.io.DataInputStream.read(DataInputStream.java:132)
>>     at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:394)
>>     at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
>>     at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
>>     at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:182)
>>     at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:397)
>>     at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:373)
>>     at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:363)
>>     at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:330)
>>     at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:261)
>>     at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:112)
>>     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
>>     at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:96)
>>
>> Haven't seen this "Premature end of file" error before. Any ideas how to
>> get around it?
>> (By the way, I realize that since this file doesn't include
>> interchromosomal joins, I can likely use samtools rmdup to do it, but
>> I'm afraid I'll get the same error on my interchromosomal joins file and
>> in that case I'll need a solution. Thanks!)
>>
>> Sincerely,
>> Michael
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Samtools-help mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>
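
For anyone who wants to test a BAM for the kind of truncation Alec describes before handing it to MarkDuplicates, here is a minimal Python sketch. It is not what Picard or samtools run internally. It assumes the BGZF layout given in the SAM/BAM specification: each block is a gzip member whose extra field holds a "BC" subfield with BSIZE (total block size minus one), and a complete file ends with a fixed 28-byte empty block, the EOF marker that samtools reports as absent. The names check_bgzf and BGZF_EOF are made up for this example.

```python
import struct
import sys

# 28-byte empty BGZF block used as the end-of-file marker
# (byte values as given in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def check_bgzf(path):
    """Walk the BGZF blocks in `path` without decompressing them.

    Returns (ok, message). Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        file_size = fh.seek(0, 2)  # seek to the end; returns the file size
        offset = 0
        while offset < file_size:
            fh.seek(offset)
            header = fh.read(12)
            # gzip magic, deflate method, FEXTRA flag set
            if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
                return False, f"bad or truncated block header at offset {offset}"
            (xlen,) = struct.unpack_from("<H", header, 10)
            extra = fh.read(xlen)
            # Scan the extra subfields for 'B','C' with a 2-byte payload.
            block_size = None
            pos = 0
            while pos + 4 <= len(extra):
                si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
                if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
                    # BSIZE is stored as (total block size - 1).
                    block_size = struct.unpack_from("<H", extra, pos + 4)[0] + 1
                    break
                pos += 4 + slen
            if block_size is None:
                return False, f"no BSIZE field in block at offset {offset}"
            if offset + block_size > file_size:
                return False, (
                    f"block at offset {offset} declares {block_size} bytes "
                    f"but only {file_size - offset} remain"
                )
            offset += block_size
        # Every block is complete; now look for the trailing EOF marker.
        if file_size < len(BGZF_EOF):
            return False, "EOF marker is absent"
        fh.seek(-len(BGZF_EOF), 2)
        if fh.read() != BGZF_EOF:
            return False, "EOF marker is absent"
    return True, "all blocks complete and EOF marker present"


if __name__ == "__main__":
    for bam in sys.argv[1:]:
        ok, message = check_bgzf(bam)
        print(f"{bam}: {'OK' if ok else 'PROBLEM'} ({message})")
```

Saved under a name of your choosing and run with the split BAMs as arguments, it prints one line per file. A file whose last block declares more bytes than remain was most likely cut off while being written or copied, so re-creating it from the original BAM is the safer fix than trying to work around the error.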