From: Bret H. <ja...@ga...> - 2010-05-17 05:34:43

Interesting. I am going to try to split the file using this script:

/data/storage-1-03/bret/analysis/mclark/bam-split.sh

It probably won't finish until sometime tomorrow, at which point we can see
whether it resolves the original error.

-bret

On Sun, May 16, 2010 at 08:26:17PM -0700, Michael James Clark wrote:
> I used the script that Kevin sent around to the Nelsonlab-main list on
> Friday. I think it does the same thing?
>
>> Mike,
>>
>> How are you splitting the bam files? This is the way that I usually do it:
>>
>> samtools view -h ./input.bam chr3_random | samtools view -bS - > ./output.bam
>>
>> -bret
>>
>> On Sun, May 16, 2010 at 08:11:35PM -0700, Michael James Clark wrote:
>>
>>> Here's a clue, maybe. To narrow in on the region, I was going to index
>>> one of the files. When I ran samtools index on one of the files that
>>> didn't work, I got a couple of errors:
>>>
>>> -bash- samtools index 1102T.LMP.chr3_random.bam
>>> [bam_header_read] EOF marker is absent.
>>> [bam_index_core] truncated file? Continue anyway. (-4)
>>>
>>> When I ran samtools index on one of the files that worked, I didn't get
>>> that error. I'm not really sure why this file would be "truncated" when
>>> others from the same process aren't.
>>>
>>> On 5/16/2010 7:13 PM, Bret Harry wrote:
>>>
>>>> Mike,
>>>>
>>>> I have not seen this error before. How long does it take to reproduce
>>>> it for 1102T.LMP.rmdup.chrY.bam? I would start by splitting the BAM
>>>> file up further until I got to the smallest possible file that still
>>>> caused a crash. It's unclear whether Picard is crashing because of
>>>> what's *in* the bam file or because of the *size* of the bam file.
>>>>
>>>> -bret
>>>>
>>>> On Sun, May 16, 2010 at 04:51:59PM -0700, Michael James Clark wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I've been trying to get 1102's tumor to run through Picard
>>>>> MarkDuplicates for two weeks now. I resorted to splitting the
>>>>> whole-genome file up by chromosome and running each piece separately.
>>>>> Some of them have worked, but some crashed with the following error:
>>>>>
>>>>> /usr/java/latest/bin/java -Xmx6G -jar
>>>>> /share/apps/picard-tools-1.19/MarkDuplicates.jar I=1102T.LMP.chrY.bam
>>>>> O=1102T.LMP.rmdup.chrY.bam M=1102T.LMP.rmdup.chrY.metrics
>>>>> TMP_DIR=tmp.files/ REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=SILENT
>>>>> MAX_RECORDS_IN_RAM=8000000
>>>>>
>>>>> [Sun May 16 16:32:46 PDT 2010] net.sf.picard.sam.MarkDuplicates
>>>>> INPUT=1102T.LMP.chrY.bam OUTPUT=1102T.LMP.rmdup.chrY.bam
>>>>> METRICS_FILE=1102T.LMP.rmdup.chrY.metrics REMOVE_DUPLICATES=true
>>>>> TMP_DIR=tmp.files VALIDATION_STRINGENCY=SILENT
>>>>> MAX_RECORDS_IN_RAM=8000000 ASSUME_SORTED=false
>>>>> MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
>>>>> READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
>>>>> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
>>>>> COMPRESSION_LEVEL=5
>>>>> INFO 2010-05-16 16:32:46 MarkDuplicates Start of doWork
>>>>> freeMemory: 8658104; totalMemory: 9109504; maxMemory: 5726666752
>>>>> INFO 2010-05-16 16:32:46 MarkDuplicates Reading input file and
>>>>> constructing read end information.
>>>>> INFO 2010-05-16 16:32:46 MarkDuplicates Will retain up to 22724868
>>>>> data points before spilling to disk.
>>>>> INFO 2010-05-16 16:33:07 MarkDuplicates Read 1000000 records.
>>>>> Tracking 2526 as yet unmatched pairs. 2526 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:33:17 MarkDuplicates Read 2000000 records.
>>>>> Tracking 6482 as yet unmatched pairs. 6482 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:33:27 MarkDuplicates Read 3000000 records.
>>>>> Tracking 10160 as yet unmatched pairs. 10160 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:33:36 MarkDuplicates Read 4000000 records.
>>>>> Tracking 68510 as yet unmatched pairs. 68510 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:33:46 MarkDuplicates Read 5000000 records.
>>>>> Tracking 75782 as yet unmatched pairs. 75782 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:33:55 MarkDuplicates Read 6000000 records.
>>>>> Tracking 97889 as yet unmatched pairs. 97889 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:03 MarkDuplicates Read 7000000 records.
>>>>> Tracking 60997 as yet unmatched pairs. 60997 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:13 MarkDuplicates Read 8000000 records.
>>>>> Tracking 61125 as yet unmatched pairs. 61125 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:22 MarkDuplicates Read 9000000 records.
>>>>> Tracking 60597 as yet unmatched pairs. 60597 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:33 MarkDuplicates Read 10000000 records.
>>>>> Tracking 58306 as yet unmatched pairs. 58306 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:41 MarkDuplicates Read 11000000 records.
>>>>> Tracking 55412 as yet unmatched pairs. 55412 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:34:53 MarkDuplicates Read 12000000 records.
>>>>> Tracking 51336 as yet unmatched pairs. 51336 records in RAM. Last
>>>>> sequence index: 23
>>>>> INFO 2010-05-16 16:35:02 MarkDuplicates Read 13000000 records.
>>>>> Tracking 13051 as yet unmatched pairs. 13051 records in RAM. Last
>>>>> sequence index: 23
>>>>> [Sun May 16 16:35:04 PDT 2010] net.sf.picard.sam.MarkDuplicates done.
>>>>> Runtime.totalMemory()=2987982848
>>>>> Exception in thread "main" net.sf.samtools.FileTruncatedException:
>>>>> Premature end of file
>>>>> at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:290)
>>>>> at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:100)
>>>>> at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:169)
>>>>> at java.io.DataInputStream.read(DataInputStream.java:132)
>>>>> at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:394)
>>>>> at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
>>>>> at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
>>>>> at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:182)
>>>>> at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:397)
>>>>> at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:373)
>>>>> at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:363)
>>>>> at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:330)
>>>>> at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:261)
>>>>> at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:112)
>>>>> at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
>>>>> at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:96)
>>>>>
>>>>> It crashes immediately after printing "net.sf.picard.sam.MarkDuplicates
>>>>> done." with this Premature end of file error. Does anyone have any
>>>>> ideas on how to get around this and get it to work?
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Nelsonlab-analysis mailing list
>>>>> Nel...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/nelsonlab-analysis
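bam-split.sh itself is never shown in the thread. As a hypothetical sketch, a per-reference split generalizing Bret's one-liner might look like the following; the helper names are made up here, and it assumes samtools is on PATH and the input BAM is coordinate-sorted and indexed (region queries like `view in.bam chrY` need a .bai index):

```shell
# Hypothetical sketch -- bam-split.sh itself does not appear in the thread.
# Splits an indexed BAM into one BAM per reference sequence, using the same
# view | view pipeline Bret describes.

# Pull the reference names (SN: fields) out of @SQ header lines on stdin.
sq_names() {
    awk -F'\t' '/^@SQ/ { sub("SN:", "", $2); print $2 }'
}

# Split "$1" (e.g. input.bam) into input.chrY.bam, input.chr3_random.bam, ...
split_bam() {
    in=$1
    for chr in $(samtools view -H "$in" | sq_names); do
        # -h keeps the header; the second view recompresses the SAM to BAM.
        samtools view -h "$in" "$chr" | samtools view -bS - > "${in%.bam}.$chr.bam"
    done
}
```

Newer samtools releases can also write BAM directly (`samtools view -b in.bam chrY -o out.bam`), which skips the recompression pipe entirely.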
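The "[bam_header_read] EOF marker is absent." warning refers to the 28-byte empty BGZF block that the SAM/BAM specification requires at the end of every intact BAM file; Picard's FileTruncatedException comes from reading off the end of a file that lacks it. As a rough sketch (not from the thread), a batch of split files can be screened for truncation by comparing their last 28 bytes against that fixed block:

```shell
# Screen BAM files for the fixed 28-byte BGZF EOF block defined in the
# SAM/BAM spec; a file missing it was almost certainly cut short mid-write.
bgzf_eof_hex="1f8b08040000000000ff0600424302001b0003000000000000000000"

check_bam_eof() {
    # Hex-dump the last 28 bytes and compare against the EOF block.
    last28=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$last28" = "$bgzf_eof_hex" ]; then
        echo "ok: $1"
    else
        echo "truncated: $1"
    fi
}
```

A file flagged here is the same kind samtools index warns about; the usual causes are a full disk, a killed job, or an interrupted copy, and the only real fix is to regenerate the file from its source.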