From: Ascia E. <as...@uc...> - 2011-10-26 00:32:22
Hi,

Just a reminder that lab meeting is tomorrow at 2. And if you haven't done so already, let me know if you have a conflict with Tuesday or Friday afternoon.

Thanks!
Ascia

Quoting Ascia Eskin <as...@uc...>:

> Hi all,
> Next week, lab meeting will be at 2 pm on Wednesday, Oct 26th. Valerie
> will be presenting.
> The following weeks, we will need to rearrange the schedule. Please
> let me know what conflicts you have with Tuesday at 3 pm or Friday
> after 11am. We will also discuss this at the meeting on Wednesday.
> Next on the schedule is:
> Week of Nov 2nd: Natalia
> Week of Nov 9th: Erika
> Week of Nov 16th: Berit
>
> Thanks!
> Ascia
>
> _______________________________________________
> Nelsonlab-analysis mailing list
> Nel...@li...
> https://lists.sourceforge.net/lists/listinfo/nelsonlab-analysis

From: Ascia E. <as...@uc...> - 2011-10-21 23:19:28
Hi all,

Next week, lab meeting will be at 2 pm on Wednesday, Oct 26th. Valerie will be presenting.

The following weeks, we will need to rearrange the schedule. Please let me know what conflicts you have with Tuesday at 3 pm or Friday after 11am. We will also discuss this at the meeting on Wednesday.

Next on the schedule is:
Week of Nov 2nd: Natalia
Week of Nov 9th: Erika
Week of Nov 16th: Berit

Thanks!
Ascia

From: Michael J. C. <mic...@gm...> - 2010-06-03 07:50:38
I'm going to be using a lot of space on /scratch0, hopefully only until tomorrow. I need to do some bam file manipulations that will temporarily take a lot of space.

Mike

From: Michael J. C. <mic...@gm...> - 2010-05-18 10:17:21
Hi all,

I performed copy number analysis on the LMP sequence data.

My approach:
1) Calculated mean coverage for 10kb intervals across the entire genome for both tumor and germline.
2) Normalized coverage by dividing each 10kb mean coverage interval by whole genome mean coverage.
3) Calculated log2R-ratio (normalized-tumor-coverage/normalized-germline-coverage).
4) Used DNAcopy to generate smoothed copy number segments (the black lines on each chart).

The results:
http://genomics.ctrl.ucla.edu/~mclark/HEI_NF2/HEI.coverage-derived.copy.number.png

Chromosome 22 loss is evident, which Benedicte saw with MLPA as well. Smoothed log2R-ratio of chr22 is at -0.3. Across the non-centromeric length of all other chromosomes, smoothed log2R-ratio is at or below +/-0.1, suggesting no other major losses or gains in the genome. There are some smaller events. I'm going to run this again using 1kb intervals in the interest of better identifying those small events.

Mike

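[Editor's sketch] Steps 1-3 of the approach above amount to a per-interval normalization followed by a log2 ratio; step 4 (segmentation) is done with the DNAcopy package in R and is not reproduced here. A rough Python sketch, with invented coverage numbers purely for illustration:

```python
import math

def log2_ratios(tumor_cov, germline_cov):
    """Steps 1-3 of the coverage-derived copy number approach:
    per-interval mean coverages (step 1, computed elsewhere) are
    normalized by each sample's genome-wide mean coverage (step 2),
    then compared as a log2 tumor/germline ratio (step 3)."""
    t_mean = sum(tumor_cov) / len(tumor_cov)
    g_mean = sum(germline_cov) / len(germline_cov)
    ratios = []
    for t, g in zip(tumor_cov, germline_cov):
        t_norm = t / t_mean                        # step 2, tumor
        g_norm = g / g_mean                        # step 2, germline
        ratios.append(math.log2(t_norm / g_norm))  # step 3
    return ratios

# Toy example: hypothetical 10kb-interval mean coverages where the
# tumor has lost one copy in the second half of the region.
tumor = [30.0, 30.0, 15.0, 15.0]
germline = [30.0, 30.0, 30.0, 30.0]
print([round(r, 2) for r in log2_ratios(tumor, germline)])
```

Note that because the tumor's genome-wide mean itself includes the lost region, the "normal" intervals come out slightly above zero and the lost intervals less negative than the ideal -1; real whole-genome data dilutes this effect.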
From: Bret H. <ja...@ga...> - 2010-05-17 05:34:43
Interesting. I am going to try to split the file using this script:

/data/storage-1-03/bret/analysis/mclark/bam-split.sh

It probably won't finish until tomorrow sometime, at which point we can see if it resolves the original error.

-bret

On Sun, May 16, 2010 at 08:26:17PM -0700, Michael James Clark wrote:
> Used the script that Kevin sent around to the Nelsonlab-main list on Friday.
> I think it does the same thing?

From: Michael J. C. <mic...@gm...> - 2010-05-17 03:26:25
Used the script that Kevin sent around to the Nelsonlab-main list on Friday. I think it does the same thing?

> Mike,
>
> How are you splitting the bam files? This is the way that I usually do it.
>
> samtools view -h ./input.bam chr3_random | samtools view -bS - > ./output.bam
>
> -bret

From: Bret H. <ja...@ga...> - 2010-05-17 03:24:49
Mike,

How are you splitting the bam files? This is the way that I usually do it.

samtools view -h ./input.bam chr3_random | samtools view -bS - > ./output.bam

-bret

On Sun, May 16, 2010 at 08:11:35PM -0700, Michael James Clark wrote:
> Here's a clue, maybe. To narrow in on the region, I was going to index one
> of the files. When I ran samtools index, I got a couple of errors on one of
> the files that didn't work:
>
> -bash- samtools index 1102T.LMP.chr3_random.bam
> [bam_header_read] EOF marker is absent.
> [bam_index_core] truncated file? Continue anyway. (-4)
>
> When I ran samtools index on one of the files that worked, I didn't get that error. Not really sure why the file would be "truncated" when others from the same process aren't.

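[Editor's sketch] Bret's one-liner extracts a single reference; applying it across every chromosome can be scripted roughly as below. This is not the bam-split.sh script mentioned in the thread, just an illustration; it assumes samtools is on PATH and that chromosome names come from the `@SQ` lines of the BAM header.

```python
import subprocess

def chroms_from_header(header_text):
    """Pull reference names (SN: fields) out of the @SQ lines of a
    'samtools view -H' header dump."""
    chroms = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t"):
                if field.startswith("SN:"):
                    chroms.append(field[3:])
    return chroms

def split_bam_by_chrom(bam):
    """For each reference in the header, run the equivalent of
    'samtools view -h in.bam CHROM | samtools view -bS - > out.bam'
    as a single 'samtools view -b -o' call. Requires samtools on PATH."""
    header = subprocess.run(
        ["samtools", "view", "-H", bam],
        capture_output=True, text=True, check=True,
    ).stdout
    base = bam[:-4] if bam.endswith(".bam") else bam
    for chrom in chroms_from_header(header):
        out = "%s.%s.bam" % (base, chrom)
        subprocess.run(["samtools", "view", "-b", "-o", out, bam, chrom],
                       check=True)
```

Because each split goes through samtools' BAM writer, each output file gets its own proper BGZF EOF block, which matters given the truncation errors discussed in this thread.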
From: Michael J. C. <mic...@gm...> - 2010-05-17 03:11:44
Here's a clue, maybe. To narrow in on the region, I was going to index one of the files. When I ran samtools index, I got a couple of errors on one of the files that didn't work:

-bash- samtools index 1102T.LMP.chr3_random.bam
[bam_header_read] EOF marker is absent.
[bam_index_core] truncated file? Continue anyway. (-4)

When I ran samtools index on one of the files that worked, I didn't get that error. Not really sure why the file would be "truncated" when others from the same process aren't.

On 5/16/2010 7:13 PM, Bret Harry wrote:
> Mike,
>
> I have not seen this error before, how long does it take to reproduce this for 1102T.LMP.rmdup.chrY.bam? I think I would start by further splitting up the BAM file until I got to the smallest possible file size that caused a crash. It's unclear if Picard is crashing because of what's *in* the bam file, or because of the *size* of the bam file.
>
> -bret

From: Bret H. <ja...@ga...> - 2010-05-17 02:14:10
|
Mike, I have not seen this error before. How long does it take to reproduce this for 1102T.LMP.rmdup.chrY.bam? I think I would start by further splitting up the BAM file until I got to the smallest possible file size that caused a crash. It's unclear whether Picard is crashing because of what's *in* the BAM file, or because of the *size* of the BAM file. -bret |
|
From: Michael J. C. <mic...@gm...> - 2010-05-16 23:52:08
|
Hi guys,
I've been trying to get 1102's tumor through Picard MarkDuplicates for
two weeks now. I resorted to splitting the whole-genome file up by
chromosome and running each piece separately. Some of the chromosomes
have worked, but others crashed with the following error:
/usr/java/latest/bin/java -Xmx6G -jar
/share/apps/picard-tools-1.19/MarkDuplicates.jar I=1102T.LMP.chrY.bam
O=1102T.LMP.rmdup.chrY.bam M=1102T.LMP.rmdup.chrY.metrics
TMP_DIR=tmp.files/ REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=SILENT
MAX_RECORDS_IN_RAM=8000000
[Sun May 16 16:32:46 PDT 2010] net.sf.picard.sam.MarkDuplicates
INPUT=1102T.LMP.chrY.bam OUTPUT=1102T.LMP.rmdup.chrY.bam
METRICS_FILE=1102T.LMP.rmdup.chrY.metrics REMOVE_DUPLICATES=true
TMP_DIR=tmp.files VALIDATION_STRINGENCY=SILENT
MAX_RECORDS_IN_RAM=8000000 ASSUME_SORTED=false
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
COMPRESSION_LEVEL=5
INFO 2010-05-16 16:32:46 MarkDuplicates Start of doWork
freeMemory: 8658104; totalMemory: 9109504; maxMemory: 5726666752
INFO 2010-05-16 16:32:46 MarkDuplicates Reading input file and constructing read end information.
INFO 2010-05-16 16:32:46 MarkDuplicates Will retain up to 22724868 data points before spilling to disk.
INFO 2010-05-16 16:33:07 MarkDuplicates Read 1000000 records. Tracking 2526 as yet unmatched pairs. 2526 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:17 MarkDuplicates Read 2000000 records. Tracking 6482 as yet unmatched pairs. 6482 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:27 MarkDuplicates Read 3000000 records. Tracking 10160 as yet unmatched pairs. 10160 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:36 MarkDuplicates Read 4000000 records. Tracking 68510 as yet unmatched pairs. 68510 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:46 MarkDuplicates Read 5000000 records. Tracking 75782 as yet unmatched pairs. 75782 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:33:55 MarkDuplicates Read 6000000 records. Tracking 97889 as yet unmatched pairs. 97889 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:03 MarkDuplicates Read 7000000 records. Tracking 60997 as yet unmatched pairs. 60997 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:13 MarkDuplicates Read 8000000 records. Tracking 61125 as yet unmatched pairs. 61125 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:22 MarkDuplicates Read 9000000 records. Tracking 60597 as yet unmatched pairs. 60597 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:33 MarkDuplicates Read 10000000 records. Tracking 58306 as yet unmatched pairs. 58306 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:41 MarkDuplicates Read 11000000 records. Tracking 55412 as yet unmatched pairs. 55412 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:34:53 MarkDuplicates Read 12000000 records. Tracking 51336 as yet unmatched pairs. 51336 records in RAM. Last sequence index: 23
INFO 2010-05-16 16:35:02 MarkDuplicates Read 13000000 records. Tracking 13051 as yet unmatched pairs. 13051 records in RAM. Last sequence index: 23
[Sun May 16 16:35:04 PDT 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=2987982848
Exception in thread "main" net.sf.samtools.FileTruncatedException: Premature end of file
	at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:290)
	at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:100)
	at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:169)
	at java.io.DataInputStream.read(DataInputStream.java:132)
	at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:394)
	at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
	at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
	at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:182)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:397)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:373)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:363)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:330)
	at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:261)
	at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:112)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
	at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:96)
It crashes immediately upon saying "net.sf.picard.sam.MarkDuplicates
done." with this Premature end of file error. Anyone have any ideas how
to get around this and get it to work?
Thanks,
Mike
|
|
From: Michael J. C. <mic...@gm...> - 2010-04-12 20:48:41
|
Hi all, That's right, it's time for another whole genome sequencing paper by the Nelson Lab All-Stars! We have a summary of the SOLiD data here: https://secure.genome.ucla.edu/index.php/1102_Sequencing#Standard_Libs I didn't personally create those files, but there is a summary about PCR dup removal on that page. Does anyone know if that was done with samtools or Picard? If it was Samtools rmdup (and I'm guessing it was), I want to re-run it using Picard MarkDuplicates (Samtools misses some percent of the duplicates, and Picard is really the standard tool for it now). The processed data is here: /home/solexa/abi_datasets/Reports/1102_whole_genome_bfast_alignment/ I know there was a variant database created by Brian for 1102 a while back, but from what I understand the variant database server is gone now. Does that mean we should re-run the variant calling/queryengine analysis? I want to get together numbers such as the dbSNP intersect and the coding consequences and so forth for the whole 1102 data set. Will the method I used previously to generate queryengine reports (from here: https://secure.genome.ucla.edu/index.php/Sequence_Analysis_HowTo#Step_4_-_Generating_Reports ) still work? Thanks, Mike |
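On the samtools-vs-Picard point above: samtools rmdup keys duplicates on a single mapping coordinate and skips some cases (e.g. pairs whose ends map to different chromosomes), whereas Picard MarkDuplicates compares both ends of each pair plus orientation and keeps the pair with the best summed base quality, which is why it catches more. A toy sketch of that pair-level idea — the data shapes and helper name are mine, not Picard's API:

```python
def mark_duplicate_pairs(pairs):
    """Toy pair-level duplicate marking: pairs sharing the same key
    (chrom plus the 5' position and strand of BOTH ends) are duplicates
    of each other; the pair with the highest summed base quality survives.
    `pairs` is a list of (name, key, quality_sum) tuples; returns the set
    of names flagged as duplicates."""
    best = {}
    for name, key, score in pairs:
        # Remember the highest-scoring pair seen for each coordinate key.
        if key not in best or score > best[key][1]:
            best[key] = (name, score)
    # Everything that is not the best representative of its key is a duplicate.
    return {name for name, key, score in pairs if best[key][0] != name}
```

A single-coordinate scheme would use only one end as the key, which is the intuition for why rmdup flags fewer reads.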
|
From: Brian O'C. <bri...@gm...> - 2010-03-26 02:27:25
|
Hey Guys, I updated the SeqWare Pipeline workflow documentation today for version 0.6.2.1, specifically for the variant DB workflow and the provisioning of query engine databases. See: https://secure.genome.ucla.edu/index.php/Sequence_Analysis_Workflow_HOWTO for the updated information. Keep in mind there's a little more work needed before the database provisioning steps will work; I need to get Bret and/or Jordan's help with that once I clean up some old DBs on compute-2-7. I included a TODO list in the wiki entry for Bret and Jordan: https://secure.genome.ucla.edu/index.php/Sequence_Analysis_Workflow_HOWTO#Provisioning_database_to_SeqWare_QueryEngine The important take-home point for "power users" (Kevin, Hane, Michael Clark, Sam, etc.) who want to run workflows for alignment, variant calling, and annotation is to please use the current tagged version (0.6.2.1). This way the directions in the wiki and the tools' parameters will match up. As new versions of SeqWare become available, Jordan will coordinate the software release and documentation on this wiki page and send out an email letting people know it's been updated. The next item on my plate is finishing the updates to the LIMS. I'll send around links to docs in the wiki and the updated LIMS URL when it's available (first week of April, I hope). At that point I want to get everyone's help moving the unstructured sample metadata in the wiki into the LIMS SRA data model. This will make it much easier to track samples and what we've done to them computationally. Hope this helps! --Brian |
|
From: Jordan M. <jme...@uc...> - 2010-02-25 20:29:17
|
I made one additional change, so that external users will only be allowed to use their 8 machines if those machines are otherwise idle. This means that when the cluster is full and people are waiting, the queued jobs of the external users will have the lowest priority and will not be scheduled unless a node is otherwise going to sit idle. When the cluster is empty, they will still only be able to use up to 8 nodes, to prevent them from filling up slots that one of us might otherwise want to use later on. https://secure.genome.ucla.edu/index.php/Sun_Grid_Engine#Setting_a_user_to_only_use_idle_resources Jordan |
|
From: Jordan M. <jme...@uc...> - 2010-02-24 02:58:19
|
Hi all,
I think I resolved two issues, so please keep an eye on the cluster and your
jobs to make sure nothing breaks:
1) I added a quota to the external users, so that each cannot use more than
64 slots at a time. Each machine has 8 slots, so this equates to 8
full machines.
2) I resolved the host prioritization issues without having to add a
queue. When a user submits a job, it is scheduled as follows:
    - By default, the job will use a low-memory node.
    - If all low-memory nodes are full, or the job requires more than 8GB of
RAM, it will use one of the faster high-memory nodes (compute-3-x) so that
it ends ASAP.
    - If all compute-3-x nodes are full and the job needs more memory, then
it will be scheduled to a slower high-memory node (compute-2-x).
In my testing, everything seems to work properly, but please let me know if
you see anything weird happening. If you want to know the gory
implementation details, I added them to
https://secure.genome.ucla.edu/index.php/Sun_Grid_Engine
Cordially,
Jordan
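The scheduling rules described above can be sketched as a tiny decision function — the helper and its arguments are illustrative only, not actual SGE configuration:

```python
def pick_node_class(mem_gb, low_mem_full, compute3_full):
    """Mirror of the scheduling policy above: low-memory nodes by default;
    the faster high-memory nodes (compute-3-x) when the low-memory nodes
    are full or the job needs more than 8GB; the slower high-memory nodes
    (compute-2-x) as the last resort."""
    if mem_gb <= 8 and not low_mem_full:
        return "low-mem"
    if not compute3_full:
        return "compute-3-x"
    return "compute-2-x"
```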
|
|
From: Jordan M. <jme...@uc...> - 2010-02-23 19:30:17
|
Alignments are still going, but here is the raw data if anyone wants to look at it: http://genome.ucla.edu/~solexa/solexa_datasets/100218_ILLUMINA-EAS172_61DME/ Jordan |
|
From: Brian O'C. <bri...@gm...> - 2010-02-22 20:05:58
|
Hey Jordan, Just wanted to let you know that Zugen said the whole exome capture samples are now finished on the sequencer. This should be a very interesting dataset, I'm looking forward to seeing it run through your workflow! --B |
|
From: Brian O'C. <bri...@gm...> - 2010-02-22 08:55:50
|
Thanks Jordan! --Brian |
|
From: Jordan M. <jme...@uc...> - 2010-02-22 08:52:03
|
Kevin, Pileups from our pipeline are available from here. We are still debugging, so please let me know how the data looks. /home/seqware/datasets/experiments/1519/U08/pileup Jordan On Sun, Feb 21, 2010 at 8:35 PM, Jordan Mendler <jme...@uc...> wrote: > Jobs are running > > > -------------------- > Note to self: > > bin/pegasus-run.sh workflows/HumanGenomicVariant.ftl > config/HumanGenomicVariant.ini /tmp/variantnew.dax > --experiment_name=1518_U08_neck_tumor > --experiment_bam_inputs=1519_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-2-02/bam/bfast.susan_20091217_2_50mer_only.bam > --experiment_directory=experiments/1519/U08 > > bin/pegasus-run.sh workflows/HumanGenomicVariant.ftl > config/HumanGenomicVariant.ini /tmp/variantnew.dax > --experiment_name=1518_normal_blood > --experiment_bam_inputs=1519_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-2-02/bam/bfast.tomoe_20091222_2_50mer_only.bam > --experiment_directory=experiments/1519/U08 |
|
From: Jordan M. <jme...@uc...> - 2010-02-17 05:09:30
|
Below is a list of experiment aggregates I am running through my pipeline for U87. Can people please review and tell me if there is anything else I am missing?

Thanks,
Jordan

----

A) UCLA 50+50 LMP (variant calling done)
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.U87.old.flagdup.bam

B) 1024 Balanced Probes 50+50 LMP (2 slides) (variant calling done)
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.solid0053_20091211_U87_LMP.flagdup.bam

C) 1024 Balanced Probes 50+35 (with degenerate 35bp) (2 slides) (variant calling done)
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_1_U87_1024FAP_B1_F3.bam
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_1_U87_1024FAP_B1_R3.bam
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_2_U87_1024FAP_B2_F3.bam
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_2_U87_1024FAP_B2_R3.bam

D) Same as C, but 50+25 instead of 50+35. (Still waiting on alignments from Nils.)

E) Take B) and C) (forward only) and merge those together for pileups. (Currently running.) What we want for this one is all the 1024 50mer reads merged together.
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.solid0053_20091211_U87_LMP.flagdup.bam
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_1_U87_1024FAP_B1_F3.bam
* /home/solexa/abi_datasets/Reports/U87_whole_genome_bfast_alignment/nhomer_bfast_alignment_reports-2010-1-15/bam/bfast.SELMA_20091214_2_U87_1024FAP_B2_F3.bam |
|
From: Nils H. <nil...@uc...> - 2010-02-11 01:09:40
|
Presumably, if I am on compute-3-4 and another user is on compute-3-4 and we are both writing to disk (or reading), then the bottleneck is the Ethernet connection we are sharing on compute-3-4. "-pe serial 8" may be another way of saying I don't want people rsyncing/reading/writing while I am using this node. Nils |
|
From: Jordan M. <jme...@uc...> - 2010-02-11 01:01:31
|
Not sure SGE/Lustre are that complex. The closest thing will be once InfiniBand is working: we will be able to request IB nodes, which have a much faster connection to Lustre. Jordan |
|
From: Nils H. <nil...@uc...> - 2010-02-11 00:50:40
|
What about reserving I/O? If a job is rsyncing etc. we don't want others to use bandwidth. Nils On 2/10/10 3:38 PM, "Jordan Mendler" <jme...@uc...> wrote: > I have noticed that many people have been submitting jobs with -pe serial 8, > when their job only uses one thread. Please only use -pe serial #, if your job > is multi-threaded and using # threads. > > If you are simply trying to reserve memory, use -l vf=#G instead. Please note > that this is on a per core basis, so -l vf=4G -pe serial 8 will request an > 8-core machine with 32G of RAM. -l vf=4G -pe serial 2 will request 8G and 2 > cpu's. -l vf=4G alone will request 4GB and 1 cpu. > > Thanks, > Jordan > > > ------------------------------------------------------------------------------ > SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, > Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW > http://p.sf.net/sfu/solaris-dev2dev > > _______________________________________________ > Nelsonlab-devel mailing list > Nel...@li... > https://lists.sourceforge.net/lists/listinfo/nelsonlab-devel |
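Since SGE applies vf per slot, the total reservation is the product of the vf request and the -pe serial slot count. A one-line sketch of the arithmetic quoted above (the helper name is mine):

```python
def total_reservation_gb(vf_gb, serial_slots=1):
    """SGE applies '-l vf' per slot, so '-l vf=<vf_gb>G -pe serial <n>'
    reserves vf_gb * n gigabytes of memory in total."""
    return vf_gb * serial_slots
```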
|
From: Jordan M. <jme...@uc...> - 2010-02-11 00:04:14
|
The comment was more general, as some reservations were of 24Gb for 1 core. |
|
From: Nils H. <nil...@uc...> - 2010-02-11 00:02:06
|
Not all portions of jobs use 8 threads (i.e. 'bfast match'), but they nonetheless require 8 threads for various parts of the process. Nils |
|
From: Michael Y. <myo...@uc...> - 2010-02-11 00:01:08
|
What difference does it make if you want to reserve all memory on a 32 Gb node? ॐ Michael Yourshaw UCLA Geffen School of Medicine Department of Human Genetics, Nelson Lab 695 Charles E Young Drive S Gonda 5554 Los Angeles CA 90095-8348 USA myo...@uc... 970.691.8299 This message is intended only for the use of the addressee and may contain information that is PRIVILEGED and CONFIDENTIAL, and/or may contain ATTORNEY WORK PRODUCT. If you are not the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please erase all copies of the message and its attachments and notify us immediately. Thank you. |