You can subscribe to this list here.
2009 |
Jan
|
Feb
(26) |
Mar
(37) |
Apr
(20) |
May
(20) |
Jun
(81) |
Jul
(65) |
Aug
(77) |
Sep
(85) |
Oct
(86) |
Nov
(233) |
Dec
(131) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2010 |
Jan
(201) |
Feb
(274) |
Mar
(257) |
Apr
(208) |
May
(99) |
Jun
(223) |
Jul
(131) |
Aug
(139) |
Sep
(195) |
Oct
(133) |
Nov
(196) |
Dec
(184) |
2011 |
Jan
(156) |
Feb
(181) |
Mar
(246) |
Apr
(131) |
May
(172) |
Jun
(157) |
Jul
(121) |
Aug
(200) |
Sep
(199) |
Oct
(114) |
Nov
(189) |
Dec
(68) |
2012 |
Jan
(130) |
Feb
(180) |
Mar
(232) |
Apr
(173) |
May
(149) |
Jun
(105) |
Jul
(107) |
Aug
(69) |
Sep
(59) |
Oct
(49) |
Nov
(43) |
Dec
(56) |
2013 |
Jan
(69) |
Feb
(58) |
Mar
(79) |
Apr
(90) |
May
(80) |
Jun
(62) |
Jul
(76) |
Aug
(63) |
Sep
(49) |
Oct
(66) |
Nov
(66) |
Dec
(32) |
2014 |
Jan
(57) |
Feb
(29) |
Mar
(84) |
Apr
(28) |
May
(73) |
Jun
(57) |
Jul
(65) |
Aug
(91) |
Sep
(118) |
Oct
(133) |
Nov
(99) |
Dec
(69) |
2015 |
Jan
(39) |
Feb
(52) |
Mar
(35) |
Apr
(19) |
May
(56) |
Jun
(29) |
Jul
(37) |
Aug
(44) |
Sep
(21) |
Oct
(17) |
Nov
(43) |
Dec
(35) |
2016 |
Jan
(24) |
Feb
(35) |
Mar
(32) |
Apr
(27) |
May
(33) |
Jun
(26) |
Jul
(20) |
Aug
(35) |
Sep
(10) |
Oct
(26) |
Nov
(22) |
Dec
(12) |
2017 |
Jan
(31) |
Feb
(20) |
Mar
(24) |
Apr
(19) |
May
(17) |
Jun
(8) |
Jul
(9) |
Aug
(15) |
Sep
(8) |
Oct
(4) |
Nov
(21) |
Dec
(8) |
2018 |
Jan
(6) |
Feb
(14) |
Mar
(10) |
Apr
(14) |
May
(10) |
Jun
(5) |
Jul
(8) |
Aug
(12) |
Sep
(7) |
Oct
(8) |
Nov
(19) |
Dec
(2) |
2019 |
Jan
(2) |
Feb
(8) |
Mar
(7) |
Apr
|
May
(5) |
Jun
(8) |
Jul
(2) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(1) |
2020 |
Jan
|
Feb
(2) |
Mar
(2) |
Apr
(7) |
May
|
Jun
(14) |
Jul
(7) |
Aug
(2) |
Sep
(14) |
Oct
(4) |
Nov
(1) |
Dec
|
2021 |
Jan
|
Feb
|
Mar
(4) |
Apr
(1) |
May
|
Jun
(3) |
Jul
(3) |
Aug
(3) |
Sep
(8) |
Oct
(1) |
Nov
(4) |
Dec
(1) |
2022 |
Jan
(11) |
Feb
(1) |
Mar
|
Apr
(3) |
May
|
Jun
|
Jul
|
Aug
(5) |
Sep
(5) |
Oct
(2) |
Nov
(3) |
Dec
|
2023 |
Jan
|
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
(3) |
Aug
|
Sep
|
Oct
(2) |
Nov
|
Dec
(1) |
2024 |
Jan
(2) |
Feb
|
Mar
|
Apr
(1) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Mehta, P. <Perdeep.Mehta@STJUDE.ORG> - 2010-02-25 23:10:43
|
Hi, I have a file generated by samtools pileup option. The output is fine and everything looks ok, except at few places annotation does not make sense or I may be missing something. According to samtools doc for pileup, "a symbol '^' marks the start of a read segment which is a contiguous subsequence on the read separated by 'N/S/H' CIGAR operations. A symbol '$' marks the end of a read segment." A preceding read in the following example ends at 590115 with '$' mark, but I don't see a '^' at the start of next read beginning at 631553, but it does end at 631582 with '$' mark. It does not seems to be consistent if there is single or multiple mapped reads to a location. If formatting of the example gets screwed up, I have attached a word doc. Any clues or thoughts will help. Thanks. perdeep - - chr10 590112 G G 84 0 51 19 ,,,,,,,,,,,,,,,,,,, 7:@.?<:79;;9(4614<7 chr10 590113 T T 84 0 51 19 ,,,,,,,,,,,,,,,,,,, 6941:@:<0.+0;71/)B2 chr10 590114 G G 84 0 51 19 ,$,$,$,$,,$,$,$,$,$,$,$,$,$,,$,$,,$ A3815501;6;5181;1<= chr10 590115 G G 18 0 54 3 ,$,$,$ )(( chr10 631553 a A 4 0 1 1 . 2 chr10 631554 a A 4 0 1 1 . 9 chr10 631555 a G 4 4 1 1 G 2 chr10 631556 a A 4 0 1 1 . 2 chr10 631557 a A 4 0 1 1 . = chr10 631558 a G 4 4 1 1 G 2 chr10 631559 a A 4 0 1 1 . 1 chr10 631560 a A 4 0 1 1 . @ chr10 631561 a A 4 0 1 1 . 2 chr10 631562 a A 4 0 1 1 . @ chr10 631563 a A 4 0 1 1 . 1 chr10 631564 a A 4 0 1 1 . 9 chr10 631565 a A 4 0 1 1 . @ chr10 631566 a A 4 0 1 1 . ; chr10 631567 a A 4 0 1 1 . ( chr10 631568 a A 4 0 1 1 . : chr10 631569 a A 4 0 1 1 . @ chr10 631570 T T 4 0 1 1 . 2 chr10 631571 T T 4 0 1 1 . > chr10 631572 A A 4 0 1 1 . + chr10 631573 G G 4 0 1 1 . ( chr10 631574 A A 4 0 1 1 . > chr10 631575 T T 4 0 1 1 . 5 chr10 631576 G T 4 4 1 1 T > chr10 631577 T T 4 0 1 1 . 0 chr10 631578 A A 4 0 1 1 . 5 chr10 631579 C C 4 0 1 1 . 0 chr10 631580 T T 4 0 1 1 . 0 chr10 631581 C C 4 0 1 1 . 8 chr10 631582 A A 4 0 1 1 .$ 8 chr10 631914 a A 12 0 12 1 ^-, : chr10 631915 a A 12 0 12 1 , : chr10 631916 a A 12 0 12 1 , @ chr10 631917 a A 12 0 12 1 , @ - - ________________________________ Email Disclaimer: www.stjude.org/emaildisclaimer |
From: Pratap, A. <AP...@so...> - 2010-02-25 21:48:49
|
One more I don’t see a flag which can tell total #reads that mapped irrespective of whether they were paired or not ? How to get this number from bam file ? Thanks! -Abhi From: Pratap, Abhishek [mailto:AP...@so...] Sent: Thursday, February 25, 2010 4:14 PM To: SAMtools Help List Subject: [Samtools-help] FW: [SAMtools-help] Inferring mate pair matching stats from SAM/BAm File Few more questions related to same thread. 1. How do we know the #uniquely mapped reads in a bam file ? Will the following do the trick after subtracting from the total mapped reads ? samtools view -f 100 s_7_combined_sorted.bam | wc –l 2. What does the following mean from the samtools flagstat output. >33501178 with itself and mate mapped : mapped with itself ?? Thanks! -Abhi From: Pratap, Abhishek Sent: Tuesday, January 26, 2010 2:44 PM To: 'Angie Hinrichs' Subject: RE: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Angie Looking at the filtered sam file more closely I see what you are saying is correct. In a normal samfile these pairs are together and I could visually spot them. Once filtered and sorted the output had these paired far apart which let me to question /2. Thanks! -Abhi From: Angie Hinrichs [mailto:an...@so...] Sent: Tuesday, January 26, 2010 12:37 PM To: Pratap, Abhishek Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Abhishek, When a mate pair is properly mapped, there is one line for the left read and another line for the right read. samtools view will give you the number of reads; the number of properly paired mates should be half that. Angie ----- "Abhishek Pratap" <AP...@so...> wrote: > From: "Abhishek Pratap" <AP...@so...> > To: "Angie Hinrichs" <an...@so...> > Sent: Tuesday, January 26, 2010 9:16:30 AM GMT -08:00 US/Canada Pacific > Subject: RE: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File > > > Morning Angie A quick question : # View only reads that are properly paired, and count the number of lines/reads: samtools view -f 3 my.bam | wc -l # Divide that number by two Why are we dividing by 2 here. I see each line as one mat pair .. right ? -A > From: Angie Hinrichs [mailto:an...@so...] > Sent: Monday, January 25, 2010 12:17 PM > To: Pratap, Abhishek > Cc: SAMtools Help List > Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Abhi, > > I think the samtools view -f (flag filter) feature can help with this. Suggested commands: > 1. How many mate pairs mapped back to the reference (both reads). # View only reads that are properly paired, and count the number of lines/reads: samtools view -f 3 my.bam | wc -l # Divide that number by two > 2. In many mate pairs only one read mapped back to reference. # View only reads that are paired but mate is unmapped, excluding reads # that are themselves unmapped, and count: samtools view -f 9 -F 4 my.bam | wc -l > 3. How many mate pairs (both reads) did not map back to reference. Assuming that your aligner and/or SAM-converter includes reads, and sets both the unmapped and mate-unmapped flag bits: # View only reads that are paired, unmapped and with unmapped mate, and count: samtools view -f 13 my.bam | wc -l # Divide that number by two I have not tested these commands, so please double-check carefully that I used the correct flag values. > > Hope that helps, > Angie > > ----- "Abhishek Pratap" <AP...@so...> wrote: > > From: "Abhishek Pratap" <AP...@so...> > > To: "SAMtools Help List" <sam...@li...> > > Sent: Monday, January 25, 2010 7:53:39 AM GMT -08:00 US/Canada Pacific > > Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File > > > > > Hi All I believe my last email went un-noticed over the weekend. I copy it again. It will be great if you help me. I get partially that these counts have to do with bitwise flag but not beyond it. Cheers, -Abhi > > From: Pratap, Abhishek [mailto:AP...@so...] > > Sent: Saturday, January 23, 2010 12:23 AM > > To: SAMtools Help List > > Subject: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi All Is there any existing method to find out from SAM/BAM file, a summary of the following.: 1. How many mate pairs mapped back to the reference (both reads). 2. In many mate pairs only one read mapped back to reference. 3. How many mate pairs (both reads) did not map back to reference. Thanks, -Abhi > > ------------------------------------------------------------------------------ Throughout its 18-year history, RSA Conference consistently attracts the world's best and brightest in the field, creating opportunities for Conference attendees to learn about information security's most important issues through interactions with peers, luminaries and emerging and established companies. http://p.sf.net/sfu/rsaconf-dev2dev > > _______________________________________________ Samtools-help mailing list Sam...@li... https://lists.sourceforge.net/lists/listinfo/samtools-help |
From: Pratap, A. <AP...@so...> - 2010-02-25 21:31:30
|
Hi Valeria I think that is how it should be. According to my understanding the BAI file is a index file and act as a look up table for bam file. The actual reads are still present in the bam file. May be others can explain it in more technical detail. -Abhi From: Vasta, Valeria [mailto:Val...@se...] Sent: Thursday, February 25, 2010 3:29 PM To: sam...@li... Subject: [Samtools-help] samtool index problem Hello I used samtols index and the output file BAI file is 50kb and only contains BAI£ The input file is sorted.bam and 1 Mb in size Thanks for the help Valeria CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information protected by law. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. |
From: Pratap, A. <AP...@so...> - 2010-02-25 21:14:28
|
Few more questions related to same thread. 1. How do we know the #uniquely mapped reads in a bam file ? Will the following do the trick after subtracting from the total mapped reads ? samtools view -f 100 s_7_combined_sorted.bam | wc –l 2. What does the following mean from the samtools flagstat output. >33501178 with itself and mate mapped : mapped with itself ?? Thanks! -Abhi From: Pratap, Abhishek Sent: Tuesday, January 26, 2010 2:44 PM To: 'Angie Hinrichs' Subject: RE: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Angie Looking at the filtered sam file more closely I see what you are saying is correct. In a normal samfile these pairs are together and I could visually spot them. Once filtered and sorted the output had these paired far apart which let me to question /2. Thanks! -Abhi From: Angie Hinrichs [mailto:an...@so...] Sent: Tuesday, January 26, 2010 12:37 PM To: Pratap, Abhishek Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Abhishek, When a mate pair is properly mapped, there is one line for the left read and another line for the right read. samtools view will give you the number of reads; the number of properly paired mates should be half that. Angie ----- "Abhishek Pratap" <AP...@so...> wrote: > From: "Abhishek Pratap" <AP...@so...> > To: "Angie Hinrichs" <an...@so...> > Sent: Tuesday, January 26, 2010 9:16:30 AM GMT -08:00 US/Canada Pacific > Subject: RE: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File > > > Morning Angie A quick question : # View only reads that are properly paired, and count the number of lines/reads: samtools view -f 3 my.bam | wc -l # Divide that number by two Why are we dividing by 2 here. I see each line as one mat pair .. right ? -A > From: Angie Hinrichs [mailto:an...@so...] > Sent: Monday, January 25, 2010 12:17 PM > To: Pratap, Abhishek > Cc: SAMtools Help List > Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi Abhi, > > I think the samtools view -f (flag filter) feature can help with this. Suggested commands: > 1. How many mate pairs mapped back to the reference (both reads). # View only reads that are properly paired, and count the number of lines/reads: samtools view -f 3 my.bam | wc -l # Divide that number by two > 2. In many mate pairs only one read mapped back to reference. # View only reads that are paired but mate is unmapped, excluding reads # that are themselves unmapped, and count: samtools view -f 9 -F 4 my.bam | wc -l > 3. How many mate pairs (both reads) did not map back to reference. Assuming that your aligner and/or SAM-converter includes reads, and sets both the unmapped and mate-unmapped flag bits: # View only reads that are paired, unmapped and with unmapped mate, and count: samtools view -f 13 my.bam | wc -l # Divide that number by two I have not tested these commands, so please double-check carefully that I used the correct flag values. > > Hope that helps, > Angie > > ----- "Abhishek Pratap" <AP...@so...> wrote: > > From: "Abhishek Pratap" <AP...@so...> > > To: "SAMtools Help List" <sam...@li...> > > Sent: Monday, January 25, 2010 7:53:39 AM GMT -08:00 US/Canada Pacific > > Subject: Re: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File > > > > > Hi All I believe my last email went un-noticed over the weekend. I copy it again. It will be great if you help me. I get partially that these counts have to do with bitwise flag but not beyond it. Cheers, -Abhi > > From: Pratap, Abhishek [mailto:AP...@so...] > > Sent: Saturday, January 23, 2010 12:23 AM > > To: SAMtools Help List > > Subject: [Samtools-help] Inferring mate pair matching stats from SAM/BAm File Hi All Is there any existing method to find out from SAM/BAM file, a summary of the following.: 1. How many mate pairs mapped back to the reference (both reads). 2. In many mate pairs only one read mapped back to reference. 3. How many mate pairs (both reads) did not map back to reference. Thanks, -Abhi > > ------------------------------------------------------------------------------ Throughout its 18-year history, RSA Conference consistently attracts the world's best and brightest in the field, creating opportunities for Conference attendees to learn about information security's most important issues through interactions with peers, luminaries and emerging and established companies. http://p.sf.net/sfu/rsaconf-dev2dev > > _______________________________________________ Samtools-help mailing list Sam...@li... https://lists.sourceforge.net/lists/listinfo/samtools-help |
From: Vasta, V. <Val...@se...> - 2010-02-25 21:14:04
|
Hello I used samtols index and the output file BAI file is 50kb and only contains BAI£ The input file is sorted.bam and 1 Mb in size Thanks for the help Valeria CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information protected by law. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. |
From: Alec W. <al...@br...> - 2010-02-25 20:59:07
|
Hi Jessica, You have plenty of RAM, so try increasing the MAX_RECORDS_IN_RAM parameter. We should provide better diagnostics so you can know what you need to set this value to. The other thing you can do, if you are running on a Unix box, is talk to your system administrator about increasing the number of file handles a process can have. -Alec Jessica Maia wrote: > Hi there, > > I'm encountering an error while using MergeSamFiles (picard-tools-1.14). This time my two different headers are in compatible order. One of the files to be merged is about 120G in size and I'm running the job with maximum heap size Xmx15g. > > My Log file: > [Wed Feb 24 23:15:29 EST 2010] net.sf.picard.sam.MergeSamFiles OUTPUT=out.bam ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=true TMP_DIR=tmp VALIDATION_STRINGENCY=SILENT SORT_ORDER=coordinate VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 > INFO 2010-02-24 23:15:30 MergeSamFiles Sorting input files using temp directory tmp > INFO 2010-02-24 23:17:15 MergeSamFiles 10000000 records read. > INFO 2010-02-24 23:19:03 MergeSamFiles 20000000 records read. > ... > INFO 2010-02-25 03:44:59 MergeSamFiles 1220000000 records read. > INFO 2010-02-25 03:47:06 MergeSamFiles 1230000000 records read. > INFO 2010-02-25 03:48:43 MergeSamFiles Finished reading inputs. > [Thu Feb 25 03:49:14 EST 2010] net.sf.picard.sam.MergeSamFiles done. > Runtime.totalMemory()=6646202368 > > Picard command: > java -jar -Xmx15g $picard_dir/MergeSamFiles.jar INPUT=$file1 INPUT=$file2 OUTPUT=$merged_file VALIDATION_STRINGENCY=SILENT ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=true TMP_DIR=$dir > > Error message: > Exception in thread "main" net.sf.samtools.util.RuntimeIOException: java.io.FileNotFoundException:sortingcollection.918424441057879706.tmp (Too many open files) > at net.sf.samtools.util.SortingCollection$FileRecordIterator.<init>(SortingCollection.java:386) > at net.sf.samtools.util.SortingCollection$MergingIterator.<init>(SortingCollection.java:325) > at net.sf.samtools.util.SortingCollection.iterator(SortingCollection.java:221) > at net.sf.samtools.util.SortingCollection.iterator(SortingCollection.java:42) > at net.sf.samtools.SAMFileWriterImpl.close(SAMFileWriterImpl.java:170) > at net.sf.picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:117) > at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:143) > at net.sf.picard.sam.MergeSamFiles.main(MergeSamFiles.java:68) > Caused by: java.io.FileNotFoundException: sortingcollection.918424441057879706.tmp (Too many open files) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.<init>(Unknown Source) > at net.sf.samtools.util.SortingCollection$FileRecordIterator.<init>(SortingCollection.java:380) > ... 7 more > > > Any help is appreciated. > Thanks, > > Jessica > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help > |
From: Joseph F. <jos...@gm...> - 2010-02-25 20:09:35
|
Here's a follow-up question: I'm using the varFilter script to help select SNPs / indels for subsequent validation, and of course most platforms cannot design primers for SNPs / indels that are too close to each other. So I've used some heavy window filtering to try to remove dense areas of (SNP/indel) ... with the following parameters: -W 60 -N 1 -l 60 ... assuming that the '-W' parameter (window size for filtering dense SNPs) and the '-N' parameter (max number of SNPs in a window) parameter work together to ensure that I get no SNPs when 2 or more are closer than 60bp apart. And, similarly, I'm assuming that the '-l' parameter (window size for filtering adjacent gaps) will filter indels in a similar way (though there's no analagous option for "max number of indels in a window"). With these parameters, I still get the following indel "passing filter" (being reported as output from the varFilter command): chr1 18222659 * -T/* 30 30 60 4 -T * 2 2 0 0 0 tview shows this region as follows: 18222611 18222621 18222631 18222641 18222651 18222661 18222671 18222681 18222691 AAAGGTGTTCCTGCCTTCAGAACCAGTGAATGTTAGTCTGTGAAGAGTCCCTTGTGCTTTGTGCTTCTGTTTCCCTGGGGAAATTGTTTTA .................S...................................RY.................................... AAAGGTGTTCCTGCCTTCAGAACCAGTGAATGTTAGTCTGTGAAGAGTCCCTTGTGCTTTGTGCTTCTGTTTCCCTGGGGAAATTGTTTTA AAAGGTGTTCCTGCCTTCAGAACCAGTGAATGTTAGTCTGTGAAGAGTCCCTTGTGCTTTGTGCTTCTGTTTCCCTGGGGAAATTGTTTTA AAAGGTGTTCCTGCCTTGAGAACCAGTGAATGTTAGTCTGTGAAGAGTCCC*TACGCTTTGTGCTTCTGTTTCCCTGGGGAAATTGTTTTA AAAGGTGTTCCTGCCTTGAGAACCAGTGAATGTTAGTCTGTGAAGAGTCCC*TACGCTTTGTGCTTCTGTTTCCCTGGGGAAATTGTTTTA So, it seems as if the SNPs at 18,222,626 and 18,222,662-3 have been filtered out, which makes sense because they're within 60bp of each other. However, the indel at 18,222,659 still comes through. Is this happening because varFilter is not cross-checking against SNPs when filtering indels? This would be strange, since it does seem to cross-check against indels when filtering SNPs (via the '-G' and '-w' options) ... or am I missing something? ~Joe On Tue, Feb 23, 2010 at 4:27 PM, Joseph Fass <jos...@gm...> wrote: > I'm not sure I'm understanding the samtools.py varFilter script, and it's > options. > > What is the meaning of the '-G' option ... "min indel score for nearby SNP > filtering"? Is the "indel score" the 6th column in the pileup format -- the > "SNP probability" (or indel probability, in this case)? And if so, does > this mean that SNPs within '-w' bases of an indel will be filtered out > unless they have a score / probability greater that the '-G' parameter? > > ~Joe > > > > > > -- > Joseph Fass > Bioinformatics Programmer > UC Davis Bioinformatics Core > joseph.fass -at- gmail.com (professional) > 970.227.5928 (c) || 530.752.2698 (w) > -- Joseph Fass Bioinformatics Programmer UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) |
From: Shobha P. <sh...@gm...> - 2010-02-25 17:51:27
|
Is there any way to convert a pair align file (pairwise alignment of reference sequence and read sequence) obtained from 454 to SAM format? Thanks, Shobha. |
From: Dan B. <dan...@gm...> - 2010-02-25 17:27:00
|
Hi, Firstly, were can I go to complain about a lack of a TopHat mailing list? (Sorry!) There is a generic Sequence Search Mailing List that could be used: http://www.bioinformatics.org/mailman/listinfo/ssml-general OK, in TopHat, does anyone know how I can find the number of reads that map to exon boundaries? Bowtie outputs the number of reads mapped, and I'd like to compare that result to the number of reads mapped in TopHat (the difference being additional reads that presumably span exons and can only be mapped by TopHat). I see several logs that seem to be picking up the set of initially unmapped reads, but I don't know which log is giving me the final or 'correct' figure for the number of additionally mapped reads. $ cat prep_reads.log 48841 out of 11444528 reads have been filtered out (leaving 11395687 reads) $ cat fileUCPhfM.log # reads processed: 11395687 # reads with at least one reported alignment: 7621277 (66.88%) # reads that failed to align: 3771791 (33.10%) # reads with alignments suppressed due to -m: 2619 (0.02%) Reported 12236386 alignments to 1 output stream(s) $ cat fileEUB4FA.log # reads processed: 3771791 # reads with at least one reported alignment: 3244164 (86.01%) # reads that failed to align: 495245 (13.13%) # reads with alignments suppressed due to -m: 32382 (0.86%) Reported 9043018 alignments to 1 output stream(s) ... $ cat fileTVal6y.log # reads processed: 3771791 # reads with at least one reported alignment: 1589994 (42.15%) # reads that failed to align: 2021895 (53.61%) # reads with alignments suppressed due to -m: 159902 (4.24%) Reported 8359523 alignments to 1 output stream(s) Actually... $ wc -l accepted_hits.sam 12879468 accepted_hits.sam i.e. 643,082 additional reads over those originally mapped... Does that seem correct? There is no figure like this in any of the logs. Sorry, I'm totally confused! Thanks for any tips, Dan. |
From: Jessica M. <jm...@du...> - 2010-02-25 14:39:19
|
Hi there, I'm encountering an error while using MergeSamFiles (picard-tools-1.14). This time my two different headers are in compatible order. One of the files to be merged is about 120G in size and I'm running the job with maximum heap size Xmx15g. My Log file: [Wed Feb 24 23:15:29 EST 2010] net.sf.picard.sam.MergeSamFiles OUTPUT=out.bam ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=true TMP_DIR=tmp VALIDATION_STRINGENCY=SILENT SORT_ORDER=coordinate VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 INFO 2010-02-24 23:15:30 MergeSamFiles Sorting input files using temp directory tmp INFO 2010-02-24 23:17:15 MergeSamFiles 10000000 records read. INFO 2010-02-24 23:19:03 MergeSamFiles 20000000 records read. ... INFO 2010-02-25 03:44:59 MergeSamFiles 1220000000 records read. INFO 2010-02-25 03:47:06 MergeSamFiles 1230000000 records read. INFO 2010-02-25 03:48:43 MergeSamFiles Finished reading inputs. [Thu Feb 25 03:49:14 EST 2010] net.sf.picard.sam.MergeSamFiles done. Runtime.totalMemory()=6646202368 Picard command: java -jar -Xmx15g $picard_dir/MergeSamFiles.jar INPUT=$file1 INPUT=$file2 OUTPUT=$merged_file VALIDATION_STRINGENCY=SILENT ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=true TMP_DIR=$dir Error message: Exception in thread "main" net.sf.samtools.util.RuntimeIOException: java.io.FileNotFoundException:sortingcollection.918424441057879706.tmp (Too many open files) at net.sf.samtools.util.SortingCollection$FileRecordIterator.<init>(SortingCollection.java:386) at net.sf.samtools.util.SortingCollection$MergingIterator.<init>(SortingCollection.java:325) at net.sf.samtools.util.SortingCollection.iterator(SortingCollection.java:221) at net.sf.samtools.util.SortingCollection.iterator(SortingCollection.java:42) at net.sf.samtools.SAMFileWriterImpl.close(SAMFileWriterImpl.java:170) at net.sf.picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:117) at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:143) at net.sf.picard.sam.MergeSamFiles.main(MergeSamFiles.java:68) Caused by: java.io.FileNotFoundException: sortingcollection.918424441057879706.tmp (Too many open files) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(Unknown Source) at net.sf.samtools.util.SortingCollection$FileRecordIterator.<init>(SortingCollection.java:380) ... 7 more Any help is appreciated. Thanks, Jessica |
From: Sendu B. <sb...@sa...> - 2010-02-25 10:54:29
|
On 25/02/2010 10:40, João Fadista wrote: > Hi, > > I was wondering if any experienced this problem. I mapped some > genomic Illumina reads using BWA and SOAP2 and then exported the > alignment files to sam format to be able to use samtools to call > indels and SNPs. Although both BWA and SOAP2 called more or less the > same number of SNPs, SOAP2 did not call any indels. > > Does anyone knows what might be the problem? In my experience, soap is simply unable to map (correctly) any reads that contain insertions > 4bp, and has a ~0.1 sensitivity for reads with deltions in them. For small indels bwa has sensitivity of ~0.9 for both insertions and deletions. Without an alignment, an indel caller isn't going to be able to call anything. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: João F. <Joa...@ag...> - 2010-02-25 10:43:30
|
Hi, I was wondering if any experienced this problem. I mapped some genomic Illumina reads using BWA and SOAP2 and then exported the alignment files to sam format to be able to use samtools to call indels and SNPs. Although both BWA and SOAP2 called more or less the same number of SNPs, SOAP2 did not call any indels. Does anyone knows what might be the problem? Best regards, Joao |
From: Saunders, C. <csa...@il...> - 2010-02-25 05:14:51
|
Hi Heng, Yes, the older 'quality-64' values can be negative. This is the reason that an offset of 64 (rather than 33) is used for conversion to ascii. Chris > -----Original Message----- > From: Heng Li [mailto:lh...@sa...] > Sent: Wednesday, February 24, 2010 20:52 > To: Saunders, Chris > Cc: Wes Barris; Sam...@li... help > Subject: Re: [Samtools-help] Updated export2sam.pl > > I think 'quality-64' of a read produced by a pipeline prior to 1.3 can > be a negative number. Is it true? If so, one may tell the version of > pipeline by scanning the whole read file. > > Heng > > On Wed, Feb 24, 2010 at 08:36:16PM -0800, Saunders, Chris wrote: > > Hi Wes, > > > > Your documentation states that your quality values are 'Phred+64' > (i.e. > > produced by Pipeline 1.3+). > > > > Best, > > > > Chris > > > > > > > > > -----Original Message----- > > > From: Wes Barris [mailto:wes...@cs...] > > > Sent: Wednesday, February 24, 2010 19:23 > > > To: Saunders, Chris > > > Cc: Sam...@li... help > > > Subject: Re: [Samtools-help] Updated export2sam.pl > > > > > > Saunders, Chris wrote: > > > > > > > (3) The quality score conversion has been updated to assume that > the > > > > export file contains Phred+64 values by default. A command line > > > option, > > > > "--qlogodds", has been added to the script to restore the > previous > > > > conversion from 'Solexa'+64 quality values as produced by the > > > Illumina > > > > pipeline prior to version 1.3. > > > > > > Hi Chris, > > > > > > Thanks for providing an update of your script. I had been working > > > with the old one with our data from Illumina and I have a question > > > about the quality score conversion. > > > > > > Our data from Illumina does not explicitly state the version of the > > > pipeline used to create it. How do I know whether it is pre or > post > > > version 1.3? The documentation we received with the data states > this: > > > > > > ------------------------- > > > Quality Score for base [1 through n]-Quality scores corresponding > to > > > each of the > > > delivered bases in the sequence. Scores are based on ASCII format, > and > > > can be > > > converted to numeric scores with the following formula: ASCII = > > > 64+10*log10(1/p), p > > > being the estimated probability of the base being in error. The > scores > > > are similar to > > > Phred scores, which are calculated ASCII = 33+10*log10(1/p). This > > > change was made > > > as of Pipeline 1.3, so quality scores calculated with earlier > versions > > > of the pipeline > > > should not be compared with quality scores generated by Pipeline > 1.3 > > > because they use > > > different scales. > > > ------------------------- > > > > > > Is there a way to distinguish between pre 1.3 and post 1.3 quality > > > scores > > > in the export files? > > > -- > > > Wes Barris <wes...@cs...> > > > > --------------------------------------------------------------------- > --------- > > Download Intel® Parallel Studio Eval > > Try the new software tools for yourself. Speed compiling, find bugs > > proactively, and fine-tune applications for parallel performance. > > See why Intel Parallel Studio got high marks during beta. > > http://p.sf.net/sfu/intel-sw-dev > > _______________________________________________ > > Samtools-help mailing list > > Sam...@li... > > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. |
From: Heng Li <lh...@sa...> - 2010-02-25 04:52:36
|
I think 'quality-64' of a read produced by a pipeline prior to 1.3 can be a negative number. Is it true? If so, one may tell the version of pipeline by scanning the whole read file. Heng On Wed, Feb 24, 2010 at 08:36:16PM -0800, Saunders, Chris wrote: > Hi Wes, > > Your documentation states that your quality values are 'Phred+64' (i.e. > produced by Pipeline 1.3+). > > Best, > > Chris > > > > > -----Original Message----- > > From: Wes Barris [mailto:wes...@cs...] > > Sent: Wednesday, February 24, 2010 19:23 > > To: Saunders, Chris > > Cc: Sam...@li... help > > Subject: Re: [Samtools-help] Updated export2sam.pl > > > > Saunders, Chris wrote: > > > > > (3) The quality score conversion has been updated to assume that the > > > export file contains Phred+64 values by default. A command line > > option, > > > "--qlogodds", has been added to the script to restore the previous > > > conversion from 'Solexa'+64 quality values as produced by the > > Illumina > > > pipeline prior to version 1.3. > > > > Hi Chris, > > > > Thanks for providing an update of your script. I had been working > > with the old one with our data from Illumina and I have a question > > about the quality score conversion. > > > > Our data from Illumina does not explicitly state the version of the > > pipeline used to create it. How do I know whether it is pre or post > > version 1.3? The documentation we received with the data states this: > > > > ------------------------- > > Quality Score for base [1 through n]-Quality scores corresponding to > > each of the > > delivered bases in the sequence. Scores are based on ASCII format, and > > can be > > converted to numeric scores with the following formula: ASCII = > > 64+10*log10(1/p), p > > being the estimated probability of the base being in error. The scores > > are similar to > > Phred scores, which are calculated ASCII = 33+10*log10(1/p). This > > change was made > > as of Pipeline 1.3, so quality scores calculated with earlier versions > > of the pipeline > > should not be compared with quality scores generated by Pipeline 1.3 > > because they use > > different scales. > > ------------------------- > > > > Is there a way to distinguish between pre 1.3 and post 1.3 quality > > scores > > in the export files? > > -- > > Wes Barris <wes...@cs...> > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Saunders, C. <csa...@il...> - 2010-02-25 04:36:24
|
Hi Wes, Your documentation states that your quality values are 'Phred+64' (i.e. produced by Pipeline 1.3+). Best, Chris > -----Original Message----- > From: Wes Barris [mailto:wes...@cs...] > Sent: Wednesday, February 24, 2010 19:23 > To: Saunders, Chris > Cc: Sam...@li... help > Subject: Re: [Samtools-help] Updated export2sam.pl > > Saunders, Chris wrote: > > > (3) The quality score conversion has been updated to assume that the > > export file contains Phred+64 values by default. A command line > option, > > "--qlogodds", has been added to the script to restore the previous > > conversion from 'Solexa'+64 quality values as produced by the > Illumina > > pipeline prior to version 1.3. > > Hi Chris, > > Thanks for providing an update of your script. I had been working > with the old one with our data from Illumina and I have a question > about the quality score conversion. > > Our data from Illumina does not explicitly state the version of the > pipeline used to create it. How do I know whether it is pre or post > version 1.3? The documentation we received with the data states this: > > ------------------------- > Quality Score for base [1 through n]-Quality scores corresponding to > each of the > delivered bases in the sequence. Scores are based on ASCII format, and > can be > converted to numeric scores with the following formula: ASCII = > 64+10*log10(1/p), p > being the estimated probability of the base being in error. The scores > are similar to > Phred scores, which are calculated ASCII = 33+10*log10(1/p). This > change was made > as of Pipeline 1.3, so quality scores calculated with earlier versions > of the pipeline > should not be compared with quality scores generated by Pipeline 1.3 > because they use > different scales. > ------------------------- > > Is there a way to distinguish between pre 1.3 and post 1.3 quality > scores > in the export files? > -- > Wes Barris <wes...@cs...> |
From: Wes B. <wes...@cs...> - 2010-02-25 03:23:31
|
Saunders, Chris wrote: > (3) The quality score conversion has been updated to assume that the > export file contains Phred+64 values by default. A command line option, > "--qlogodds", has been added to the script to restore the previous > conversion from 'Solexa'+64 quality values as produced by the Illumina > pipeline prior to version 1.3. Hi Chris, Thanks for providing an update of your script. I had been working with the old one with our data from Illumina and I have a question about the quality score conversion. Our data from Illumina does not explicitly state the version of the pipeline used to create it. How do I know whether it is pre or post version 1.3? The documentation we received with the data states this: ------------------------- Quality Score for base [1 through n]—Quality scores corresponding to each of the delivered bases in the sequence. Scores are based on ASCII format, and can be converted to numeric scores with the following formula: ASCII = 64+10*log10(1/p), p being the estimated probability of the base being in error. The scores are similar to Phred scores, which are calculated ASCII = 33+10*log10(1/p). This change was made as of Pipeline 1.3, so quality scores calculated with earlier versions of the pipeline should not be compared with quality scores generated by Pipeline 1.3 because they use different scales. ------------------------- Is there a way to distinguish between pre 1.3 and post 1.3 quality scores in the export files? -- Wes Barris <wes...@cs...> |
From: Alec W. <al...@br...> - 2010-02-25 03:16:55
|
Hi Kelly, No, I'm not suggesting that the SAM spec is or should change with respect to validation for the PI field in read group. Rather, an invalid value for PI does not make the rest of the file unparseable, so ValidateSamFile should report it as an error and then keep going. In other Picard programs, if validation stringency is not strict, an invalid PI value should just result in PI being unset on the read group. -Alec Westbrooks, Kelly wrote: > Hi Alec, > > >> It shouldn't throw an exception. It should just report the error. I will fix this, but I probably won't get to it until next week. >> > > Does this mean that the SAM spec is widening the data type for the PI read group attribute from a single integer to an arbitrary string? Actually, the spec is kind of vague on the question. Can you shed some light on exactly what the datatype of the PI should be, from the perspective of the SAM specification? > > Thanks - you've been very helpful to us! > > Kelly Westbrooks, PhD > Senior Bioinformatics Scientist > Life Technologies > Foster City, CA > > ________________________________________ > From: Pinal Kanabar [pka...@gm...] > Sent: Wednesday, February 24, 2010 5:44 PM > To: Alec Wysoker > Subject: Re: Picard : ValidateSamFile.jar > > Thanks a lot Alec. > > It would really help. Hopefully it would be early next week so I don't have to redo lot of things in terms of CPU time. > > Please will you send me an email when its done. > > Thanks > > Regards > -Pinal > > > > On Wed, Feb 24, 2010 at 5:27 PM, Alec Wysoker <al...@br...<mailto:al...@br...>> wrote: > Hi Pinal, > > It shouldn't throw an exception. It should just report the error. I will fix this, but I probably won't get to it until next week. > > -Alec > > > Pinal Kanabar wrote: > Hi Alec, > > For SOLiD -- when I convert from GFF --> SAM > > In PI tag I put range for e.g. : 1700-2100 > > When I use ValidateSamFile.jar : > > Exception in thread "main" net.sf.samtools.SAMFormatException: PI is not numeric: 1700-2100 > at net.sf.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:173) > > I understand that it requires a numeric value. But is there a way around where I specify range rather than the median size. > > Thanks > > -Pinal > > |
From: Heng Li <lh...@sa...> - 2010-02-25 02:16:39
|
Chris from Illumina has updated export2sam.pl to 2.0 and sent the message to the samtools-devel list. I have also committed his new version to SVN: http://samtools.svn.sourceforge.net/viewvc/samtools/trunk/samtools/misc/export2sam.pl?revision=537 It would be good to try the latest script as this is maintained by the vendor and will become the official convertor. Thank you, Heng ---- Being Forwarded ---- Hello, I'm writing to submit an Illumina update of the 'export2sam.pl' script to the SAMtools project developers. Please find the updated script attached to this e-mail. I will outline the major changes made from the version of this script distributed with samtools version 0.1.7a below. Note that the script itself contains in its header a thorough list of changes as well as a summary of certain export to SAM conversion limitations so please be sure to take a glance at this for additional information. We have given the updated script a version number of "2.0.0" to clearly delineate it from the previous version, but this is of course just a suggestion. The most important script changes are as follows: (1) The SAM CIGAR string is now updated to reflect gapped alignments in the export record. Note that we consider this change to be very important for the accuracy of any downstream analyses given that ELAND has been changed to produce gapped alignments by default in its current version. (2) The ELAND alignment score handling has changed: the ELAND single-read alignment score is always stored in the optional SAM "SM" field and, if present, the ELAND paired-read alignment score is stored in the optional SAM "AS" field. The SAM MAPQ field is set equal to min(254,max(SM,AS)). Note that this change considerably improves the representation of mapping scores in converted SAM files, because (a) ELAND frequently produces mapping scores higher than 254 and (b) the MAPQ value undergoes the transformation' MAPQ <- MAPQ mod 256' during conversion from SAM to BAM (using 'samtools view' as of SAMtools version 0.1.7a), causing reads with very good ELAND mapping scores to possibly be given low MAPQ values. (3) The quality score conversion has been updated to assume that the export file contains Phred+64 values by default. A command line option, "--qlogodds", has been added to the script to restore the previous conversion from 'Solexa'+64 quality values as produced by the Illumina pipeline prior to version 1.3. (4) The "export match descriptor" is now reverse-complemented when necessary such that it always corresponds to the forward strand of the reference, to be consistent with other information in the SAM record. It is now written to the optional 'XD' field (rather than 'MD') to acknowledge its minor differences from the "SAMtools match descriptor". Note that the differences in the two match descriptors are: (a) that indels are closed with the '$' character in the export match descriptor and (b) that insertions are explicitly included in the export match descriptor, whereas in the SAM record the match descriptor can be written assuming the presence of the CIGAR string to handle this information. For example, an export match descriptor for a 50 base sequence containing a small insertion is: "24^2$24". Changing the name of the tag used for the export match descriptor to 'XD' is intended to relieve users of some potentially troublesome complexity and elide any incompatibility with the match descriptor definition given in the SAM/BAM spec. A few other minor points are noted in the script itself. Also note that the usage has been changed such that read1 and read2 are no longer specified by positional arguments, and consistency checks have been added to the code to ensure that export file pairs contain corresponding read pairs in each line. The runtime usage is as follows: """ export2sam.pl converts GERALD export files to SAM format. Usage: export2sam.pl --read1=3DFILENAME [ options ] | --version | --help --read1=3DFILENAME read1 export file (mandatory) --read2=3DFILENAME read2 export file --nofilter include reads that failed the pipeline/RTA purity filter --qlogodds assume export file(s) use logodds quality values as reported by pipeline prior to v1.3 (default: phred quality values) """ These script modifications should noticeably improve the SAM representation of sequence alignments produced by the current version of ELAND and thus improve the accuracy of downstream biological inference for users following this workflow. I'm happy to answer any questions about the export2sam.pl changes and about any possible incorporation of these or similar changes back into SAMtools. Thank you all, Chris Christopher T. Saunders, Ph.D. Bioinformatics Scientist II Illumina, Inc. 9885 Towne Centre Drive San Diego, CA 92121-1975 csa...@il... <mailto:jl...@il...>=20 www.illumina.com On Wed, Feb 24, 2010 at 08:24:24PM -0500, Peter Chines wrote: > Here it is, Wes. I've incorporated your earlier changes, except the one > that modified the reference (chr) names. The end result could use a > little further cleaning up, e.g. normalizing indentation, but I decided > to avoid making unnecessary changes to make it easier to see the > differences against the original. > > Many thanks to Heng for the original conversion script. > > --Peter > > > On Tue, Feb 23, 2010 at 10:04:40PM -0500, Wes Barris wrote: > > Peter, can I get a copy of your modified export2sam.pl file so that > > I can at least convert export files into sam files with proper > > MD tags? > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Saunders, C. <csa...@il...> - 2010-02-25 02:13:14
|
<<export2sam.pl>> [forwarding this from the samtools-devel list...] Hello, I'm writing to submit an Illumina update of the 'export2sam.pl' script to the SAMtools project developers. Please find the updated script attached to this e-mail. I will outline the major changes made from the version of this script distributed with samtools version 0.1.7a below. Note that the script itself contains in its header a thorough list of changes as well as a summary of certain export to SAM conversion limitations so please be sure to take a glance at this for additional information. We have given the updated script a version number of "2.0.0" to clearly delineate it from the previous version, but this is of course just a suggestion. The most important script changes are as follows: (1) The SAM CIGAR string is now updated to reflect gapped alignments in the export record. Note that we consider this change to be very important for the accuracy of any downstream analyses given that ELAND has been changed to produce gapped alignments by default in its current version. (2) The ELAND alignment score handling has changed: the ELAND single-read alignment score is always stored in the optional SAM "SM" field and, if present, the ELAND paired-read alignment score is stored in the optional SAM "AS" field. The SAM MAPQ field is set equal to min(254,max(SM,AS)). Note that this change considerably improves the representation of mapping scores in converted SAM files, because (a) ELAND frequently produces mapping scores higher than 254 and (b) the MAPQ value undergoes the transformation' MAPQ <- MAPQ mod 256' during conversion from SAM to BAM (using 'samtools view' as of SAMtools version 0.1.7a), causing reads with very good ELAND mapping scores to possibly be given low MAPQ values. (3) The quality score conversion has been updated to assume that the export file contains Phred+64 values by default. A command line option, "--qlogodds", has been added to the script to restore the previous conversion from 'Solexa'+64 quality values as produced by the Illumina pipeline prior to version 1.3. (4) The "export match descriptor" is now reverse-complemented when necessary such that it always corresponds to the forward strand of the reference, to be consistent with other information in the SAM record. It is now written to the optional 'XD' field (rather than 'MD') to acknowledge its minor differences from the "SAMtools match descriptor". Note that the differences in the two match descriptors are: (a) that indels are closed with the '$' character in the export match descriptor and (b) that insertions are explicitly included in the export match descriptor, whereas in the SAM record the match descriptor can be written assuming the presence of the CIGAR string to handle this information. For example, an export match descriptor for a 50 base sequence containing a small insertion is: "24^2$24". Changing the name of the tag used for the export match descriptor to 'XD' is intended to relieve users of some potentially troublesome complexity and elide any incompatibility with the match descriptor definition given in the SAM/BAM spec. A few other minor points are noted in the script itself. Also note that the usage has been changed such that read1 and read2 are no longer specified by positional arguments, and consistency checks have been added to the code to ensure that export file pairs contain corresponding read pairs in each line. The runtime usage is as follows: """ export2sam.pl converts GERALD export files to SAM format. Usage: export2sam.pl --read1=FILENAME [ options ] | --version | --help --read1=FILENAME read1 export file (mandatory) --read2=FILENAME read2 export file --nofilter include reads that failed the pipeline/RTA purity filter --qlogodds assume export file(s) use logodds quality values as reported by pipeline prior to v1.3 (default: phred quality values) """ These script modifications should noticeably improve the SAM representation of sequence alignments produced by the current version of ELAND and thus improve the accuracy of downstream biological inference for users following this workflow. I'm happy to answer any questions about the export2sam.pl changes and about any possible incorporation of these or similar changes back into SAMtools. Thank you all, Chris Christopher T. Saunders, Ph.D. Bioinformatics Scientist II Illumina, Inc. 9885 Towne Centre Drive San Diego, CA 92121-1975 csa...@il... www.illumina.com |
From: Alec W. <al...@br...> - 2010-02-25 01:29:43
|
Hi Klaudia, Although the SAM spec does allow the info about which end is which to be omitted, we (Picard development team) feel that this is bad practice and we don't want to encourage it. Can you write a perl or python script to arbitrarily assign ends to be first or second of pair? -Alec Klaudia Walter wrote: > Hi all, > > I found the following flags paired up 17 with 33 and 19 with 35, which > do not contain the information whether they are the first or the > second mate, if I understand that correctly. > > 1st Example: > > SRR003669.14280418 17 1 1024770 78 51M = > 1025080 259 > TTTGGTCTGTTGTTCTAAGAATCGGAGAGAGAGAGGTTAAAATCTCCGACT > :;7::;9<98=:=0:=@A>=>26:9B=A<B?A5B:<?;A???:=994;6:C > RG:Z:SRR003669 MF:i:4 Aq:i:53 NM:i:3 UQ:i:72 H0:i:1 > H1:i:0 > > SRR003669.14280418 33 1 1025080 53 51M = > 1024770 -259 > TGGTCTATTGTTCTAAGAATCGGAGAGAGAGAGGTTAAAATCTCCAACTAT > C99==@=??8>@;A;?=9@><6=1=9>;=<<8>=9@40A9A8>@6:>9?48 > RG:Z:SRR003669 MF:i:4 Aq:i:53 NM:i:1 UQ:i:29 H0:i:0 > H1:i:1 > > > 2nd Example: > > SRR003667.10102516 35 1 1102933 99 51M = > 1103095 213 > GTCAGTACTTTAGAGGATCCCCTTCCCCAGCAGGAATCCTGGGTGCTGAGG > 3';-;+<35,;==;./<@8/;542864:901>*/5736))4*-)).*-/2) > RG:Z:SRR003667 MF:i:18 Aq:i:57 NM:i:0 UQ:i:0 H0:i:1 > H1:i:0 > > SRR003667.10102516 19 1 1103095 99 51M = > 1102933 -213 > GGGGAGGGGTCTCAGGGCTCCTGACTTCTTCCATTCTTGCCCAGCCCACCC > 40*595,+4><;<>69,?96=@0;<@=<><;<A>:><><<;>A;::<;96@ > RG:Z:SRR003667 MF:i:18 Aq:i:57 NM:i:0 UQ:i:0 H0:i:1 > H1:i:0 > > I am not sure in which circumstances these flags are set. As a > solution for the SamToFastq tool, could not the mate with the smaller > chromosomal position be allocated as the first mate and the other one > as the second mate? > > Thanks, > Klaudia > > > On 23 Feb 2010, at 09:57, John Marshall wrote: > >> On 22 Feb 2010, at 22:56, Alec Wysoker wrote: >>> It looks like there is something strange with your input SAM file. It >>> appears that it does actually contain paired reads, but there are two >>> reads with the same name that are either both marked as being the first >>> of the pair or both marked as being the second of the pair. >> >> I wonder whether Klaudia's input file contains non-primary >> alignments. I guess tools like SamToFastq need to allow for reads >> appearing in more than one SAM alignment record -- hopefully this >> could be as simple as ignoring non-primary records, though the hint >> about split hits in the flag field description suggests that might >> not be quite good enough. >> >> John > |
From: Peter C. <pc...@ma...> - 2010-02-25 01:24:33
|
Here it is, Wes. I've incorporated your earlier changes, except the one that modified the reference (chr) names. The end result could use a little further cleaning up, e.g. normalizing indentation, but I decided to avoid making unnecessary changes to make it easier to see the differences against the original. Many thanks to Heng for the original conversion script. --Peter On Tue, Feb 23, 2010 at 10:04:40PM -0500, Wes Barris wrote: > Peter, can I get a copy of your modified export2sam.pl file so that > I can at least convert export files into sam files with proper > MD tags? |
From: Bret H. <ja...@ga...> - 2010-02-24 23:17:31
|
I get the same results with the most recent version from svn, Version: 0.1.7-6 (r530). Are these records in violation of the spec, or is it just a matter of patching the code? Thanks, -bret On Wed, Feb 24, 2010 at 05:16:56PM -0500, Heng Li wrote: > This is probably because your alignment has CIGAR like "10M2D2I10M" or > "10I10M10D". Tview (probably pileup as well) does not work in this case. > > Heng > > On Wed, Feb 24, 2010 at 02:49:36PM -0700, Bret Harry wrote: > > Hello, > > > > I got a "samtools: bam_lpileup.c:116: tview_func: Assertion `l == tv->n_pre' failed." error using samtools 0.1.7 (r510). I see in previous threads where people have provided the offending read for debugging purposes. The BAM file in question is ~170 GB, does anybody know an efficient way for finding the record that is triggering the assertion? > > > > Thanks, > > -bret > > > > ------------------------------------------------------------------------------ > > Download Intel® Parallel Studio Eval > > Try the new software tools for yourself. Speed compiling, find bugs > > proactively, and fine-tune applications for parallel performance. > > See why Intel Parallel Studio got high marks during beta. > > http://p.sf.net/sfu/intel-sw-dev > > _______________________________________________ > > Samtools-help mailing list > > Sam...@li... > > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. |
From: Heng Li <lh...@sa...> - 2010-02-24 22:17:06
|
This is probably because your alignment has CIGAR like "10M2D2I10M" or "10I10M10D". Tview (probably pileup as well) does not work in this case. Heng On Wed, Feb 24, 2010 at 02:49:36PM -0700, Bret Harry wrote: > Hello, > > I got a "samtools: bam_lpileup.c:116: tview_func: Assertion `l == tv->n_pre' failed." error using samtools 0.1.7 (r510). I see in previous threads where people have provided the offending read for debugging purposes. The BAM file in question is ~170 GB, does anybody know an efficient way for finding the record that is triggering the assertion? > > Thanks, > -bret > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Bret H. <ja...@ga...> - 2010-02-24 21:49:45
|
Hello, I got a "samtools: bam_lpileup.c:116: tview_func: Assertion `l == tv->n_pre' failed." error using samtools 0.1.7 (r510). I see in previous threads where people have provided the offending read for debugging purposes. The BAM file in question is ~170 GB, does anybody know an efficient way for finding the record that is triggering the assertion? Thanks, -bret |
From: João F. <Joa...@ag...> - 2010-02-24 11:03:11
|
Hi Wes, Thanks. I was not aware that we could detect that reading column 9. Regards João Fadista -----Original Message----- From: Wes Barris [mailto:wes...@cs...] Sent: Wednesday, February 24, 2010 2:59 AM To: João Fadista Cc: sam...@li... Subject: Re: [Samtools-help] number of aligned reads João Fadista wrote: > Hi Wes, > > I would like to have not only a column consisting of the number of reads covering the position of a SNP, but also another column specifying how many of those reads actually support the SNP. I believe this last column would be of significant importance for further filtering the SNP data. Hi Joao, It is still not clear what you mean by "another column specifying how many of those reads actually support the SNP". Everything you need (except for nearby SNPs) is there. Consider this line: BTA14 24973610 T W 32 32 57 9 ..A,A..,a :@>><BC@B From column 9 I can see that 4 reads match the base of the reference on the forward strand. 2 reads match the base of the reference on the reverse strand. 2 reads have an alternate allele (A) at that position on the forward strand. 1 read has the same alternate allele (a) on the reverse strand. If this is not what you want please explain what you mean by "support". > Best regards, > João > ________________________________________ > From: Wes Barris [wes...@cs...] > Sent: Tuesday, February 23, 2010 12:49 AM > To: João Fadista > Cc: zen lu (RI); sam...@li... > Subject: Re: [Samtools-help] number of aligned reads > > João Fadista wrote: >> Thanks Zen. >> But the column 8 just gives me all the reads covering the position >> but not the ones that support the SNP. > > Hi Joao, > > Could you explain a little better what you mean? > >> Regards >> João Fadista >> >> >> >> --------------------------------------------------------------------- >> --- >> *From:* zen lu (RI) [mailto:ze...@ro...] >> *Sent:* Monday, February 22, 2010 1:00 PM >> *To:* João Fadista; sam...@li... >> *Subject:* RE: number of aligned reads >> >> Hi Joao, >> >> >> >> Use samtools flagstat command for your first question and column 8 of >> the pileup gives the number of reads covering the position. >> >> >> >> Zen >> >> >> >> --------------------------------------------------------------------- >> --- >> >> *From:* João Fadista [mailto:Joa...@ag...] >> *Sent:* 22 February 2010 11:46 >> *To:* sam...@li... >> *Subject:* [Samtools-help] number of aligned reads >> >> >> >> Hi, >> >> >> >> 1 - I would like to know if there is a way to find how many reads >> have been aligned if looking only at a .BAM file. >> >> >> >> 2 - How can I check the number of reads that support a SNP by looking >> at the output of a pileup file. >> >> >> >> >> >> Regards, >> >> João >> > > > -- > Wes Barris <wes...@cs...> -- Wes Barris <wes...@cs...> |