Re: [maq-help] can not get result for ill2sanger
From: Peng Yu <pen...@gm...> - 2010-05-22 20:03:39
Hello,

According to fq_all2std.pl in maq-0.7.1, scarf has the following format,
in which the quality scores are not written the same way as in my example.

One thing that is quite confusing to me is the number of format variants.
Is there a place where all the possible next-gen sequencing formats
(including all the variants) are listed along with the conversion tools?

Why was scarfenc2std once in fq_all2std.pl but is not there anymore?
Considering that scarfenc is still used today, will it be patched into
maq in the future?

scarf
=====
USI-EAS50_1:4:2:710:120:GTCAAAGTAATAATAGGAGATTTGAGCTATTT:23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 19 23 23 23 18 23 23 23
USI-EAS50_1:4:2:690:87:GTTTTTTTTTTTCTTTCCATTAATTTCCCTTT:23 23 23 23 23 23 23 23 23 23 23 23 12 23 23 23 23 23 16 23 23 9 18 23 23 23 12 23 18 23 23 23
USI-EAS50_1:4:2:709:32:GAGAAGTCAAACCTGTGTTAGAAATTTTATAC:23 23 23 23 23 23 23 23 20 23 23 23 23 23 23 23 23 23 23 23 23 12 23 18 23 23 23 23 23 23 23 23
USI-EAS50_1:4:2:886:890:GCTTATTTAAAAATTTACTTGGGGTTGTCTTT:23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
USI-EAS50_1:4:2:682:91:GGGTTTCTAGACTAAAGGGATTTAACAAGTTT:23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 20 23 23 23 23 23 23 23 23 23 23 23 18 23 23 23 23
USI-EAS50_1:4:2:663:928:GAATTTGTTTGAAGAGTGTCATGGTCAGATCT:23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23

On Sat, May 22, 2010 at 11:31 AM, Joel Martin <j_m...@lb...> wrote:
> Hello,
> That's scarf format, in some versions of fq_all2std.pl ( maybe the one in
> cvs? )
> is a function 'scarfenc2std' you can convert to fastq with that. I don't
> see it in the maq-0.7.1
> version though, so.
>
> #!/usr/bin/perl -w
> use strict;
> # Lookup table: index is the input character code (Solexa score + 64),
> # value is the Sanger character (Phred score + 33).
> my @conv_table;
> for (-64..64) {
>     $conv_table[$_+64] = chr(int(33 + 10*log(1+10**($_/10.0))/log(10)+.499));
> }
>
> while (<>) {
>     # Fields 0..4 form the read name, field 5 is the sequence,
>     # field 6 is the ASCII-encoded quality string.
>     my @t = split(':', $_, 7);
>     my $name = join(':', @t[0..4]);
>     print "\@$name\n$t[5]\n+\n";
>     my $qual = '';
>     chomp( $t[ 6 ] );
>     my $qual_length = length( $t[6] );
>     # Convert one quality character at a time.
>     for ( my $i = 0; $i < $qual_length; $i++ ) {
>         $qual .= $conv_table[ ord( substr( $t[ 6 ], $i, 1 ) ) ];
>     }
>     print "$qual\n";
> }
>
>
> On 5/22/2010 7:40 AM, Peng Yu wrote:
>>
>> On Sat, May 22, 2010 at 9:21 AM, Gareth Bloomfield
>> <ga...@mr...> wrote:
>>
>>> Look at this page:
>>>
>>> http://en.wikipedia.org/wiki/Fastq
>>>
>>> which shows the basic format (and also describes the various ways quality
>>> scores have been encoded).
>>>
>>> It shouldn't be too hard to rearrange your data into the format
>>> ill2sanger
>>> expects.
>>>
>>> Gareth
>>>
>>>> On May 22, 2010, at 4:04 PM, Peng Yu wrote:
>>>>
>>>>> I have the following sequence.txt file. But maq's ill2sanger doesn't
>>>>> give me any result (the output fastq file is empty). Could you let me
>>>>> know what is wrong?
>>>>
>>>> Looks like your input is not in fastq format...
>>>>
>>>> d
>>>>
>>>>> $ maq ill2sanger s_1_sequence.txt s_1_sequence.fastq
>>>>>
>>>>> $ cat s_1_sequence.txt
>>>>>
>>>>> HWI-EAS11X_10097:4:1:1047:11719#0/1:NGGGAAGTCGAAACGGCAGGGAATGGAGAAAAAAGGGCCCGCNAAGGCTGNNAAGGAGGAGATCGGGAGAGGGGC:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>>>>>
>>>>> HWI-EAS11X_10097:4:1:1047:9026#0/1:NCAGAGAAGACCAAGGAAGGCGTCCTCTATGTCGGAATCAAGNCCAGAGANNGGAATAGCGGTTCAGCATGAATG:BYYYR[\U[W__^]`Z_X_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>>>>>
>>>>> HWI-EAS11X_10097:4:1:1048:2910#0/1:NTAAATCTGTCGTAAACTCTCCACACCACAGTACTCAGAGATNGTAAGAGNNGTTCAGCAGGAATACCGAGACCG:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>>>>>
>>>>> HWI-EAS11X_10097:4:1:1048:12092#0/1:NGACTTCCCATGACATTTATACTCCTCCTGCGACCAGATCGGNAGAGCGGNNCAGCAGGAAGGCCGAGACCGAGC:BUYYWY[[YXaa^[aa^^YaX`__[P[XX\_^I^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>>>>>
>>>>>
>>>>> BTW, I downloaded maq-0.7.1.tar.bz2 and applied the
>>>>> maq-ill2sanger.patch using the command.
>>>>>
>>>>> cd maq-0.7.1; patch -p1< ../maq-ill2sanger.patch
>>>>>
>>
>> I got the sequence.txt from Illumina GA 1.5. Also I saw the example on
>> maq website.
>>
>> "The raw reads format used by Solexa (those `s_?_sequence.txt' from
>> the Solexa pipeline) are different from mapass' FASTQ format in that
>> the qualities are scaled differently. To use maq, you need to first
>> convert the format with:
>>
>> maq sol2sanger s_1_sequence.txt s_1_sequence.fastq
>>
>> where s_1_sequence.txt is the Solexa read sequence file. Missing this
>> step will lead to unreliable SNP calling."
>>
>> According to the above description, I thought that maq (ill2sanger)
>> could take the Illumina sequence.txt format directly as well.
>>
>> It seems that ill2sanger does not expect the sequence.txt format. What
>> format does it expect? Illumina fastq?
>>
>>
>
>

--
Regards,
Peng
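
For reference, here is a minimal sketch (not part of maq, and not a reimplementation of ill2sanger or scarfenc2std) of turning colon-delimited lines like those in s_1_sequence.txt into Sanger-style FASTQ. It assumes that the last two colon-separated fields are the sequence and an ASCII quality string with offset 64, and that the output should use offset 33. The function names and the solexa_scaled switch are made up for illustration. With solexa_scaled=True it applies the same Solexa-to-Phred formula as the conversion table in the quoted Perl script; whether a given file needs that rescaling depends on the pipeline version that produced it, so check that against the Wikipedia page linked in the thread. Python is used here instead of Perl.

```python
import math


def solexa_to_phred(q_sol):
    """Map a Solexa-scaled score to a Phred score.

    Same formula as the conversion table in the quoted Perl script:
    10 * log10(1 + 10 ** (q / 10)), rounded with the +0.499 / int() trick.
    """
    return int(10 * math.log10(1 + 10 ** (q_sol / 10.0)) + 0.499)


def scarf_to_fastq(line, in_offset=64, out_offset=33, solexa_scaled=False):
    """Convert one SCARF line with ASCII-encoded qualities to a FASTQ record.

    Assumption: the last two colon-separated fields are sequence and quality;
    everything before them is the read name (which itself contains colons).
    """
    name, seq, qual = line.rstrip("\r\n").rsplit(":", 2)
    if len(seq) != len(qual):
        raise ValueError("sequence/quality length mismatch in read %s" % name)
    out = []
    for ch in qual:
        q = ord(ch) - in_offset          # numeric score under the assumed offset
        if solexa_scaled:
            q = solexa_to_phred(q)       # rescale Solexa score to Phred
        if q < 0:
            raise ValueError("quality %r is below the assumed offset" % ch)
        out.append(chr(q + out_offset))  # re-encode with the output offset
    return "@%s\n%s\n+\n%s\n" % (name, seq, "".join(out))
```

For example, scarf_to_fastq("HWI-EAS11X_10097:4:1:1047:9026#0/1:NCAG:BYYY") returns a four-line record whose quality line is "#:::". Splitting from the right rather than on a fixed field count is a design choice, so that read names with a different number of colons still parse.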