Thread: [maq-help] Beginner's doubts
Status: Beta
Brought to you by: lh3lh3
From: lavi B. <go...@ya...> - 2009-05-20 14:58:51
Hi,

I have just started exploring MAQ; we have been using eland in the past. After converting the reference sequences with fasta2bfa and the reads with fastq2bfq, as explained in the manual, I ran the following basic commands:

1. maq match -n 2 -u uFILE -H hFILE -N out.map ref.bfa reads.bfq
2. maq mapview -N out.map > out.aln.txt (to convert out.map into text format)

Please let me know how to interpret the output files. My questions are:

1. Matching 2 million single-end 35 bp reads with 2 mismatches allowed took almost a day. How can I parallelize the job to save time?
2. The number of lines in uFILE plus the number in out.aln.txt adds up to the total number of reads (2 million). According to the manual, uFILE contains the unmapped reads and hFILE contains multiple hits and reads with more than n mismatches. So how do uFILE and out.aln.txt together account for all of the reads, and how should I interpret the hFILE output? Is my understanding correct?
3. The hFILE output appears to be in binary format and I am not able to view it. How do I view the results, and what do they mean?
4. I used -N with both the match and the mapview commands. Will this give output similar to eland, showing the mismatch positions in the read with respect to the reference? If so, which column in out.aln.txt holds that value, and how is it interpreted? An example would make it easier to understand. For instance, if eland reports 25AT7, the first 25 bp match the reference exactly, positions 26 and 27 are mismatches, and the last 7 bp match the reference exactly. Can MAQ give output like eland's?

Thanks and regards
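For reference, the eland notation in question 4 can be expanded mechanically. The Python sketch below turns a descriptor such as 25AT7 into 1-based mismatch positions. parse_eland_descriptor is a hypothetical name, and the sketch assumes that digits are runs of matching bases and that each letter stands for one mismatched base (assumed here to be the reference base, as in eland's extended output). Indel notation is not handled.

```python
import re

_TOKEN = re.compile(r"(\d+)|([ACGTN])")

def parse_eland_descriptor(desc):
    """Expand an eland-style descriptor such as "25AT7".

    Returns (mismatches, length): mismatches is a list of
    (1-based read position, letter) pairs, length the number of bases covered.
    Assumption: digits are runs of matching bases, each letter is one
    mismatched base. Indel notation is not supported.
    """
    if not re.fullmatch(r"(?:\d+|[ACGTN])*", desc):
        raise ValueError(f"unsupported descriptor: {desc!r}")
    mismatches = []
    pos = 0  # read bases consumed so far
    for run, letter in _TOKEN.findall(desc):
        if run:
            pos += int(run)              # run of exact matches
        else:
            pos += 1                     # single mismatched base
            mismatches.append((pos, letter))
    return mismatches, pos

print(parse_eland_descriptor("25AT7"))  # ([(26, 'A'), (27, 'T')], 34)
print(parse_eland_descriptor("32AT1"))  # ([(33, 'A'), (34, 'T')], 35)
```

Summing the runs also gives a quick sanity check on the read length: 32AT1 covers 35 bases, while 25AT7 covers 34.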
From: Joseph F. <jos...@gm...> - 2009-05-20 17:04:48
Hi lavi,

On Wed, May 20, 2009 at 7:58 AM, lavi Birdie <go...@ya...> wrote:
> 1. Since it took almost a day to complete the matching command for 2 million reads (Single read) which are 35 bps length and with 2 mismatches allowed, how to introduce parallelization technique to gain time?

Try e.g. 'split -l 4000000 reads.fastq' to split your reads into 1M-read chunks, or smaller, then run the match (aka map) command on each chunk, then use 'maq mapmerge' to merge the two or more maps into one map.

> 2. Total number of results from uFILE and out.aln.txt tally to total number of reads (i.e 2 million reads). According to manual, uFILE consists of unmapped reads and hFILE consists of multiple hits and more than n mismatched reads. But then how uFILE and out.aln.txt constitute the total number of reads. How to interpret hFILE data output? Is my understanding correct?

uFILE will have one line per unmapped read. out.aln.txt will (as I understand it) have one line per aligned read: this is where maq placed the read on the reference. hFILE records all of the *possible* alignments of reads to the reference (with 0 or 1 mismatches in the first 24, or 28?, bases), but in the end maq places each read in only one of these possible positions: the best one (or a random one of several equally good ones). So out.aln.txt will have exactly as many lines as there are aligned reads, while hFILE may have more than one alignment per aligned read. Its format isn't one line per alignment, either: there's a list of new names for the reads, then alignment sections for each reference sequence, etc.

> 3. hFILE output file is in binary format and I am not able to view the results? Actually how do I view the results and what does it really mean?

hFILE is just gzipped text (which is unfortunately not mentioned on the maq online man page).

> 4. And my next query is about using -N while running match command as well as mapview command? Will it give a similar output as eland as showing the mismatch position in the read with respect to reference? If so which column in out.aln.txt is corresponding to that value and how to interpret. ...

Sorry - I don't have experience with the -N option.

--
Joseph Fass
Bioinformatics Programmer
UC Davis Bioinformatics Core
joseph.fass -at- gmail.com (professional)
970.227.5928 (c) || 530.752.2698 (w)
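Following Joseph's note that hFILE is gzipped text, here is a minimal Python sketch for looking at its first lines. head_gzip_text is a hypothetical name, and the sketch only decompresses; it does not try to interpret the record layout he describes (read-name list, then per-reference alignment sections).

```python
import gzip
from itertools import islice

def head_gzip_text(path, n_lines=20):
    """Return the first n_lines lines of a gzip-compressed text file."""
    # Mode "rt" decompresses and decodes to str on the fly.
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in islice(fh, n_lines)]

# Hypothetical usage on the hits file from the original command:
# for line in head_gzip_text("hFILE", 50):
#     print(line)
```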
From: Joel M. <ano...@co...> - 2009-05-20 17:45:50
Joseph Fass wrote:
> try e.g. 'split -l 4000000 reads.fastq' to split your reads into 1M read chunks, or smaller, then run match (aka map) command on each chunk, ...

Alternatively, use maq fastq2bfq -n 1000000 (you're already using it to make the bfq files), then run as many maq map commands as you have processors available; with only 2 million reads, of course, that just means running 2 of them. The merge command isn't parallelizable. Just follow the steps in maq.pl easyrun.

[snip]

> 4. And my next query is about using -N while running match command as well as mapview command? Will it give a similar output as eland as showing the mismatch position in the read with respect to reference? ...

I don't use -N either, but you might look at the output of maq pileup -P ref.bfa all.map to see where the mismatch occurred on a read.

joel
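Both approaches rely on a FASTQ record normally occupying four lines (name, sequence, '+' separator, qualities), which is why split -l 4000000 yields chunks of 1M reads. The Python sketch below shows that bookkeeping. chunk_fastq and n_chunks are hypothetical names, and the sketch assumes unwrapped four-line records; each chunk would still go through fastq2bfq and map separately before the maps are merged.

```python
from itertools import islice

LINES_PER_RECORD = 4  # assumption: unwrapped four-line FASTQ records

def chunk_fastq(lines, reads_per_chunk):
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk records."""
    it = iter(lines)
    size = reads_per_chunk * LINES_PER_RECORD
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        if len(chunk) % LINES_PER_RECORD:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk

def n_chunks(n_reads, reads_per_chunk):
    """Number of chunks (and so of independent map jobs) for a read count."""
    return -(-n_reads // reads_per_chunk)  # ceiling division

print(1_000_000 * LINES_PER_RECORD)    # lines per 1M-read chunk
print(n_chunks(2_000_000, 1_000_000))  # map jobs for 2M reads in 1M chunks
```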
From: Dan B. <dan...@gm...> - 2009-05-21 07:46:48
2009/5/20 lavi Birdie <go...@ya...>:
> Hi,
>
> I have just started exploring MAQ and we have been using eland in the past.
> I have some queries as below:
...
> 3. hFILE output file is in binary format and I am not able to view the
> results? Actually how do I view the results and what does it really mean?

Not sure if it helps with what you are doing, but I used:

./scripts/maq.pl easyrun

And then:

./maqview test.map

At least that is one way to view the results.

HTH,
Dan.
From: lavi B. <go...@ya...> - 2009-05-21 08:01:19
Hi, thanks for your reply. I did something similar, but instead of easyrun I followed the steps up to mapview to view the results in text format. But how do I find the mismatch positions in the read with respect to the reference sequence? As I mentioned earlier, eland outputs a column such as 32AT1.

I am also trying to compare the eland and maq mapping quality. How should I proceed with that?
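One way to get eland-like output is to compare the read with the stretch of reference it aligned to and write the result in eland's notation. The Python sketch below does that under stated assumptions: both strings are already in the same orientation and of equal length (an ungapped alignment), the comparison is case-insensitive, and each mismatch is written as the reference base, as assumed for eland's extended format. eland_descriptor is a hypothetical name, and extracting the reference stretch (from the FASTA reference at the reported position, reverse-complementing if needed) is not shown.

```python
def eland_descriptor(read, ref):
    """Build an eland-style descriptor (e.g. "32AT1") from an ungapped
    alignment of read against an equal-length reference stretch.

    Assumptions: same orientation, no indels, and each mismatch is
    written as the reference base.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference stretch must have equal length")
    parts = []
    run = 0  # length of the current run of matching bases
    for r, g in zip(read.upper(), ref.upper()):
        if r == g:
            run += 1
        else:
            if run:
                parts.append(str(run))
            parts.append(g)  # mismatch: record the reference base
            run = 0
    if run:
        parts.append(str(run))
    return "".join(parts)

print(eland_descriptor("ACGTTGGTAC", "ACGTACGTAC"))  # 4AC4
```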
From: Dan B. <dan...@gm...> - 2009-05-21 08:22:45
2009/5/21 lavi Birdie <go...@ya...>:
> Hi, thanks for your reply. I also did something similar but didn't use easyrun
> but followed the steps up to mapview to view the results in txt format.
> But how to find the mismatch positions in the read with respect to the reference
> sequence? As I had mentioned earlier, eland outputs a column as 32AT1.
>
> Am also trying to compare the eland and maq mapping quality. How to proceed
> with that?

http://seqanswers.com/forums/showthread.php?t=145
http://seqanswers.com/forums/showthread.php?t=1642&highlight=comparison

May help.
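Whatever comparison you settle on, it may help that maq's mapping quality is Phred-scaled: Q = -10 log10(P), where P is the estimated probability that the read has been placed at the wrong position, so a mapping quality of 70 corresponds to P = 1e-7. The Python sketch below converts between the two; the function names are hypothetical, and how eland's own scores relate to this scale is not covered here.

```python
import math

def mapq_to_error_prob(q):
    """Phred-scaled mapping quality -> probability the alignment is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_mapq(p):
    """Probability of a wrong alignment -> Phred-scaled mapping quality."""
    return -10 * math.log10(p)

for q in (0, 10, 30, 70):
    print(q, mapq_to_error_prob(q))
```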
From: lavi B. <go...@ya...> - 2009-05-21 09:24:29
Hi,

Below is a line of the mapview output in text format:

XXX:1:13:1719:33#0/1 chr10 3032404 - 0 0 70 70 70 0 0 1 0 35 aCacAGTCCcTtCTGctGCctgcaCTGcCtGcAGA :>68=?=A@6@1@AB;;@@5559;@AB;=.@5@B@ 0

In the above output, the quality string comes right after the read sequence (the final 0 is an extra column). How do I interpret it? And why does the read sequence contain both uppercase and lowercase letters? Could someone list what each column means?

Thanks.
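To label the fields of this line, the Python sketch below splits a mapview line into named columns. The 16 names are an assumption based on the maq manual's description of mapview output (read name, chromosome, position, strand, insert size, paired flag, mapping quality, single-end mapping quality, alternative mapping quality, number of mismatches of the best hit, sum of qualities of the mismatched bases, numbers of 0- and 1-mismatch hits of the first 24 bp, read length, sequence, qualities) and should be checked against the manual for your maq version. Anything after those 16 fields, such as the trailing 0 in lavi's line (produced with -N), is returned uninterpreted under "extra". The sample is used with the stray space in its quality string removed, giving 35 quality characters for the 35 bp read.

```python
# Assumed column order of `maq mapview` output (verify against the maq manual).
MAPVIEW_COLUMNS = [
    "read_name", "chromosome", "position", "strand",
    "insert_size", "paired_flag",
    "mapping_quality", "se_mapping_quality", "alt_mapping_quality",
    "n_mismatches", "sum_mismatch_qualities",
    "n_0mm_hits_24bp", "n_1mm_hits_24bp",
    "read_length", "sequence", "qualities",
]

def parse_mapview_line(line):
    """Split one mapview line into a dict keyed by the assumed column names.

    Fields beyond the 16 named ones are kept, uninterpreted, under "extra".
    """
    fields = line.split()  # splits on tabs and runs of spaces alike
    if len(fields) < len(MAPVIEW_COLUMNS):
        raise ValueError(f"expected at least {len(MAPVIEW_COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(MAPVIEW_COLUMNS, fields))
    record["extra"] = fields[len(MAPVIEW_COLUMNS):]
    return record

line = ("XXX:1:13:1719:33#0/1 chr10 3032404 - 0 0 70 70 70 0 0 1 0 35 "
        "aCacAGTCCcTtCTGctGCctgcaCTGcCtGcAGA "
        ":>68=?=A@6@1@AB;;@@5559;@AB;=.@5@B@ 0")
rec = parse_mapview_line(line)
print(rec["chromosome"], rec["position"], rec["strand"], rec["mapping_quality"])
print(rec["n_mismatches"], rec["extra"])
```

Under these assumed names, this read has 0 mismatches, so there would be no mismatch positions to report for it.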
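On the quality string and the mixed-case bases: assuming the common Phred+33 (Sanger) encoding, where each character's ASCII code minus 33 is the base quality, the Python sketch below decodes the quality string from lavi's line (stray space removed) and compares it with the case of each base. In this line every lowercase base has quality 26 or lower and every uppercase base 28 or higher, which suggests that lowercase marks lower-quality bases; that is an observation from a single read, not a documented rule, and the exact cut-off is not pinned down by it.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Phred+33 quality encoding

def decode_qualities(qual_string, offset=PHRED_OFFSET):
    """Convert an ASCII quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]

seq = "aCacAGTCCcTtCTGctGCctgcaCTGcCtGcAGA"
qual = ":>68=?=A@6@1@AB;;@@5559;@AB;=.@5@B@"  # stray space from the email removed

scores = decode_qualities(qual)
lower = [q for base, q in zip(seq, scores) if base.islower()]
upper = [q for base, q in zip(seq, scores) if base.isupper()]

print(scores[:5])              # [25, 29, 21, 23, 28]
print(max(lower), min(upper))  # 26 28
```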