Re: [maq-help] illumina runs

Hello Girish,

On 10 Aug 2008, at 23:11, Girish B wrote:

> Hi !
>
> I am wondering why it is taking me so long to do an alignment using
> maq. I am trying to align Illumina data (7 lanes, each about 1 GB
> in size) against the human reference genome (1 huge FASTA file, about
> 2.5 GB). On maq's website it says that for 1-2 million reads and with
> 1 GB of RAM, it would take about 10 CPU hours to align against the
> human reference genome.

It is about 6 cpu hours with the latest version.

> I wrote a Python script to count the number of reads in one of the
> lanes and it counted 7 million reads. Moreover, lane 1 (1.1 GB)
> has been running for about 7 days now (quad-core machine, 32-bit, 6
> GB RAM).

One probable cause is polyA reads. Illumina/Solexa base calling may
produce many polyA sequences at the edge of a tile, and MAQ becomes
*extremely* slow when given a lot of them. I would suggest filtering
out polyA reads first. Unfortunately, you need to write your own script
for this. I will try to provide a tool in future versions.
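
A minimal sketch of such a filter, in Python since you already have a
script for counting reads. It assumes plain 4-line FASTQ records and
treats a read as polyA when at least 90% of its bases are A. The
threshold and the function names (is_poly_a, filter_fastq) are
illustrative assumptions, not part of MAQ, so tune them for your data.

```python
import sys


def is_poly_a(seq, min_a_fraction=0.9):
    """Return True if at least min_a_fraction of the bases are 'A'.

    The 0.9 default is an arbitrary, assumed cutoff.
    """
    if not seq:
        return False
    seq = seq.upper()
    return seq.count("A") >= min_a_fraction * len(seq)


def filter_fastq(lines, min_a_fraction=0.9):
    """Yield 4-line FASTQ records (as lists) whose sequence is not polyA.

    Assumes each record is exactly four lines: header, sequence,
    separator, qualities. A trailing incomplete record is dropped.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if not is_poly_a(record[1], min_a_fraction):
                yield record
            record = []


# Only runs when called with an input and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        for rec in filter_fastq(fin):
            fout.write("\n".join(rec) + "\n")
```

It would be run as something like
"python filter_polya.py lane1.fastq lane1.filtered.fastq" (hypothetical
file names), and the filtered file then goes through the usual
conversion and alignment steps.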

with kind regards,

Heng

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.