Re: [maq-help] illumina runs
From: Heng Li <lh...@sa...> - 2008-08-14 11:54:16
Hello Girish,

On 10 Aug 2008, at 23:11, Girish B wrote:

> Hi !
>
> I am wondering why it is taking me so long to do an alignment using
> maq. I am trying to align illumina data (7 lanes, each about 1 gig
> in size) against ref human genome (1 huge FASTA file about 2.5
> GB). On maq's website it says for 1-2 million reads and with 1
> gig of ram, it would take about 10 cpu hours to align against human
> ref genome.

It is about 6 CPU hours with the latest version.

> I wrote a python script to count the number of reads in one of the
> lanes and it counted 7 million reads. Moreover, lane 1 (1.1 GB),
> has been going for about 7 days now (quad core machine, 32 bits, 6
> GB RAM).

One probable cause is polyA. Illumina/Solexa base calling may produce many polyA sequences at the edges of a tile, and MAQ becomes *extremely* slow when given many polyA reads. I would suggest filtering out polyA reads first. Unfortunately, you will need to write your own script for this. I will try to provide a tool in future versions.

With kind regards,

Heng

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
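Since no polyA filter ships with MAQ at this point, here is a minimal sketch of the kind of script suggested above. It assumes the reads are already in plain four-line FASTQ, and it treats a read as "polyA" when at least 90% of its bases are A. That 90% cutoff is an assumption to tune, not a MAQ setting. The names `filter_polya.py`, `read_fastq` and `is_polya` are hypothetical and not part of MAQ. The script also reports how many reads it saw, which doubles as the read count mentioned in the question.

```python
#!/usr/bin/env python3
"""Hypothetical helper (not part of MAQ): drop mostly-A reads from a FASTQ file."""
import sys
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ, skipping blank lines."""
    it = (line.rstrip("\n") for line in lines if line.strip())
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, plus, qual


def is_polya(seq: str, min_fraction: float = 0.9) -> bool:
    """True if at least min_fraction of the bases are 'A' (assumed cutoff)."""
    if not seq:
        return False
    return seq.upper().count("A") / len(seq) >= min_fraction


def main() -> None:
    total = kept = 0
    for rec in read_fastq(sys.stdin):
        total += 1
        if is_polya(rec[1]):
            continue  # drop likely tile-edge polyA artefacts
        kept += 1
        sys.stdout.write("\n".join(rec) + "\n")
    print(f"{total} reads, {kept} kept, {total - kept} polyA dropped",
          file=sys.stderr)


if __name__ == "__main__":
    main()
```

Usage: `python filter_polya.py < lane1.fastq > lane1.nopolya.fastq`. The script streams the file one record at a time, so memory use stays flat even for multi-gigabyte lanes.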