From: Wes B. <wes...@cs...> - 2010-06-17 05:55:42
On 17/06/2010 3:50 PM, Florent Angly wrote:
> Hi Wes,
>
> The quality score of the reads is not specified in ACE files. It only
> contains the quality score of the contigs.

Yes, that is what I understand.

> It is likely that at some
> point in your conversion process, the quality scores are missing and it
> is assumed that all quality values are 20, or some other arbitrary value.

Yes, it sounds like AMOS is making an assumption about the quality scores
it is placing in the .sam file.

> There are a bunch of tools in BioPerl to run different assemblers and
> convert their outputs. See
> http://www.bioperl.org/wiki/Module:Bio::Assembly::IO and
> http://github.com/bioperl/bioperl-live/tree/master/Bio/Assembly/IO/. But
> your problem is that there are no quality scores for reads in ACE
> files... You could use BioPerl to code something that will take your
> QUAL file in combination with the ACE file to produce the desired output.

Yes, the problem is the size of the files I am dealing with.

> But I am wondering what you are trying to achieve by using the 454 de
> novo assembler and converting its results to SAM.

I am converting 454 de novo output to SAM for two reasons:

1) I want to be able to visualize the alignments using both samtools tview
   and gbrowse (which has a nice interface to SAM).
2) I want to call SNPs using samtools pileup and varFilter and then
   post-process the results with my own filtering.

> Florent
>
> On 17/06/10 15:17, Wes Barris wrote:
>> I have a .ace file output from gsAssembler. I have converted this to a
>> .sam file using these steps:
>>
>> toAmos -ace $data/454Contigs.ace -o 454Contigs.afg
>> bank-transact -m 454Contigs.afg -b 454Contigs.bnk -c
>> bank2contig -s 454Contigs.bnk > junk1.sam
>>
>> The .sam file contains a fastq style quality string for each read but I
>> don't know where it is getting it from because the .ace file only contains
>> quality scores for the contigs. Also, I can tell the quality string in
>> the .sam file is not correct because almost every character for almost
>> every read is a '+' or '?'.
>>
>> I have .sff files and the .ace file. Is there a way to get the quality
>> scores for the individual reads into the .sam file? I tried writing a
>> perl script to do this but because of the size of the data involved and
>> memory constraints, it is difficult to create a hash of all read quality
>> strings and loop through the .sam lines.
>>
>

-- 
Wes Barris
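
One possible way around the memory problem described above is to keep the per-read quality strings in an on-disk index instead of an in-memory hash, and to stream the .sam file one line at a time. The sketch below is only an illustration under several assumptions that are not confirmed in this thread: the read qualities have already been exported from the .sff files to a FASTA-style QUAL file, read names in the QUAL file match the SAM QNAME column exactly, and qualities are written as Phred+33. The function names (parse_qual, build_index, patch_sam) are hypothetical. Reads whose stored quality length differs from the SAM sequence length (for example, clipped reads) are left unchanged rather than guessed at.

```python
import sqlite3


def parse_qual(lines):
    """Yield (read_name, scores) from FASTA-style QUAL lines, one read at a time."""
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            # Assumption: the read name is the first token of the header line.
            name, scores = line[1:].split()[0], []
        else:
            scores.extend(int(tok) for tok in line.split())
    if name is not None:
        yield name, scores


def build_index(qual_lines, db_path):
    """Store Phred+33 quality strings in SQLite so they need not fit in RAM."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qual (name TEXT PRIMARY KEY, q TEXT)"
    )
    with conn:  # commit once, after all rows are inserted
        conn.executemany(
            "INSERT OR REPLACE INTO qual VALUES (?, ?)",
            (
                (name, "".join(chr(min(s, 93) + 33) for s in scores))
                for name, scores in parse_qual(qual_lines)
            ),
        )
    return conn


def patch_sam(sam_lines, conn):
    """Yield SAM lines with the QUAL column (field 11) replaced when possible."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through untouched
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            yield line
            continue
        row = conn.execute(
            "SELECT q FROM qual WHERE name = ?", (fields[0],)
        ).fetchone()
        if row is not None:
            q = row[0]
            if int(fields[1]) & 0x10:  # read is stored reverse-complemented
                q = q[::-1]
            if len(q) == len(fields[9]):  # only patch when lengths agree
                fields[10] = q
        yield "\t".join(fields) + "\n"
```

With real data, open file handles can be passed directly, for example build_index(open("reads.qual"), "qual.db") followed by writing the lines from patch_sam(open("junk1.sam"), conn) to a new file; "reads.qual" and "qual.db" are placeholder names. The index is built once and can be reused across runs.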