From: Walenz, B. <bw...@jc...> - 2012-11-26 15:14:22
Hi-

There are two different methods to load sequences into the assembler. The older method rewrites the sequence/quality data into the frg output, as you saw with fastaToCA. The newer method leaves the sequence/quality data in the original fastq file and gives the assembler a wrapper. The wrapper contains pointers to the original fastq, along with the format of the QVs and the orientation of the mate pairs:

  fastqQualityValues=sanger
  fastqOrientation=innie
  fastqMates=/tmp2/bcs03/melonFastqCorrected/paired/F7HI6DR01-corrected.fastq

This tells the assembler that you have interleaved mated reads, in the Sanger (offset=33) encoding, that are in 5'3' -- 3'5' orientation.

'gatekeeper -dumpinfo *gkpStore' will give a summary of the number of reads loaded for each library.

Getting the QV format wrong, I think, will generate a ton of warnings in gkpStore.err or gkpStore.errorLog. I'm not sure if the reads are discarded or 'fixed'.

In the CVS version of the assembler, 'fastqAnalyze some.fastq' will make a decent guess at what QV encoding you have.

b

On 11/22/12 6:29 AM, "Jens Hooge" <jen...@go...> wrote:

> Hi,
>
> I have converted a number of 454 reads in FASTQ format and Sanger reads in
> FASTA format. For the Sanger reads I have generated my own quality value file
> (also in FASTA format).
>
> I called the conversion routines as follows:
>
> Sanger Library:
> ./fastaToCA -l BES_random_shear_library_reverse -s <pathtofasta>/BES_random_shear_library_reverse -q <pathtoqual>/BES_random_shear_library_reverse.qlt > <outpath>/BES_random_shear_library_reverse.frg
>
> 454 Library:
> ./fastqToCA -insertsize 2834 172 -libraryname F7HI6DR01-corrected -technology 454 -mates <pathtofastq>/F7HI6DR01-corrected.fastq > <outpath>/F7HI6DR01-corrected.frg
>
> What strikes me here is that the conversion of my Sanger library results in
> an FRG file where the fields seq: and qlt: are filled, while the conversion of
> my 454 library does not. This is especially confusing to me because when I
> dump an FRG file "after" running the assembly using the command
>
> gatekeeper -dumpfrg -allreads assembly.gkpStore > asm.frgs
>
> the frg file is filled with the presumably correct sequences and quality
> values, even though I only used converted FASTQ files for the assembly. Is
> this expected behaviour?
>
> Thanks in advance for any help on that matter.
>
>
> fastaToCA Conversion Result: please find BES_random_shear_library_reverse.frg
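
On the QV encoding point in the reply (Sanger, offset=33, versus the older offset-64 encodings): if fastqAnalyze is not available, a rough check can be sketched in a few lines of Python. This is not how fastqAnalyze works. It is a minimal illustration only, and the function name guess_fastq_offset and the two thresholds are assumptions made for this sketch. It also assumes strict four-line FASTQ records with no wrapped sequence lines.

```python
import sys


def guess_fastq_offset(lines):
    """Guess the quality offset (33 or 64) from an iterable of FASTQ lines.

    Hypothetical helper for illustration. Returns None when the observed
    range of quality characters fits both encodings or no data was seen.
    """
    lo, hi = None, None
    for i, line in enumerate(lines):
        if i % 4 != 3:
            continue  # the quality string is the 4th line of each record
        codes = [ord(c) for c in line.rstrip("\r\n")]
        if not codes:
            continue
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    if lo is None:
        return None
    if lo < 59:
        # Characters below ';' (ASCII 59) cannot occur with offset 64,
        # even allowing for Solexa scores as low as -5.
        return 33
    if hi > 74:
        # Above 'J' (Q41 at offset 33) is unlikely for Sanger-encoded reads.
        return 64
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(guess_fastq_offset(handle))
```

Run it as 'python guess_offset.py some.fastq' (the script name is arbitrary). A result of 33 corresponds to fastqQualityValues=sanger in the wrapper. A result of None means the quality range is ambiguous, and the encoding should be confirmed some other way.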