From: Jens H. <jen...@go...> - 2012-11-20 10:35:26
Hi,

I'm relatively new to NGS and its tools, but at the moment I'm trying to run an assembly of about 70 FASTQ files of single- and paired-end 454 reads, using the wgs-7.0-assembler. The version I've been using is the one from http://sourceforge.net/apps/mediawik...itle=Main_Page. I converted my FASTQ files to FRG files with CABOG's fastqToCA routine, using a different library name for each FASTQ file. When I run the actual assembly with runCA, though, I get the following in melonAssembly.gkpStore.err:

  GKP finished with 11339450 alerts or errors:
  11338139 # ILL Error: not a sequence start line.
  1292 # ILL Error: not a quality start line.
  19 # LIB Alert: suspicious mean and standard deviation; reset stddev to 0.10 * mean.

To me this looked like a problem with the format of my FASTQ files, so I ran a script to check the files for format consistency, and it reported no errors.

Some of my reads are longer than 2047 bp, and I suspect that the bug fix listed under "Bug fixes" at http://sourceforge.net/apps/mediawik..._Release_Notes is not included in the version I'm using. Quote: "Gatekeeper: Numerous problems with reads longer than the maximum allowed (2047bp) and reads of very specific lengths were discovered and fixed. All of these resulted in gatekeeper crashing."

Even though gatekeeper doesn't crash, I would expect about 25 million reads to be processed by CABOG; however, while the assembly runs I get the message "numFrags = 14499910" on stdout. To me this looks as though not all reads are being used in the assembly. If I add the number of ILL errors, the total comes suspiciously close to my expected number of reads, which makes me think that CABOG simply discards reads longer than the maximum allowed length of 2047 bp.

My questions are: what happens to reads that are longer than the maximum allowed length? Are they ignored, or clipped to the maximum read length?
Is there a way to adjust the maximum read length so that CABOG uses those reads in the assembly as well? And does every FASTQ file have to be added to a separate gatekeeper library, or is it enough to put the single-end and paired-end reads into their respective libraries?

I would be very grateful if anyone could help me out.

Ciao,
Jens
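One way to test the over-length hypothesis directly is to count, for each FASTQ file, how many reads exceed 2047 bp, and compare that total with the gap between the expected ~25 million reads and numFrags = 14499910 (about 10.5 million). The Python script below is a minimal sketch for that check and is not part of CABOG. It assumes plain four-line FASTQ records (no sequences wrapped over several lines) and takes the 2047 bp limit from the release notes quoted above; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Count reads per FASTQ file, and how many exceed the 2047 bp gatekeeper limit.

Sketch only: assumes strict four-line FASTQ records (header, sequence,
'+' line, qualities) with no line-wrapped sequences.
"""
import sys

MAX_LEN = 2047  # limit quoted in the wgs-assembler release notes


def count_reads(lines):
    """Return (total_reads, reads_longer_than_MAX_LEN) for an iterable of FASTQ lines."""
    it = iter(lines)
    total = too_long = 0
    while True:
        header = next(it, "")
        if not header:
            break  # end of input
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        # Reading records strictly by position means a quality string that
        # happens to start with '@' is never mistaken for a header line.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near read {total + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in read {total + 1}")
        total += 1
        if len(seq) > MAX_LEN:
            too_long += 1
    return total, too_long


def main(paths):
    grand_total = grand_long = 0
    for path in paths:
        with open(path) as fh:
            total, too_long = count_reads(fh)
        # One tab-separated line per file: name, reads, reads over MAX_LEN.
        print(f"{path}\t{total}\t{too_long}")
        grand_total += total
        grand_long += too_long
    print(f"TOTAL\t{grand_total}\t{grand_long}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, `python3 count_long_reads.py *.fastq` (the script name is arbitrary). If the TOTAL count of over-length reads is nowhere near the missing ~10.5 million, the gap presumably needs another explanation, such as the format errors gatekeeper reports above.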