Originally created by: matt.wor... (code.google.com)@gmail.com
I've been using the A5 pipeline with much success until I attempted an assembly with 250 bp reads from the Illumina MiSeq. This caused a memory pointer error in IDBA, which then crashed. This error has been previously reported for IDBA (http://code.google.com/p/hku-idba/issues/detail?id=2).
By setting the '--long' option for IDBA and giving the '-r' option a blank file, I was able to run the assembly without the memory error. However, I'm not sure how to specify this within the pipeline when the input file contains longer reads.
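For anyone who wants to try the same workaround by hand, here is a minimal sketch of it. It only uses the '--read', '--long' and '--output' options listed in the IDBA 0.19 usage text quoted further down this thread. The function name, the placeholder file name and the choice to print the command rather than run it are my own assumptions, not part of A5 or IDBA.

```shell
#!/bin/sh
# Sketch of the workaround described above: give IDBA the 250 bp reads via
# --long and an empty placeholder file via --read.
# idba_long_workaround and the ".empty_reads.fa" suffix are hypothetical names.
idba_long_workaround() {
    long_reads=$1                      # FASTA file holding the long reads
    out_prefix=$2                      # prefix for the IDBA output files
    empty_file="${out_prefix}.empty_reads.fa"

    # Create (or truncate to) a zero-byte file to satisfy the --read option.
    : > "$empty_file"

    # Print the command instead of executing it, so it can be checked first.
    echo idba --read "$empty_file" --long "$long_reads" --output "$out_prefix"
}
```

Calling `idba_long_workaround reads.fa out` creates `out.empty_reads.fa` and prints the IDBA command line; paste or `eval` the printed line once it looks right for your installation.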
I'm getting great results with much less hands-on time than I was doing previously and would like to continue using the A5 pipeline. Is it possible to incorporate this in future versions of the pipeline?
Thanks for making a great piece of software!
Originally posted by: MAlab... (code.google.com)@gmail.com
Hi
I am getting the same problem with my 250 bp reads. I did what you suggested and the IDBA step worked fine. My question is: what is the next step in the pipeline, and how do you run it independently?
Thanks for your time
Magdy
Originally posted by: matt.wor... (code.google.com)@gmail.com
I think the easiest way to use long reads with the pipeline is to modify the Perl script that runs the pipeline so that IDBA accepts the long reads. However, the authors of A5 may have a better suggestion.
Originally posted by: prakhar.... (code.google.com)@gmail.com
IDBA usage:
IDBA: Iterative De Bruijn graph short read Assembler
Version 0.19
Usage: idba --read read-file [--output out] [options]
Allowed Options:
  -h, --help                 produce help message
  -r, --read arg             read file
  -l, --long arg             long read file
  -o, --output arg (=out)    prefix of output
  --scaffold                 use pair end information to merge contigs
  --mink arg (=25)           minimum k value
  --maxk arg (=50)           maximum k value
  --minCount arg (=2)        filtering threshold for each k-mer
  --cover arg (=0)           the cutting coverage for contigs
  --minPairs arg (=5)        minimum number of pair-end connections to join two contigs
  --prefixLength arg (=3)    length of the prefix of k-mer used to split k-mer table
------------------------------------------------------------------------------
I am having the same problem as described above, though I am using IDBA as part of the A5 pipeline.
The -r and -l (--long) options seem to be mutually exclusive, and supplying a blank file to -r would mean there is no input.
How, then, will IDBA assemble the reads?
Or am I missing something?
Input data: MiSeq 250 bp reads, both paired-end (PE) and mate-pair (MP) libraries
Regards,
--
prakhar gaur
Originally posted by: prakhar.... (code.google.com)@gmail.com
Hello,
I'm back with more details on the error.
It seems that the mate-pair library (MiSeq, 250 bp read length)
is the one causing the trouble.
When I attempt an A5 run with only the mate-pair reads as input, the IDBA step fails
with the above-mentioned error message.
regards,
--
prakhar gaur
Originally posted by: aaron.darling (code.google.com)
There is a new branch of the code that can work with 250 nt and 300 nt MiSeq reads. It's available from the Subversion repository here:
http://ngopt.googlecode.com/svn/branches/20130712_miseq_longread/
It can be checked out with Subversion and then built with the build_pipeline.sh script.
After some further refinement and testing we will eventually release this version as a separate download. In principle it should work with reads up to 400 nt long, but memory requirements for longer reads are very high. Expect to use at least 30 GB for a typical bacterial genome.
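If you are unsure whether your library falls into the 250 to 400 nt range discussed here, a quick check of the longest read can help. This is only a rough sketch: the function name is made up for illustration, and it assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines).

```shell
#!/bin/sh
# Print the length of the longest read in a FASTQ file.
# Assumes 4 lines per record, so the sequence is line 2, 6, 10, ...
# max_read_len is a hypothetical helper, not part of the A5 pipeline.
max_read_len() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }' "$1"
}
```

For example, `max_read_len AKS7_S1_L001_R1_001.fastq` prints a single number; an empty file gives 0.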
Originally posted by: prakhar.... (code.google.com)@gmail.com
Hello Aaron,
I used the above-mentioned code for a bacterial assembly
with two libraries:
PE - 250 bp read length
MP - 250 bp read length
I created a lib file:
$ cat AKS7.lib
[LIB]
p1=AKS7_S1_L001_R1_001.fastq
p2=AKS7_S1_L001_R2_001.fastq
[LIB]
p1=AKS7-MP_S2_L001_R1_001.fastq
p2=AKS7-MP_S2_L001_R2_001.fastq
The A5 pipeline was run with this command:
$ perl /home/prakhar/local_Bin/ngopt_a5pipeline_linux-x64_20131217/bin/a5_pipeline.pl AKS7.lib a5-20131217_AKS7_MP_PE
The assembly completed and the final scaffolds file was generated.
But on examining the A5 log, I found this:
"[a5] /home/prakhar/local_Bin/ngopt_a5pipeline_linux-x64_20131217/bin/idba_ud250 -r a5-20131217_AKS7_MP_PE.s2/a5-20131217_AKS7_MP_PE.ec.fasta -o a5-20131217_AKS7_MP_PE.s2/a5-20131217_AKS7_MP_PE --mink 35 --maxk 250 --min_pairs 2 --min_count 1
Segmentation fault"
Is this significant?
Please find the full log file attached herewith.
Regards,
--
prakhar gaur
Originally posted by: aaron.darling (code.google.com)
Hmmm, not sure why that is happening; perhaps idba_ud has a memory corruption bug on exit. In any case, it produced an assembly, so you should be able to use it.
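A quick sanity check along these lines might look like the sketch below: it reports whether the log mentions a segmentation fault and confirms that the scaffolds file exists and contains at least one FASTA record. The function name is hypothetical, and both file paths are passed in as arguments because the exact log and output file names depend on your run.

```shell
#!/bin/sh
# Check an A5 run: warn about a logged segfault, then verify the scaffolds.
# check_a5_run is a hypothetical helper; $1 = log file, $2 = scaffolds FASTA.
check_a5_run() {
    log_file=$1
    scaffolds=$2

    # A segfault in the log is only reported, not treated as fatal.
    if grep -q "Segmentation fault" "$log_file"; then
        echo "warning: segmentation fault reported in $log_file"
    fi

    # The run counts as usable if the scaffolds file has FASTA headers.
    if [ -s "$scaffolds" ] && grep -q '^>' "$scaffolds"; then
        echo "ok: $(grep -c '^>' "$scaffolds") sequences in $scaffolds"
        return 0
    fi

    echo "error: $scaffolds is missing or contains no sequences"
    return 1
}
```

Call it as `check_a5_run <log file> <scaffolds file>`; it returns 0 when the scaffolds look usable and 1 otherwise, so it can be used in a larger script.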