
#6 Long reads cause IDBA to crash

New
nobody
None
Medium
Defect
2013-12-19
2013-02-12
Anonymous
No

Originally created by: matt.wor... (code.google.com)@gmail.com

I've been using the A5 pipeline with much success until I attempted an assembly with 250 bp reads from the Illumina MiSeq. This caused a memory pointer error in IDBA, which then crashed. This error has been previously reported for IDBA (http://code.google.com/p/hku-idba/issues/detail?id=2).

By setting the '--long' option for IDBA and giving the '-r' option a blank file, I was able to run the assembly without the memory error. However, I'm not sure how to specify this within the pipeline when the input file contains longer reads.
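
The workaround above could be scripted roughly as follows. This is only a sketch: the binary name `idba` and the `--long`, `-r` and `-o` options are taken from the IDBA usage text quoted later in this thread, while the function name and file names are placeholders invented for illustration. It prints the command rather than running it, so it can be inspected first.

```shell
# Sketch of the reported workaround: pass the reads via --long and give -r
# an empty file. idba_long_cmd and all file names are hypothetical.
idba_long_cmd() {
    long_reads=$1                      # FASTA file holding the 250 bp reads
    out_prefix=$2                      # prefix for IDBA output files
    empty_reads="${out_prefix}.empty.fa"
    : > "$empty_reads"                 # create (or truncate to) a zero-length file for -r
    printf 'idba --long %s -r %s -o %s\n' "$long_reads" "$empty_reads" "$out_prefix"
}

# Example (assumes file names without spaces):
# idba_long_cmd reads.fa asm          # print the command
# idba_long_cmd reads.fa asm | sh     # run it once it looks right
```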

I'm getting great results with much less hands-on time than I was doing previously and would like to continue using the A5 pipeline. Is it possible to incorporate this in future versions of the pipeline?

Thanks for making a great piece of software!

Discussion

  • Anonymous

    Anonymous - 2013-08-30

    Originally posted by: MAlab... (code.google.com)@gmail.com

    Hi
    I am getting the same problem with my 250 bp reads. I did what you suggested and the IDBA step worked fine. My question is: what is the next step in the pipeline, and how do you run it independently?

    Thanks for your time

    Magdy

     
  • Anonymous

    Anonymous - 2013-09-04

    Originally posted by: matt.wor... (code.google.com)@gmail.com

    I think the easiest way to use long reads with the pipeline is to modify the Perl script that runs the pipeline so that IDBA accepts the long reads. However, the authors of A5 may have a better suggestion.

     
  • Anonymous

    Anonymous - 2013-10-22

    Originally posted by: prakhar.... (code.google.com)@gmail.com

    IDBA usage:
    IDBA: Iterative De Bruijn graph short read Assembler
    Version 0.19

    Usage: idba --read read-file [--output out] [options]

    Allowed Options:
      -h, --help                   produce help message
      -r, --read arg               read file
      -l, --long arg               long read file
      -o, --output arg (=out)      prefix of output
          --scaffold               use pair end information to merge contigs
          --mink arg (=25)         minimum k value
          --maxk arg (=50)         maximum k value
          --minCount arg (=2)      filtering threshold for each k-mer
          --cover arg (=0)         the cutting coverage for contigs
          --minPairs arg (=5)      minimum number of pair-end connections to join two contigs
          --prefixLength arg (=3)  length of the prefix of k-mer used to split k-mer table
    ------------------------------------------------------------------------------
    I am having the same problem as described above, though I am using idba as part of the A5 pipeline.

    The -r and -l (--long) options seem to be mutually exclusive, and supplying a blank file to -r means no input.
    How then will idba assemble the reads?

    Or am I missing something?

    Input data: MiSeq 250 bp reads, both PE and MP libs

    Regards,
    --
    prakhar gaur

     
  • Anonymous

    Anonymous - 2013-10-25

    Originally posted by: prakhar.... (code.google.com)@gmail.com

    Hello,

    I'm back with more details on the error.
    It seems that the mate-pair library (MiSeq, 250 bp read length)
    is the one causing the trouble.

    When I attempt an A5 run with only the mate-pair reads as input, the IDBA step fails
    with the above-mentioned error message.

    regards,
    --
    prakhar gaur

     
  • Anonymous

    Anonymous - 2013-12-16

    Originally posted by: aaron.darling (code.google.com)

    There is a new branch of the code which can work with 250nt and 300nt MiSeq reads. It's available from the Subversion repository here:
    http://ngopt.googlecode.com/svn/branches/20130712_miseq_longread/

    It can be checked out with subversion and then built with the build_pipeline.sh script.
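
    A minimal sketch of those two steps, assuming `svn` is installed, the repository is still reachable, and build_pipeline.sh sits at the top level of the checkout (the thread does not say where it lives). The function name and default directory are made up for illustration.

    ```shell
    BRANCH_URL="http://ngopt.googlecode.com/svn/branches/20130712_miseq_longread/"

    # fetch_and_build is a hypothetical helper: check out the branch, then build it.
    fetch_and_build() {
        dest=${1:-ngopt_miseq_longread}    # target directory (placeholder name)
        svn checkout "$BRANCH_URL" "$dest" || return 1
        # Run the build in a subshell so the caller's working directory is unchanged.
        ( cd "$dest" && sh build_pipeline.sh )
    }

    # Usage: fetch_and_build            # or: fetch_and_build my_checkout_dir
    ```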

    After some further refinement and testing we will eventually release this version as a separate download. In principle it should work with reads up to 400nt long, but memory requirements for longer reads are very high. Expect to use at least 30GB for a typical bacterial genome.
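
    To see whether a data set falls inside that range, the longest read in a FASTQ file can be found with a short awk filter. This sketch assumes plain four-line FASTQ records (no wrapped sequence lines, not gzip-compressed); the function name is invented for illustration.

    ```shell
    # max_read_length FILE: print the length of the longest read in a FASTQ file.
    # Assumes four lines per record, so the sequence is line 2 of every 4.
    max_read_length() {
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
             END { print max + 0 }' "$1"
    }

    # Usage: max_read_length AKS7_S1_L001_R1_001.fastq
    ```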

     
  • Anonymous

    Anonymous - 2013-12-17

    Originally posted by: prakhar.... (code.google.com)@gmail.com

    Hello Aaron,

    I used the above-mentioned code for a bacterial assembly
    with two libraries:
    PE - 250 bp read length
    MP - 250 bp read length

    I created a lib file:
    $ cat AKS7.lib
    [LIB]
    p1=AKS7_S1_L001_R1_001.fastq
    p2=AKS7_S1_L001_R2_001.fastq
    [LIB]
    p1=AKS7-MP_S2_L001_R1_001.fastq
    p2=AKS7-MP_S2_L001_R2_001.fastq

    The A5 pipeline was run with this command:
    $ perl /home/prakhar/local_Bin/ngopt_a5pipeline_linux-x64_20131217/bin/a5_pipeline.pl AKS7.lib a5-20131217_AKS7_MP_PE

    The assembly completed and the final scaffolds file was generated.

    But on examining the A5 log, I found this:

    "[a5] /home/prakhar/local_Bin/ngopt_a5pipeline_linux-x64_20131217/bin/idba_ud250 -r a5-20131217_AKS7_MP_PE.s2/a5-20131217_AKS7_MP_PE.ec.fasta  -o a5-20131217_AKS7_MP_PE.s2/a5-20131217_AKS7_MP_PE --mink 35 --maxk 250 --min_pairs 2 --min_count 1
    Segmentation fault"

    Is this significant?

    Please find the full log file attached herewith.

    Regards,
    --
    prakhar gaur

     
  • Anonymous

    Anonymous - 2013-12-19

    Originally posted by: aaron.darling (code.google.com)

    Hmmm, not sure why that is happening; perhaps idba_ud has a memory corruption bug on exit. In any case, it produced an assembly, so you should be able to use it.
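
    One way to confirm that the crash happened only after the output was written is to check that the resulting FASTA file is non-empty and contains records. A rough sketch with a made-up function name; the file name in the usage line is a placeholder, since the thread does not give the output file names.

    ```shell
    # fasta_record_count FILE: print the number of FASTA records (header lines
    # starting with ">"). Returns non-zero if the file is missing or empty.
    fasta_record_count() {
        [ -s "$1" ] || { echo 0; return 1; }
        grep -c '^>' "$1"
    }

    # Usage (placeholder file name):
    # fasta_record_count final.scaffolds.fasta
    ```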

     
