Hey there!
Sorry for bothering you again. Now that I have everything compiled and running in principle, I tried running Metassembler on 6 different assemblies that I made from simulated Illumina MiSeq reads.
As mate-pair data I have ~15 million simulated "MiSeq" mate pairs with a read length of 250 bp and a mean insert size of 5 kb (sd: 500); for everything else I stuck to the standard parameters given in the examples.
Mapping those reads back to my 6 assemblies with the metassemble pipeline and bowtie works just fine. Unfortunately, mateAn then crashes for one of the assemblies; for the others it seems to work fine. The full error and the command to run it are attached below.
I know it's hard to debug given so little information, but maybe someone has an idea? :-)
Cheers,
Bastian
Does the sample data run correctly to completion? If you exclude the one
assembly does it run to completion? If both of those work okay, it must be
something weird in the one assembly. Would you be willing to share the data
set?
Thank you,
Mike
On Thu, Mar 19, 2015 at 3:46 PM, Bastian Greshake gedankenstuecke@users.sf.net wrote:
The sample data set finishes just fine. For my data I'm currently running it without the one assembly to see whether it works then. And of course I'm willing to share the data set. :-)
Thanks for the great help!
Let us know how it goes without that one assembly. Otherwise, we will
probably have to get a copy of the data so that we can run it within the
debugger. If you feel up to it, you could also run mateAn in the debugger
(gdb) and get a backtrace where it crashes. That would give us a lot more
information to go on.
Good luck
Mike
On Fri, Mar 20, 2015 at 3:58 AM, Bastian Greshake gedankenstuecke@users.sf.net wrote:
So for the sample data it works just fine, but running it on my data still crashes, even after removing the one assembly. Apparently other assemblies in my project have a similar problem. As I'm away at a conference this week, it might take me a bit longer to figure out which assemblies are to blame. In any case I'll try to run the debugger on mateAn.
Thanks,
Bastian
So this is what happens when running gdb on mateAn: https://gist.github.com/gedankenstuecke/410843dace556e70a563
Cheers,
Bastian
Thanks Bastian. Let us know if you have any progress narrowing down which
assemblies work or do not work. If possible, it would be very helpful if
you could share the files that are crashing.
Wences, looks like it is blowing up in writeBDPsort() as it calls into
BDPfile::sort(). Can you look over that code? If you think it would help we
could recompile with the debugging symbols to get the specific line number
and variable that is crashing.
Thank you,
Mike
On Tue, Mar 24, 2015 at 12:49 PM, Bastian Greshake gedankenstuecke@users.sf.net wrote:
Sure, so my two assemblies that will crash mateAn can be found here:
https://www.dropbox.com/s/oage17dzgfax3iy/MIRA.fasta?dl=0
https://www.dropbox.com/s/hfi7p616ohigpvv/metavelvet.fa?dl=0
The mate pair reads are here:
https://opensnp.org/data/clad9_ast1_matepair1.fq
https://opensnp.org/data/clad9_ast1_matepair2.fq
And the general settings I used are here:
https://gist.github.com/gedankenstuecke/85fdab60b91455f80119
Hope that helps. If you need anything else let me know.
Thank you. Wences, can you try looking at these?
Thank you,
Mike
On Wed, Mar 25, 2015 at 6:43 PM, Bastian Greshake gedankenstuecke@users.sf.net wrote:
Thanks a lot for having a look into it. Let me know if I can provide any more information or you need more help. :-)
Cheers,
Bastian
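Hi
The names of the sequences are causing the pipeline to crash. If I change
the names of the sequences from things like:
NODE_1_length_34200_cov_22.678041
to things like:
NODE_1_length_34200_cov_22_678041
then I am able to run the entire pipeline without errors. However, I still
haven't debugged exactly why the names cause the pipeline to
crash. I may need to run a couple more samples.
Wences.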
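For anyone who wants to apply this renaming workaround to a whole assembly, here is a minimal sketch in Python. It assumes the trigger is the "." in FASTA sequence names such as NODE_1_length_34200_cov_22.678041, and it replaces dots only in the name (the first whitespace-delimited token of each header line). The function names and file paths are made up for illustration and are not part of Metassembler. Two caveats: check that the renaming does not produce duplicate names, and the renaming presumably needs to happen before the reads are mapped, so that the names in the alignments match the assembly.

```python
def sanitize_header(line):
    """Replace '.' with '_' in the sequence name of a FASTA header line.

    Non-header lines (sequence data) are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    body = line[1:]
    stripped = body.rstrip("\r\n")
    ending = body[len(stripped):]          # preserve the original line ending
    # Only the name (text before the first space) is rewritten;
    # any free-text description after it is left alone.
    name, sep, desc = stripped.partition(" ")
    return ">" + name.replace(".", "_") + sep + desc + ending


def sanitize_fasta(src_path, dst_path):
    """Write a copy of src_path to dst_path with sanitized header names."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        for line in fin:
            fout.write(sanitize_header(line))
```

Usage would be something like sanitize_fasta("metavelvet.fa", "metavelvet.renamed.fa"), with the output file then used in place of the original assembly.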
On Thu, Apr 2, 2015 at 12:10 PM, Bastian Greshake gedankenstuecke@users.sf.net wrote:
Nice catch! At least now we have a workaround. Let me know if you would
like to chat about it.
Thank you!
Mike
On Mon, Apr 6, 2015 at 8:05 PM, Alejandro Hernandez Wences ahwences@users.sf.net wrote:
Hi,
I finally fixed the bug you reported; a new version is now available on
SourceForge. Thanks a lot for pointing it out.
Wences
On Tue, Apr 7, 2015 at 12:20 AM, Michael Schatz mcschatz@users.sf.net
wrote:
Hi,
mateAn was also crashing if read identifiers used _1/_2 instead of /1 or /2.
Just for information, in case you have reads with these identifiers.
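If your reads use the _1/_2 convention, one possible way around this is to rewrite the identifiers before mapping. The sketch below is only an illustration: it assumes plain 4-line FASTQ records (no wrapped sequence lines) and that the mate suffix sits at the very end of the read name. The function names are hypothetical and not part of Metassembler.

```python
import re

# Matches a trailing "_1" or "_2" at the end of the read name only.
_MATE_SUFFIX = re.compile(r"_([12])$")


def fix_read_id(header):
    """Turn a trailing _1/_2 in a FASTQ read name into /1 or /2."""
    stripped = header.rstrip("\r\n")
    ending = header[len(stripped):]        # preserve the original line ending
    # Rewrite only the name, not any description after the first space.
    name, sep, rest = stripped.partition(" ")
    return _MATE_SUFFIX.sub(r"/\1", name) + sep + rest + ending


def fix_fastq(lines):
    """Yield FASTQ lines with fixed read IDs (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        # Only the first line of each 4-line record is a header; quality
        # lines may also start with '@', so position is used, not content.
        yield fix_read_id(line) if i % 4 == 0 else line
```

For a file, fix_fastq can be fed an open file handle and its output written line by line to a new FASTQ file.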
Hi!
I don't have a mate-pair/jumping library for our newly sequenced genome. Is there any way to run Metassembler without mate-pair information?
Sorry, that's a required data type so that it can evaluate which sequence is more reliable when there is a conflict.
Good luck
Mike