#140 Trying to use a larger kmer

meryl
closed-wont-fix
Crash (103)
5
2012-01-09
2010-10-29
Anonymous
No

I am trying to run CA with a larger kmer than 32. I recompiled the kmer package from source with KMER_WORDS increased to 2 in the meryl.H file, which should allow me to use kmers up to 64. I then moved the meryl executable from this recompile over to the WGS bin folder, replacing the original meryl executable that came with WGS when it was installed. However, meryl fails during runCA and I get this error:

seqFactory::registerFile()-- Cannot determine type of file '/work/abtucker/Casey_CA/biaurelia_CA9/biaurelia_CA.gkpStore:chain'. Tried:
seqFactory::registerFile()-- 'FastAstream'
seqFactory::registerFile()-- 'FastA'
seqFactory::registerFile()-- 'seqStore'

I read another bug thread (https://sourceforge.net/tracker/index.php?func=detail&aid=3072019&group_id=106905&atid=645639) that said this:
The original error in meryl -- likely caused by the assembler finding
the kmer-only version of the meryl program. This program is compiled both
in the kmer package, and in the assembler. When it is compiled in the
assembler it is extended to read the gkpStore directly. The four error
lines say "you told me to read a gkpStore:chain, but I only know how to
read FastAstream, FastA and seqStore". That's exactly the kmer-only
version of meryl.
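
To illustrate the explanation quoted above, here is a toy model (not the actual seqFactory code) of a program that tries each reader it was built with and reports the ones it tried. The three reader names come from the error message. The matching rules, the `gkpStore` reader name and the `register_file` helper are assumptions made up for this sketch.

```python
# Toy model of "try every registered reader". Reader names are taken from the
# error message; the matching rules below are invented for illustration only.
KMER_ONLY_READERS = {
    "FastAstream": lambda path: path == "-",
    "FastA": lambda path: path.endswith((".fasta", ".fa")),
    "seqStore": lambda path: path.endswith(".seqStore"),
}

# Hypothetical: the assembler build adds one more reader for gkpStore inputs.
ASSEMBLER_READERS = dict(KMER_ONLY_READERS)
ASSEMBLER_READERS["gkpStore"] = lambda path: ".gkpStore" in path


def register_file(path, readers):
    """Return (matching reader name or None, list of reader names tried)."""
    tried = []
    for name, accepts in readers.items():
        tried.append(name)
        if accepts(path):
            return name, tried
    return None, tried


kmer_only_result = register_file("biaurelia_CA.gkpStore:chain", KMER_ONLY_READERS)
assembler_result = register_file("biaurelia_CA.gkpStore:chain", ASSEMBLER_READERS)
```

In this sketch the kmer-only reader set finds no match and lists exactly the three names seen in the error, while the extended set accepts the same path.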

So I'm confused about how to compile meryl so that it can read the gkpStore AND so that it can use a higher kmer. If I compile kmer first and then recompile WGS, will it find the kmer I already have installed and edit that version of meryl to read the gkpStore? Any help would be great.

Discussion

  • Brian Walenz

    Brian Walenz - 2010-10-29

    Correct. Compile kmer with 'gmake install' then recompile the wgs/src tree.

    Full details on compiling from source are at:
    http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile

    I can't guarantee that the assembler will actually work with > 32-mers. The ovl overlapper will definitely NOT work. The mer overlapper should work, but I don't think we have ever tested it. If possible, please share your experience.

     
  • Anonymous

    Anonymous - 2010-10-29

    Great. Will try it and let you know how it goes. Thanks!

     
  • Nobody/Anonymous

    Can't seem to get this compilation to work. Tried compiling from the 6.1 source, everything seems to work, no compilation errors, but I end up with no overmerry in the bin. Even tried with the original kmer version included in the source, no change to kmer.H to increase kmer size, and still ended up with no overmerry. Any ideas? Previously, I had just installed the precompiled Linux-amd64 version, so I've never had this issue.

     
  • Nobody/Anonymous

    Also, I should add that the previous post is when I compile the 6.1 src folder WITHOUT precompiling the kmer folder. Because when I try to compile the kmer folder first, as in the description on the website you mentioned, I get this compilation error:
    merTrim.C:679: error: 'kMerTiny' was not declared in this scope
    merTrim.C:679: error: expected ';' before 'F'
    merTrim.C:680: error: expected ';' before 'R'
    merTrim.C:683: error: 'F' was not declared in this scope
    merTrim.C:684: error: 'R' was not declared in this scope
    merTrim.C:688: error: 'F' was not declared in this scope
    merTrim.C:689: error: 'R' was not declared in this scope
    gmake[1]: *** [merTrim.o] Error 1

     
  • Anonymous

    Anonymous - 2010-11-01

    I just submitted this on a new thread because I feel like it's a completely different problem than the original question, so feel free to answer on the new thread instead of this one.

     

    Last edit: Anonymous 2016-09-17
  • Brian Walenz

    Brian Walenz - 2010-11-02

    The compilation problem was fixed (http://sourceforge.net/tracker/?func=detail&aid=3100910&group_id=106905&atid=645639) but the assembler will not be able to use a mer larger than 31 or 32 in the near future.

    overlapper=ovl is limited to a mer size of 31. At mer size 32, it helpfully crashes (quickly) with:

    Assertion failed: (8 * sizeof (uint64) > 2 * Kmer_Len), function main, file AS_OVL_overlap_common.h, line 841.

    overlapper=ovl uses a 64-bit integer to process kmers, and that limit will be nearly impossible to lift.
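
The arithmetic behind the assertion above can be checked with a short sketch. It assumes each base is stored in 2 bits, which the `2 * Kmer_Len` term suggests; the function names are made up for illustration and are not from the assembler source.

```python
UINT64_BITS = 64     # 8 * sizeof(uint64) in the quoted assertion
BITS_PER_BASE = 2    # assumption: A, C, G, T are each encoded in two bits


def passes_ovl_assertion(kmer_len: int) -> bool:
    """Mirror the quoted check: 8 * sizeof(uint64) > 2 * Kmer_Len."""
    return UINT64_BITS > BITS_PER_BASE * kmer_len


def largest_passing_kmer_len() -> int:
    """Largest mer size for which the strict inequality still holds."""
    k = 0
    while passes_ovl_assertion(k + 1):
        k += 1
    return k


limit = largest_passing_kmer_len()
```

Because the comparison is strict, a 32-mer (exactly 64 bits) already fails the check, which matches the limit of 31 stated for overlapper=ovl.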

    overlapper=mer is limited to a mer size of 32. Using a larger mer size fails with errors like:

    WARNING: mer 'AGACATACTATCATGCACTAGCGAGTGTGTATATATATCTCTATGCTGTA' ('TACAGCATAGAGATATATATACACACTCGCTAGTGCATGATAGTATGTCT') has count 2,0 but posnLen 5,0
    This can be caused by using the wrong merCount file (-mc option) for this data.
    Using the local count for this mer.

    overlapper=mer is using a space-efficient hash table to store the location of each kmer in a read, and this table is also using a 64-bit integer to process kmers. A long time ago I looked at modifying the table to allow up to mer size 46 (or so, I forget the exact limit). I think it should be possible, but it wasn't trivial enough that I could just do it. Going past this limit would take a significant rewrite. It is only this hash table that is limited, and so overmerry at least has a chance of using a larger mer size.
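
As a rough illustration of why a single 64-bit integer cannot represent a mer longer than 32, the sketch below packs bases at 2 bits each and truncates to 64 bits. This is not the overmerry hash table code; the 2-bit encoding and the helper names are assumptions. The 50-mer is the one from the warning quoted above.

```python
MASK64 = (1 << 64) - 1
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit encoding

WARNING_MER = "AGACATACTATCATGCACTAGCGAGTGTGTATATATATCTCTATGCTGTA"


def bits_needed(mer_len: int) -> int:
    """Bits required to store a mer at two bits per base."""
    return 2 * mer_len


def pack_2bit(mer: str) -> int:
    """Pack a mer into one 64-bit word; bases beyond the last 32 fall off."""
    word = 0
    for base in mer:
        word = ((word << 2) | CODE[base]) & MASK64
    return word


# A different 50-mer that shares only the last 49 bases with WARNING_MER.
other_mer = "C" + WARNING_MER[1:]
collide = pack_2bit(WARNING_MER) == pack_2bit(other_mer)
```

Under these assumptions a 50-mer needs 100 bits, so two different 50-mers that share their last 32 bases pack to the same 64-bit value, while two different 32-mers never do.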

    I'll leave the case open though I don't expect it will be resolved in the near (or far) future. Larger mer sizes aren't something we've needed for either the recent data sets or genomes.

    Summary: the largest mer size you can use is 31 (with ovl) or 32 (with mer).

     
  • Jason Miller

    Jason Miller - 2010-11-16
    • assigned_to: nobody --> brianwalenz
     
  • Brian Walenz

    Brian Walenz - 2012-01-09
    • status: open --> closed-wont-fix
     
