tg_index crashes

John Nash
2011-04-07
2013-04-18
  • John Nash
    2011-04-07

    Good morning,

    I mentioned this in a Mac thread, but it's not appropriate there, so I started a new one.

    Previously, I said:
    "However, gap2caf and tg_index crash if I attempt to run either on my desktop Linux box with 12 GB RAM, and that is just for a 454-generated bacterial genome with 700K reads."

    Andrew replied:
    "tg_index might use a lot of memory when trying to pair reads in a data set where there are few or no pairs. If there are no paired reads, try running tg_index with the -P option to turn off pairing. Alternatively you can use -q to limit the amount of memory used in pairing (a value of 1000000 seems to strike a good balance between speed and memory use)."

    I tried each switch on its own and both together (with a smaller assembly).

    Assembly:  454 project, with 200K reads, no pairs assembled using Mira, scaffolded to a related genome.  tg_index works on my BIG unix box, but it's in a village 100 km away from home, with a poor internet connection (for now).

    My little Unix box is an Ubuntu box, 12 GB RAM, 8 CPUs, and tg_index will not work on it. It starts, chugs away and then hangs the system with a load average of 15.  I've done nothing fancy, just a standard install from svn. gap4 works, gap5 works, etc.

    Here's what happens up to the hang.

    $ tg_index -q 10000000 -P -C ../MiraScaff/S_Koessen_mirascaff_assembly/S_Koessen_mirascaff_d_results/S_Koessen_mirascaff_out.caf Koessen_mirascaff

            g_index:        Short Read Alignment Indexer, version 1.2.11

            Author:         James Bonfield (jkb@sanger.ac.uk)
                            2007-2011, Wellcome Trust Sanger Institute

    Selecting output database filename ../MiraScaff/S_Koessen_mirascaff_assembly/S_Koessen_mirascaff_d_results/S_Koessen_mirascaff_out.0
    Processing CAF file ../MiraScaff/S_Koessen_mirascaff_assembly/S_Koessen_mirascaff_d_results/S_Koessen_mirascaff_out.caf
    Loading ../MiraScaff/S_Koessen_mirascaff_assembly/S_Koessen_mirascaff_d_results/S_Koessen_mirascaff_out.caf…
    Input summary
    Reads 203942
    Quality 203943
    Contigs 1
    Bases 203943
    R + C 203943
    Indexed in 5.060475 seconds
    Loading …
    Resizing HacheTable tg_cache to 4096
    Resizing HacheTable tg_cache to 16384
    Resizing HacheTable tg_cache to 65536
    Resizing HacheTable tg_cache to 262144
    Resizing HacheTable gfile->idx_hash to 524288
    Resizing HacheTable tg_cache to 1048576
    Resizing HacheTable gfile->idx_hash to 2097152
    Resizing HacheTable tg_cache to 4194304
    Resizing HacheTable gfile->idx_hash to 8388608
    Resizing HacheTable tg_cache to 16777216

    <hang for hours> This takes 2 min on the BIG box.

    John

     
  • Hi John,

    I must admit you have us baffled.  We tried a mixed 454/capillary Mira assembled caf file with 400k reads and 6k contigs on an 8GB Debian machine and it worked fine.  We then tried to put all the reads in one contig and that worked fine too.  We finally tried to pile up 200k reads in the same position in a contig to see if that was causing the excessive memory usage and, again, no problem.

    I'm not sure we can help without seeing your caf file.  Is it something we can look at?

    Andrew 

     
  • John Nash
    2011-04-07

    When in doubt, blame the computer - which is what I am doing.  I'll email you the CAF file when I get back to Toronto (I'm in the sticks at the lab today) where there is a decent net connection.

    But I'll also recompile and install an svn version, and b8's binary, and its source on a couple of other Unix workstations and virtualboxes to see whether my environment is to blame.

     
  • John Nash
    2011-04-12

    This took a fair bit of work, but apparently I cannot get tg_index to compile properly on my box.

    I have two identically set up Ubuntu 10.10 systems, and an old SUSE system which we can't upgrade because it's permanently in use.

    Ubuntu A - 10 CPUs, 22 GB RAM
    Ubuntu B - 12 CPUs, 12 GB RAM
    SUSE - 16 CPUs, 32 GB RAM

    I have been able to mirror most of the above on a Virtualbox with Ubuntu 10.10 as a guest on my Mac.

    tg_index behaves the same on all systems, i.e. the version in beta8's 64-bit binary works fine, while the version in beta8's 32-bit binary throws out "Unable to import data for read XXXXX on contig YYYY Failed to get sequence quality."

    tg_index (from the beta8 source or from svn) compiles fine on all computers but throws the following error when I run it.

    $ /usr/local/staden/bin/tg_index
    /usr/local/staden/bin/tg_index: error while loading shared libraries: libtgap.so: cannot open shared object file: No such file or directory

    I can hear you all saying "Why not stick with beta8's 64-bit binary?"  I need to trace tags, and that version's gap5 has the tag search bug.  Fortunately, beta8's 32-bit version works fine.

    The benefit of my meanderings is that I discovered that I can use gap5 from the 32-bit package and tg_index from the 64-bit package.

    I cannot get any of the sources (b8 or svn) to execute gap5.  It exits with the following:

    $ gap5
    couldn't load file "/usr/molbin/staden/bin/../lib/staden//libtk_utils.so": libstaden-read.so.1: cannot open shared object file: No such file or directory
        while executing
    "load $env(STADLIB)/${lib_prefix}tk_utils${lib_suffix}"
        (file "/usr/molbin/staden/bin/../share/staden/tcl/gap5/gap5.tcl" line 494)

    So I don't know what I am doing wrong.

    HTH
    John

     
  • James Bonfield
    2011-04-13

    You could try building with --disable-rpath. I can't say for sure it's the problem, but it's a simple thing to try. If you do this, make sure you do a full make clean before rebuilding and installing.

    Support for rpath was added recently (due to user requests), but it causes as many problems as it solves. Rpath basically encodes the location of the library into binaries, so if you then install them somewhere else or move them, they'll fail to find the libraries, even when LD_LIBRARY_PATH is correctly set.

    This can often happen by accident if you rerun configure with a new --prefix and don't do a full make clean before rebuilding and/or installing. Anyway, disabling rpath should make most of those issues go away. (Instead you get other issues: you'd need to manually set LD_LIBRARY_PATH if you want to run any of the non-interactive command-line tools like vector_clip.)
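
A minimal sketch of how one might check which of these two situations applies, assuming a Linux box with `readelf` and `ldd` available. The function names are illustrative and not part of the Staden package; the library directory in the usage comments is taken from the error messages earlier in this thread.

```shell
#!/bin/sh
# Sketch: inspect how a binary locates its shared libraries.
# All function names here are hypothetical helpers, not Staden tools.

# Print any RPATH/RUNPATH entries baked into an ELF binary (no output if none).
show_rpath() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}

# Print the shared libraries the dynamic loader cannot resolve.
missing_libs() {
    ldd "$1" | grep 'not found'
}

# Print DIR followed by the existing LD_LIBRARY_PATH entries, if there are any.
prepend_lib_path() {
    printf '%s\n' "$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example use (paths are assumptions based on the errors quoted above):
#   show_rpath   /usr/local/staden/bin/tg_index
#   missing_libs /usr/local/staden/bin/tg_index
#   LD_LIBRARY_PATH=$(prepend_lib_path /usr/local/staden/lib/staden)
#   export LD_LIBRARY_PATH
```

If `show_rpath` lists a directory that no longer holds the libraries, a stale rpath from an earlier configure run is a plausible culprit.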

    James

     
  • Hi John,

    I've tried to replicate your error on a 32bit machine in my office but with no luck.  I can try with my Ubuntu netbook. It has more of a standard install than my work machine and is solidly 32bit.  I won't be able to do that until tomorrow though.

    Andrew

     
  • Hi John,

    Does your caf file contain very long reads?  James managed to replicate your massive memory use error by including a read over a gig long.  Could this be your problem?
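
One rough way to check, sketched under the assumption that each sequence in the CAF file starts with a `DNA : <name>` line and runs until the next blank line. `longest_caf_read` is an illustrative name, and note that it counts every DNA entry, so contig or backbone sequences will show up as well as reads.

```shell
#!/bin/sh
# Sketch: report the name and length of the longest DNA entry in a CAF file.
# Assumes "DNA : <name>" headers and blank-line-terminated sequence blocks.
longest_caf_read() {
    awk '
        # Start of a sequence block: remember the name, reset the counter.
        /^DNA[ \t]*:/ { name = $3; len = 0; in_dna = 1; next }
        # A blank line closes the block: keep it if it is the longest so far.
        in_dna && NF == 0 {
            if (len > max) { max = len; longest = name }
            in_dna = 0
            next
        }
        # Sequence line: count the bases, ignoring any whitespace.
        in_dna { gsub(/[ \t]/, ""); len += length($0) }
        END {
            # Handle a block that runs to end of file without a blank line.
            if (in_dna && len > max) { max = len; longest = name }
            print longest, max + 0
        }
    ' "$1"
}

# Example use:
#   longest_caf_read S_Nitra_mirascaff_out.caf
```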

    Andrew

     
  • John Nash
    2011-04-21

    Andrew wrote:

    "Does your caf file contain very long reads? James managed to replicate your massive memory use error by including a read over a gig long. Could this be your problem?"

    I have different answers for different platforms. My test is with two Salmonella assemblies, both assembled using Mira. The first is a de novo assembly with 800K reads. The second is the same set of reads, assembled with a 5 megabase backbone, using Mira's mapping algorithm. Both input files are CAF files.

    The Mac platform works with both assemblies, converting both into gap5 databases.  My Mac is a MacBook Air with 4 GB RAM. 

    My Ubuntu platform is where the trouble is (this happens on two separate computers, including a fresh OOTB installation of Ubuntu on my test box).  Both are 64-bit computers with 10 processors and >12 GB RAM.

    The 64-bit b8 binary will convert the de novo assembly. It will not convert the mapping assembly with the 5 megabase backbone.  The error is:

    $ tg_index -P S_Nitra_mirascaff_out.caf 
        g_index:    Short Read Alignment Indexer, version 1.2.11
        Author:     James Bonfield (jkb@sanger.ac.uk)
                    2007-2011, Wellcome Trust Sanger Institute
    Selecting output database filename S_Nitra_mirascaff_out.0
    tg_index.bin: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
    Aborted
    

    The version from svn (the same source which compiles on my Mac) hangs on either assembly, de novo or mapping:

    $ /usr/local/staden_svn/bin/tg_index -P S_Nitra_mira_out.caf 
        g_index:    Short Read Alignment Indexer, version 1.2.12
        Author:     James Bonfield (jkb@sanger.ac.uk)
                    2007-2011, Wellcome Trust Sanger Institute
    Selecting output database filename S_Nitra_mira_out.0
    Database version=2
    Processing CAF file S_Nitra_mira_out.caf
    Loading S_Nitra_mira_out.caf...
    Input summary
    Reads 799958
    Quality 800132
    Contigs 174
    Bases 800132
    R + C 800132
    Indexed in 7.954034 seconds
    Loading ...
    Resizing HacheTable tg_cache to 4096
    Resizing HacheTable tg_cache to 16384
    Resizing HacheTable tg_cache to 65536
    Resizing HacheTable tg_cache to 262144
    Resizing HacheTable gfile->idx_hash to 524288
    Resizing HacheTable tg_cache to 1048576
    Resizing HacheTable gfile->idx_hash to 2097152
    Resizing HacheTable tg_cache to 4194304
    Resizing HacheTable gfile->idx_hash to 8388608
    Resizing HacheTable tg_cache to 16777216
    

    I also have a 64-bit SUSE server with 32 GB RAM and 20 CPUs (aka my box in the boonies).  tg_index from the 64-bit binary b8 version converts everything fine.  The source won't compile on this box because of outdated libraries which I cannot replace yet.

    I bet dollars to donuts that Ubuntu has a malloc problem!  I am using the Mac version to make my gap5 databases, so functionally, I am fine. However, I am happy to troubleshoot and report bugs for the Ubuntu version for as long as you want.

    HTH
    John

     
  • John,

    Good to know you have a version you can work from.  I'll take another look when I get back in the office after the Easter break.

    Andrew

     
  • James Bonfield
    2011-04-25

    This is sounding not so much like a malloc problem on Ubuntu, but perhaps simply that it has additional error checking enabled by default (hence the assertion failure).  Assertions are basically lines of code added to a program to assert that something is true and to crash and burn if not. Basically they are a "this shouldn't happen, but stop if it does" mechanism to detect uncaught bugs.

    It almost certainly is a bug somewhere, just something very hard to identify. When I get time (not likely for a week at least) I'll experiment some more to try and uncover the problem. It sounds like it's data specific, but having found one existing bug related to extra long sequences I have some more ideas to experiment with.
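
An assertion firing inside malloc.c often means the heap's bookkeeping was overwritten some time before the failing call, so it can help to make the C library or a memory checker complain earlier. A hedged sketch, assuming a glibc that honours the `MALLOC_CHECK_` environment variable and, optionally, an installed valgrind; the function names are illustrative only.

```shell
#!/bin/sh
# Sketch: rerun a command with stricter heap checking.
# run_with_heap_checks and run_under_valgrind are hypothetical helper names.

# Ask glibc to print a diagnostic and abort when it detects heap corruption.
# (Newer glibc releases may ignore this variable unless the malloc debug
# library is preloaded.)
run_with_heap_checks() {
    MALLOC_CHECK_=3 "$@"
}

# Run the command under valgrind's default memcheck tool, following any child
# processes in case the command is a wrapper script around the real binary.
run_under_valgrind() {
    valgrind --trace-children=yes "$@"
}

# Example use:
#   run_with_heap_checks tg_index -P S_Nitra_mirascaff_out.caf
#   run_under_valgrind   tg_index -P S_Nitra_mirascaff_out.caf
```

Either run will be much slower than normal, so a cut-down CAF file that still triggers the failure is worth preparing first.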

    James

     
  • John Nash
    2011-04-25

    I get several assertion failures using Ubuntu.  I *could* email them to you now and then.

    John