From: Brian W. <th...@gm...> - 2015-04-29 20:28:17
I think this is caused by having more than 2 GBytes of combined sequence ident lines. The fasta/fastq reader was written to allow random access to any sequence, and to allow it by the name of the sequence. When it was written, "big" was human dbEST which contained ~4 GB sequence IIRC. I don't miss the computers from back then, but I do miss the data sizes...

At the expense of rewriting the file, you'll see better performance if you strip out the names, merge itty bitty Illumina reads into larger sequences (preserving kmer boundaries), and dump the QVs:

http://wgs-assembler.sourceforge.net/wiki/index.php/Fastq-to-fasta-merged.pl

Before you ask, no, meryl can't read from a pipe, or gzip.

The example that contains this is at:

http://wgs-assembler.sourceforge.net/wiki/index.php/Yersinia_pestis_KIM_D27,_using_Illumina_paired-end_reads,_with_CA8.2

You can also tell meryl to run exactly 40 pieces (one per thread) with "-segments 40 -threads 40", which will be optimal use of CPUs. It'll use as much memory as it wants (same as your original command). Doubling the number of segments ("-segments 80 -threads 40") will halve the memory requirement, without impacting run time significantly (I think). The last step in both cases will take the results from each segment and merge them into one file.

b

On Wed, Apr 29, 2015 at 2:19 PM, mathog <ma...@ca...> wrote:

> We just received a bunch of Illumina sequence which has been unpacked
> into a single fastq file of 217610498050 bytes. It has not been cleaned
> up in any way, so it has lots of N's and many entries which failed the
> "chastity filter".
> Tried to count the tuples on it with this:
>
> time ~/wgs*/wgs-8.1/Linux-amd64/bin/meryl -v -B -m 17 -C \
>   -s 15659_all.fastq -threads 40 -o killme.table
>
> and it did this (sorry about the wrap in the backtrace):
>
> REALLOC len=4194304 from 4194304 to 8388608
> REALLOC len=8388608 from 8388608 to 16777216
> REALLOC len=16777216 from 16777216 to 33554432
> REALLOC len=33554432 from 33554432 to 67108864
> REALLOC len=67108864 from 67108864 to 134217728
>
> Failed with 'Segmentation fault'
>
> The backtrace it emitted was:
>
> [0]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::AS_UTL_catchCrash(int,
> siginfo*, void*) + 0x27 [0x40d477]
> [1] /lib64/libpthread.so.0() [0x353d40f710]
> [2] /lib64/libc.so.6::(null) + 0x15b [0x353cc897cb]
> [3]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::constructIndex()
> + 0x551 [0x430e61]
> [4]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::fastqFile(char
> const*) + 0x43 [0x431323]
> [5]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::openFile(char
> const*) + 0x109 [0x431499]
> [6]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqFactory::openFile(char
> const*) + 0x3c [0x42ca8c]
> [7]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqStream::seqStream(char
> const*) + 0x42 [0x42d892]
> [8]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::prepareBatch(merylArgs*)
> + 0xe2 [0x41d882]
> [9]
> /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::build(merylArgs*)
> + 0x67d [0x42110d]
> [10] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::(null) +
> 0x158 [0x409578]
>
> This crash happened after 4 minutes and a few seconds had elapsed. Have
> tried it with several other versions of meryl, including from a wgs
> built from trunk just this morning, and it always seems to crash the
> same way and at about the same place (as judged by run time).
> The system has 529G of memory which I would have assumed is sufficient.
>
> The instructions for meryl are pretty light, so perhaps I missed some
> other switch which should be added to the command line?
>
> Other suggestions?
>
> Thanks,
>
> David Mathog
> ma...@ca...
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
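
For readers who want to try the two ideas from the reply above (checking how many bytes the ident lines take up, and stripping names, merging short reads, and dropping QVs), here is a minimal Python sketch. It is not the Fastq-to-fasta-merged.pl script linked above and makes no claim to match its output. All function names are hypothetical. It assumes strict 4-line FASTQ records, one byte per character, and that a single "N" between reads is enough to keep kmers from spanning read boundaries (which in turn assumes the kmer counter skips kmers containing N).

```python
import io
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (ident_line, sequence) pairs; assumes strict 4-line records."""
    while True:
        ident = handle.readline()
        if not ident:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality values, dropped
        yield ident.rstrip("\n"), seq


def total_ident_bytes(handle: TextIO) -> int:
    """Sum the lengths of all ident lines (newline not counted)."""
    return sum(len(ident) for ident, _ in fastq_records(handle))


def merge_reads(src: TextIO, dst: TextIO,
                target_len: int = 1_000_000, sep: str = "N") -> int:
    """Write nameless FASTA sequences built by joining reads with `sep`.

    A merged sequence is flushed once it reaches about `target_len`
    characters. The separator is an assumption made for illustration,
    intended to stop kmers from spanning two reads.
    Returns the number of merged sequences written.
    """
    chunk = []
    size = 0
    written = 0
    for _, seq in fastq_records(src):
        chunk.append(seq)
        size += len(seq) + len(sep)
        if size >= target_len:
            written += 1
            dst.write(f">{written}\n{sep.join(chunk)}\n")
            chunk, size = [], 0
    if chunk:  # flush the final partial chunk
        written += 1
        dst.write(f">{written}\n{sep.join(chunk)}\n")
    return written
```

To use it on a real file, open the input with `open(path)` and the output with `open(out_path, "w")` and pass both handles to `merge_reads`. The merged FASTA is an ordinary uncompressed file, which matters because meryl cannot read from a pipe or from gzip.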