From: Seth M. <mu...@uw...> - 2015-05-26 15:12:59
Hello Everyone,

I was running PBcR through to assembly with nothing in my spec file except memory options, since I share the server. I got all the way to step 9 (terminator) when I hit the following error:

----------------------------------------START Tue May 26 02:18:24 2015
/usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm
Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds)
ERROR: Failed with signal INT (2)
The Cleaner has arrived.  Doing 'none'.
----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds)

A Google search tells me I can try running asmOutputFasta manually to produce the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/), but when I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and that I might have to repeat step 9 in runCA, and this is where I get stuck. I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pick up a PBcR run? I tried specifying the directory, prefix, and spec file, and after changing to the hash memory options in my spec file I get:

Failure message: no fragment files specified, and stores not already created

Trying to rerun the PBcR command again gives:

Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite.

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755
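The immediate failure above is caqc.pl not finding the Statistics::Descriptive Perl module on that machine. A minimal sketch of installing it and checking the result (assuming CPAN access; only the module name comes from the error message, the rest is illustrative):

    # install the missing Perl module (may need sudo, or a local::lib / cpanm setup)
    cpan Statistics::Descriptive

    # confirm perl can now load it
    perl -MStatistics::Descriptive -e 'print "ok\n"'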
From: Brian W. <th...@gm...> - 2015-05-11 11:12:03
60 tb? Repeats ho! For future use, the output of overlapper are gzip compressed, and will be uncompressed, and then doubled to go into the ovlStore. What CA version are you using? Later versions were optimized more for PacBio than for Illumina. One change made data sizes larger than they really need to be. In file AS_global.H, AS_READ_MAX_NORMAL_LEN_BITS should be set to 11 (2047 bases). Smaller won't help. Definitely drop the shorter reads. The historical minimum is 64 bases, but I'd just drop anything that didn't stitch together. Losing the mates here won't degrade the assembly, and will make it easier to run. I don't have a good feel for how much coverage is too much. One way to look at coverage is to compute the expected overlap size. For 60x coverage of 280bp reads, you'd sample a read every 5bp, so the expected overlap is 275bp. 30x coverage would drop this to 270bp. Do you have a (cumulative) histogram of read lengths? In addition to throwing out short reads, definitely increase the minimum overlap size (ovlMinLen) to whatever the length of the shortest read is, -1 (or 2 or ...). What kmer threshold did it pick (0-mercounts, one of the *err files)? Can you send the histogram file? Plotting the first two columns should show a definite hump at the expected coverage, with a large tail. Any humps after that are repeats that probably should be excluded from seeding overlaps. Be sure to check way out on the X axis, with Y zoomed in, for any very common repeats. Increasing the kmer size probably won't reduce disk usage, but might reduce run time, at the expense of sensitivity. Illumina reads are usually pretty clean, and you can probably get away with merSize=28. Any chance there is adapter present? I did that once. Had it completed, I would have ended up with a near complete graph - every read overlapping every other read. If you have pre-trimmed your reads, you can skip OBT, saving a round of overlapper and ovlStore. Finally, nope, mer will use more. Lots and lots more. It (was) appropriate for 454 microbes. b On Fri, May 8, 2015 at 9:01 PM, Langhorst, Brad <Lan...@ne...> wrote: > Hi: > > I’m trying to build unitigs for a ~ 3Gbase organism. > > I’m about 1/6 of the way through the 0-overlaptrim-overlap step and I’m > already using about 10T of disk and I had to pause the job since that’s > all I have...I can’t get 60T of disk. > > I have ~ 1.5B stitched illumina reads = ~ 50 - 280bp (used PEAR to > stitch since wgs could not handle R1+R2 frags due to integer overflow) > i figure about 120X or 60X per allele. > too much? > should I toss half of the shorter reads? > how much will that help with disk space? > > > I’m using the ovl overlapper… > will the mer overlapper use less disk? > > Is there another setting I can change to use less disk? > > > Brad > > spec file: > > merylMemory = 128000 > merylThreads = 16 > > mbtThreads = 8 > > ovlStoreMemory=8192 > > #ovlStoreMemory=10000 > > useGrid = 1 > scriptOnGrid = 0 > > ovlHashBits=25 > > # with this setting I observe about 2 runs at full capacity and one at > 20, - can't use a smaller number because the job count is too large for sge > #ovlHashBlockLength=1800000000 > > ovlHashBlockLength=2880000000 # should be about 400% > ovlThreads=8 > > #ovlOverlapper=mer > #merOverlapperThreads=8 > #merOverlapperSeedBatchSize=1000000 > > > frgCorrBatchSize = 10000000 > frgCorrThreads = 8 > > unitigger=bogart > batThreads=8 > stopAfter=unitigger > sge=-p -100 -pe smp 8 > > frgCorrOnGrid=1 > ovlCorrOnGrid=1 > > > -- > Brad Langhorst, Ph.D. 
> Applications and Product Development Scientist > > > > ------------------------------------------------------------------------------ > One dashboard for servers and applications across Physical-Virtual-Cloud > Widest out-of-the-box monitoring support with 50+ applications > Performance metrics, stats and reports that give you Actionable Insights > Deep dive visibility with transaction tracing using APM Insight. > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > |
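To make the coverage arithmetic in the reply above concrete: with reads of length L at coverage C, a new read starts roughly every L/C bases, so the expected overlap between neighbouring reads is about L - L/C. A small sketch using the values from this thread (the spec lines in the comments are just the option names this reply mentions; check them against your runCA version):

    awk -v L=280 -v C=60 'BEGIN { printf "expected overlap at %dx: ~%d bp\n", C, L - L/C }'   # ~275 bp
    awk -v L=280 -v C=30 'BEGIN { printf "expected overlap at %dx: ~%d bp\n", C, L - L/C }'   # ~270 bp

    # spec options named in the reply (values are examples only):
    #   ovlMinLen = 49    # shortest kept read length minus one, if the shortest read is 50 bp
    #   merSize   = 28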
From: Langhorst, B. <Lan...@ne...> - 2015-05-09 01:34:37
Hi:

I’m trying to build unitigs for a ~3 Gbase organism. I’m about 1/6 of the way through the 0-overlaptrim-overlap step and I’m already using about 10T of disk, and I had to pause the job since that’s all I have... I can’t get 60T of disk.

I have ~1.5B stitched Illumina reads of ~50-280bp (used PEAR to stitch, since wgs could not handle R1+R2 frags due to integer overflow); I figure about 120X, or 60X per allele.

Too much? Should I toss half of the shorter reads? How much will that help with disk space?

I’m using the ovl overlapper… will the mer overlapper use less disk? Is there another setting I can change to use less disk?

Brad

spec file:

merylMemory = 128000
merylThreads = 16

mbtThreads = 8

ovlStoreMemory=8192
#ovlStoreMemory=10000

useGrid = 1
scriptOnGrid = 0

ovlHashBits=25

# with this setting I observe about 2 runs at full capacity and one at 20 - can't use a smaller number because the job count is too large for sge
#ovlHashBlockLength=1800000000
ovlHashBlockLength=2880000000 # should be about 400%
ovlThreads=8

#ovlOverlapper=mer
#merOverlapperThreads=8
#merOverlapperSeedBatchSize=1000000

frgCorrBatchSize = 10000000
frgCorrThreads = 8

unitigger=bogart
batThreads=8
stopAfter=unitigger
sge=-p -100 -pe smp 8

frgCorrOnGrid=1
ovlCorrOnGrid=1

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
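The reply earlier in this thread asks for a (cumulative) read-length histogram. A rough way to produce one from a FASTQ (the file name is illustrative):

    # per-length read counts (the sequence is every 4th line, starting at line 2)
    awk 'NR % 4 == 2 { print length($0) }' stitched_reads.fastq | sort -n | uniq -c

    # cumulative form: number of reads at or below each length
    awk 'NR % 4 == 2 { print length($0) }' stitched_reads.fastq \
      | sort -n | uniq -c | awk '{ total += $1; print $2, total }'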
From: Serge K. <ser...@gm...> - 2015-05-05 02:04:30
Hi,

Most likely you are missing jellyfish on your system. It comes with the distribution of the assembler (source or pre-compiled) for Linux or OSX. If you have a different system, you need to have jellyfish available in your path.

Are any other error messages reported?

> On May 2, 2015, at 5:11 AM, Febri Gunawan <feb...@gm...> wrote:
>
> Hi, I am using Celera assembler, I got this error.
>
> ----------------------------------------START Sat May 2 16:05:43 2015
> cp /home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore /home/Documents/pacbio/celera/lambda/sampleData//templambaIll/asm.ignore
> cp: cannot stat ‘/home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore’: No such file or directory
> ----------------------------------------END Sat May 2 16:05:43 2015 (0 seconds)
> Failed to execute cp /home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore /home/Documents/pacbio/celera/lambda/sampleData//templambaIll/asm.ignore
>
> I don't know what is that, and why it can happen. Thank you
>
> --
> Best regards,
> Febri Gunawan Sugiokto
> Fakultas Teknobiologi
> Universitas Katolik Indonesia Atma Jaya
> Jakarta
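A quick way to check whether jellyfish is visible, and to point PATH at the copy bundled with the assembler (the install path below is an example, not a fixed location):

    which jellyfish || echo "jellyfish not found on PATH"

    # the Linux/OSX wgs distributions ship jellyfish alongside the other binaries
    export PATH=/path/to/wgs-8.3rc1/Linux-amd64/bin:$PATH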
From: Febri G. <feb...@gm...> - 2015-05-02 09:11:48
Hi,

I am using the Celera assembler and I got this error:

----------------------------------------START Sat May 2 16:05:43 2015
cp /home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore /home/Documents/pacbio/celera/lambda/sampleData//templambaIll/asm.ignore
cp: cannot stat ‘/home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore’: No such file or directory
----------------------------------------END Sat May 2 16:05:43 2015 (0 seconds)
Failed to execute cp /home/Documents/pacbio/celera/lambda/sampleData//*.14.ignore /home/Documents/pacbio/celera/lambda/sampleData//templambaIll/asm.ignore

I don't know what that is, or why it happened. Thank you

--
Best regards,
Febri Gunawan Sugiokto
Fakultas Teknobiologi
Universitas Katolik Indonesia Atma Jaya
Jakarta
From: Brian W. <th...@gm...> - 2015-04-30 04:54:23
On Wed, Apr 29, 2015 at 5:41 PM, mathog <ma...@ca...> wrote:
> On 29-Apr-2015 14:19, Brian Walenz wrote:
>> Plus, the assembler does completely ignore the name, getting pairing from ordering in the fastq.
>
> So I take it one should not apply any filtering to the Illumina data before entering it into CA that would cause the fastq files to get out of sync?

Yeah, that'd be bad. Probably worse than giving the same file name twice, which I've done before. IIRC, the assembly wasn't total garbage, but had bizarre stats in the QC report.

The original read names are stashed in *.gkpStore.fastqUIDmap, one line per mate pair, if you want to check previous assemblies.

b
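A small sketch of checking read pairing in a previous assembly via the fastqUIDmap file mentioned above (the store prefix is illustrative; the read name is borrowed from elsewhere in this thread):

    # one mate pair per line, mapping internal IDs back to the original read names
    head asm.gkpStore.fastqUIDmap
    grep 'HISEQ:348:H2YWCBCXX:1:1101:1057:2031' asm.gkpStore.fastqUIDmap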
From: mathog <ma...@ca...> - 2015-04-29 21:41:33
On 29-Apr-2015 14:19, Brian Walenz wrote:
> Plus, the assembler does completely ignore the name, getting pairing from ordering in the fastq.

So I take it one should not apply any filtering to the Illumina data before entering it into CA that would cause the fastq files to get out of sync?

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
From: Brian W. <th...@gm...> - 2015-04-29 21:20:03
The version used in the assembler access sequence in the gkpStore directly, so it skips the index building. Plus, the assembler does completely ignore the name, getting pairing from ordering in the fastq. If you've got the memory for it, jellyfish is great. If it doesn't have enough memory, it seems to be slower than meryl. They both end up writing intermediate files to disk, and merging at the end. I wonder if Illumina (the company) owns stock in Seagate and/or Western Digital. On Wed, Apr 29, 2015 at 4:57 PM, mathog <ma...@ca...> wrote: > On 29-Apr-2015 13:28, Brian Walenz wrote: > >> I think this is caused by having more than 2 GBytes of combined sequence >> ident lines. The fasta/fastq reader was written to allow random access to >> any sequence, and to allow it by the name of the sequence. When it was >> written, "big" was human dbEST which contained ~4 GB sequence IIRC. I >> don't miss the computers from back then, but I do miss the data sizes... >> > > The header lines are not very large, they all look like this: > > @HISEQ:348:H2YWCBCXX:1:1101:1057:2031 1:Y:0: > > The file is 217610498050 bytes, the reads are 150bp and that line is 44 > characters, so the number of reads is about: > > 217610498050/(44+1+150+1+1+1+150+1) > 623525782 > giving ~total header length of: > 623525782*45 = ~ 28Gb. > > which is just a wee bit bigger than 2Gb! > > Isn't this bug going to cause meryl to blow up with this data in some run > modes of the assembler? In the assembler one could not (completely) remove > the sequence names or the assembler could never find the pairs. About half > of that string is common in all ident strings, but removing it would only > reduce the total to 14 Gb of ident strings, and the issue would remain. > > In any case, for now I worked around this by using jellyfish instead or > meryl, like: > > jellyfish count -m 17 -C -s 800000000 -t 44 15659_all.fastq > jellyfish histo -t 44 mer_counts.jf >mer_counts.histo > > The first took a bit under 14 minutes and the second just under 5 > minutes. Jellyfish didn't give any warnings or errors when it ran. > > > Thanks, > > David Mathog > ma...@ca... > Manager, Sequence Analysis Facility, Biology Division, Caltech > |
From: mathog <ma...@ca...> - 2015-04-29 20:58:04
On 29-Apr-2015 13:28, Brian Walenz wrote:
> I think this is caused by having more than 2 GBytes of combined sequence ident lines. The fasta/fastq reader was written to allow random access to any sequence, and to allow it by the name of the sequence. When it was written, "big" was human dbEST which contained ~4 GB sequence IIRC. I don't miss the computers from back then, but I do miss the data sizes...

The header lines are not very large; they all look like this:

@HISEQ:348:H2YWCBCXX:1:1101:1057:2031 1:Y:0:

The file is 217610498050 bytes, the reads are 150bp, and that line is 44 characters, so the number of reads is about:

217610498050/(44+1+150+1+1+1+150+1) = 623525782

giving a total header length of about:

623525782*45 = ~28 Gb

which is just a wee bit bigger than 2 Gb!

Isn't this bug going to cause meryl to blow up with this data in some run modes of the assembler? In the assembler one could not (completely) remove the sequence names, or the assembler could never find the pairs. About half of that string is common to all ident strings, but removing it would only reduce the total to 14 Gb of ident strings, and the issue would remain.

In any case, for now I worked around this by using jellyfish instead of meryl, like:

jellyfish count -m 17 -C -s 800000000 -t 44 15659_all.fastq
jellyfish histo -t 44 mer_counts.jf >mer_counts.histo

The first took a bit under 14 minutes and the second just under 5 minutes. Jellyfish didn't give any warnings or errors when it ran.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
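The record-size arithmetic above, checked with shell arithmetic (all numbers are the poster's):

    # bytes per FASTQ record: 44-char header, 150 bp sequence, '+' line, 150-char quality line, four newlines
    echo $(( 217610498050 / (44 + 1 + 150 + 1 + 1 + 1 + 150 + 1) ))   # ~623525782 reads
    echo $(( 623525782 * 45 / 1000000000 ))                           # ~28 GB of header text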
From: Brian W. <th...@gm...> - 2015-04-29 20:28:17
I think this is caused by having more than 2 GBytes of combined sequence ident lines. The fasta/fastq reader was written to allow random access to any sequence, and to allow it by the name of the sequence. When it was written, "big" was human dbEST which contained ~4 GB sequence IIRC. I don't miss the computers from back then, but I do miss the data sizes... At the expense of rewriting the file, you'll see better performance if you strip out the names, merge itty bitty Illumina reads into larger sequences (preserving kmer boundaries), and dump the QVs: http://wgs-assembler.sourceforge.net/wiki/index.php/Fastq-to-fasta-merged.pl Before you ask, no, meryl can't read from a pipe, or gzip. The example that contains this is at: http://wgs-assembler.sourceforge.net/wiki/index.php/Yersinia_pestis_KIM_D27,_using_Illumina_paired-end_reads,_with_CA8.2 You can also tell meryl to run exactly 40 pieces (one per thread) with "-segments 40 -threads 40", which will be optimal use of CPUs. It'll use as much memory as it wants (same as your original command). Doubling the number of segments ("-segments 80 -threads 40") will halve the memory requirement, without impacting run time significantly (I think). The last step in both cases will take the results from each segment and merge them into one file. b On Wed, Apr 29, 2015 at 2:19 PM, mathog <ma...@ca...> wrote: > We just received a bunch of Illumina sequence which has been unpacked > into a single fastq file of 217610498050 bytes. It has not been cleaned > up in any way, so it has lots of N's and many entries which failed the > "chastity filter". Tried to count the tuples on it with this: > > > time ~/wgs*/wgs-8.1/Linux-amd64/bin/meryl -v -B -m 17 -C \ > -s 15659_all.fastq -threads 40 -o killme.table > > and it did this (sorry about the wrap in the backtrace): > > REALLOC len=4194304 from 4194304 to 8388608 > REALLOC len=8388608 from 8388608 to 16777216 > REALLOC len=16777216 from 16777216 to 33554432 > REALLOC len=33554432 from 33554432 to 67108864 > REALLOC len=67108864 from 67108864 to 134217728 > > Failed with 'Segmentation fault' > > The backtrace it emitted was: > > [0] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::AS_UTL_catchCrash(int, > siginfo*, void*) + 0x27 [0x40d477] > [1] /lib64/libpthread.so.0() [0x353d40f710] > [2] /lib64/libc.so.6::(null) + 0x15b [0x353cc897cb] > [3] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::constructIndex() > + 0x551 [0x430e61] > [4] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::fastqFile(char > const*) + 0x43 [0x431323] > [5] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::openFile(char > const*) + 0x109 [0x431499] > [6] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqFactory::openFile(char > const*) + 0x3c [0x42ca8c] > [7] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqStream::seqStream(char > const*) + 0x42 [0x42d892] > [8] > > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::prepareBatch(merylArgs*) > + 0xe2 [0x41d882] > [9] > /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::build(merylArgs*) > + 0x67d [0x42110d] > [10] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::(null) + > 0x158 [0x409578] > > This crash happened after 4 minutes and a few seconds had elapsed. Have > tried it with several other versions of meryl, including from a wgs > built from trunk just this morning, and it always seems to crash the > same way and at about the same place (as judged by run time). 
The > system has 529G of memory which I would have assumed is sufficient. > > The instructions for meryl are pretty light, so perhaps I missed some > other switch which should be added to the command line? > > Other suggestions? > > Thanks, > > David Mathog > ma...@ca... > Manager, Sequence Analysis Facility, Biology Division, Caltech > > > ------------------------------------------------------------------------------ > One dashboard for servers and applications across Physical-Virtual-Cloud > Widest out-of-the-box monitoring support with 50+ applications > Performance metrics, stats and reports that give you Actionable Insights > Deep dive visibility with transaction tracing using APM Insight. > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > |
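A sketch of the original meryl command (shown further down in this archive) with the -segments/-threads suggestion from this reply applied:

    # 40 segments, one per thread; "-segments 80 -threads 40" would halve the memory per segment
    meryl -v -B -C -m 17 \
      -segments 40 -threads 40 \
      -s 15659_all.fastq -o killme.table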
From: mathog <ma...@ca...> - 2015-04-29 18:19:41
We just received a bunch of Illumina sequence which has been unpacked into a single fastq file of 217610498050 bytes. It has not been cleaned up in any way, so it has lots of N's and many entries which failed the "chastity filter". Tried to count the tuples on it with this:

time ~/wgs*/wgs-8.1/Linux-amd64/bin/meryl -v -B -m 17 -C \
  -s 15659_all.fastq -threads 40 -o killme.table

and it did this (sorry about the wrap in the backtrace):

REALLOC len=4194304 from 4194304 to 8388608
REALLOC len=8388608 from 8388608 to 16777216
REALLOC len=16777216 from 16777216 to 33554432
REALLOC len=33554432 from 33554432 to 67108864
REALLOC len=67108864 from 67108864 to 134217728

Failed with 'Segmentation fault'

The backtrace it emitted was:

[0] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::AS_UTL_catchCrash(int, siginfo*, void*) + 0x27 [0x40d477]
[1] /lib64/libpthread.so.0() [0x353d40f710]
[2] /lib64/libc.so.6::(null) + 0x15b [0x353cc897cb]
[3] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::constructIndex() + 0x551 [0x430e61]
[4] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::fastqFile(char const*) + 0x43 [0x431323]
[5] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::fastqFile::openFile(char const*) + 0x109 [0x431499]
[6] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqFactory::openFile(char const*) + 0x3c [0x42ca8c]
[7] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::seqStream::seqStream(char const*) + 0x42 [0x42d892]
[8] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::prepareBatch(merylArgs*) + 0xe2 [0x41d882]
[9] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::build(merylArgs*) + 0x67d [0x42110d]
[10] /home/mathog/wgs_project/wgs-8.1/Linux-amd64/bin/meryl::(null) + 0x158 [0x409578]

This crash happened after 4 minutes and a few seconds had elapsed. Have tried it with several other versions of meryl, including from a wgs built from trunk just this morning, and it always seems to crash the same way and at about the same place (as judged by run time). The system has 529G of memory which I would have assumed is sufficient.

The instructions for meryl are pretty light, so perhaps I missed some other switch which should be added to the command line?

Other suggestions?

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
From: Brian W. <th...@gm...> - 2015-04-24 02:36:07
The bad news is that it took me a week to reply. Sorry. The good news is that runCA and PBcR are quite tolerant of being killed. Just rerun the same command and it'll pick up where it left off. You might need to remove the 'overlap.sh' file first (it will probably tell you to do this). I'd suggest decreasing the hash and ref parameters. As set, you got one ginormous overlap job. Since it was PBS that killed the run, I assume there are quite a few spare CPUs it could be using. Remove the whole 0-overlaptrim-overlap directory before restarting with modified parameters. I don't know if runCA/PBcR will submit to PBS directly. Setting "useGrid=1 scriptOnGrid=0" will configure the overlap jobs (making the 0-overlaptrim-overlap directory and overlap.sh script) and print a command for you to cut and paste to launch the jobs. When those jobs finish, you can restart PBcR to finish up. b On Thu, Apr 16, 2015 at 10:16 PM, Stefania Bertazzoni < ste...@po...> wrote: > Good Morning Brian, > > > I was trying to assemble my set of Pacbio Reads with CA 8.3 and > everything seemed to proceed beautifully, until I realised I allocated a > walltime too short for the assembly to finish. I understand this is not a > bug, just my fault in underestimating the required running time. I am very > new to this field so I am sure asking for advice is much better than try > myself to restart the assembly from where it ended. > > > In the online documentation is mentioned "you could run the assembly by > hand" I'm not sure on how to do it. I am leaving unchanged the original > folder structure. > > > Here is the exact moment in which my job was killed: > > > ----------------------------------------START Wed Apr 15 20:23:01 2015 > /home/ssb573/bin/wgs-8.3rc1/Linux-amd64/bin/meryl -Dt -n 238 -s > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm-C-ms22-cm0 > > > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm.nmers.ovl.fasta > 2> > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm.nmers.ovl.fasta.err > > ----------------------------------------END Wed Apr 15 20:23:06 2015 (5 > seconds) > Reset OBT mer threshold from to 238. > Reset OVL mer threshold from to 238. > ----------------------------------------START Wed Apr 15 20:23:06 2015 > /home/ssb573/bin/wgs-8.3rc1/Linux-amd64/bin/overlap_partition \ > -g > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/asm.gkpStore > \ > -bl 1000000000 \ > -bs 0 \ > -rs 0 \ > -rl 100000000000 \ > -o > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap > \ > > > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/overlap_partition.err > 2>&1 > ----------------------------------------END Wed Apr 15 20:23:06 2015 (0 > seconds) > Created 1 overlap jobs. Last batch '001', last job '000001'. > ----------------------------------------START CONCURRENT Wed Apr 15 > 20:23:06 2015 > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/overlap.sh > 1 > > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/000001.out > 2>&1 > =>> PBS: job killed: walltime 180082 exceeded limit 180000 > > > can you please point me in the right direction? 
> > > Kind Regards > > > Stefania > > > ------------------------------------------------------------------------------ > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > Develop your own process in accordance with the BPMN 2 standard > Learn Process modeling best practices with Bonita BPM through live > exercises > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > event?utm_ > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > |
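A rough sketch of the restart steps described above (the working directory is taken from the log in the original post; the spec lines are the options this reply names, not a complete spec):

    cd /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar

    # if you change the hash/ref partitioning, discard the partially configured overlap jobs first
    rm -rf 0-overlaptrim-overlap

    # spec additions suggested above:
    #   useGrid = 1
    #   scriptOnGrid = 0

    # then rerun the same PBcR/runCA command; it resumes where it stopped
    # (it may first ask you to remove overlap.sh)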
From: Stefania B. <ste...@po...> - 2015-04-17 02:48:50
Good Morning Brian,

I was trying to assemble my set of Pacbio reads with CA 8.3 and everything seemed to proceed beautifully, until I realised I had allocated a walltime too short for the assembly to finish. I understand this is not a bug, just my fault in underestimating the required running time. I am very new to this field, so I am sure asking for advice is much better than trying to restart the assembly myself from where it ended.

The online documentation mentions "you could run the assembly by hand", but I'm not sure how to do it. I am leaving the original folder structure unchanged.

Here is the exact moment in which my job was killed:

----------------------------------------START Wed Apr 15 20:23:01 2015
/home/ssb573/bin/wgs-8.3rc1/Linux-amd64/bin/meryl -Dt -n 238 -s /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm-C-ms22-cm0 > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm.nmers.ovl.fasta 2> /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-mercounts/asm.nmers.ovl.fasta.err
----------------------------------------END Wed Apr 15 20:23:06 2015 (5 seconds)
Reset OBT mer threshold from to 238.
Reset OVL mer threshold from to 238.
----------------------------------------START Wed Apr 15 20:23:06 2015
/home/ssb573/bin/wgs-8.3rc1/Linux-amd64/bin/overlap_partition \
 -g /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/asm.gkpStore \
 -bl 1000000000 \
 -bs 0 \
 -rs 0 \
 -rl 100000000000 \
 -o /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap \
 > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/overlap_partition.err 2>&1
----------------------------------------END Wed Apr 15 20:23:06 2015 (0 seconds)
Created 1 overlap jobs.  Last batch '001', last job '000001'.
----------------------------------------START CONCURRENT Wed Apr 15 20:23:06 2015
/work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/overlap.sh 1 > /work1/ssb573/Pacbio/step1_IlluminaCorrect/PacbioSelfCorrection83/selfcorr14aprhipar/0-overlaptrim-overlap/000001.out 2>&1
=>> PBS: job killed: walltime 180082 exceeded limit 180000

Can you please point me in the right direction?

Kind Regards

Stefania
From: Brian W. <th...@gm...> - 2015-04-14 00:27:03
That was one of the pieces we could never effectively thread, due to a large quantity of thread-unsafe code. There was some work starting in CA8.0 to better prune the merges attempted, enabled with cgwMergeFilterLevel=2 (or 5, but scaffolding suffers). You can also increase the minimum number of mate pairs required for a scaffold join with cgwMinMergeWeight=2 (the default). This equates to cgw opiton -minmergeweight. I don't think masurca supports either though. With manual intervention, you can stop scaffolding at any time and move on to the next algorithmic step. The process is hopefully described adequately at: http://wgs-assembler.sourceforge.net/wiki/index.php/Scaffolder_failure under 'force it to recompute' about half way down the page. The summary is to kill the existing cgw, edit the 7-0-CGW/*timing file to change the '(logical ckp*)' to '(logical ckp05-1SM)', and restart. This forces cgw to restart from the algorithmic step after the one it is stuck on. b On Mon, Apr 13, 2015 at 3:49 PM, mathog <ma...@ca...> wrote: > Hi, > > I have been running a bunch of MaSuRCA jobs, testing how well it does > with different simulated library combinations on simulated homozygous > and heterozgyous (4% difference) genomes using only Illumina data. The > homozygous ones finish in about 8 hours. The one heterozygous one to > complete to date took 29 hours. Almost all of the extra time was spent > with cgw running single threaded, or at least it hovers at just around > 100% CPU, which is pretty low on a 46 core machine. So the issue seems > to be in the (modified) wgs assembler, not in the steps that MaSuRCA > runs before that. An earlier experiment with real(ly bad) data saw cgw > run for 29 _days_ before MaSuRCA crashed - almost all of that as cgw > running on one core. > > Is there a way to speed cgw up? Some command line switch which has been > omitted or set incorrectly? Here is the cgw command on the diploid test > that is running now (which is currently showing 6 hours of cumulative > CPU time, and has 8.4G resident): > > /opt/MaSuRCA/CA/Linux-amd64/bin/cgw -j 1 -k 5 -r 5 -s 2 -z -P 2 -B > 167006 -m 100 -g > /home/mathog/wgs_project/do_masurca_diploid/CA/genome.gkpStore -t > /home/mathog/wgs_project/do_masurca_diploid/CA/genome.tigStore -o > /home/mathog/wgs_project/do_masurca_diploid/CA/7-0-CGW/genome > > The "diploid" genome in this experiment is C elegans, with one wild type > copy and one mutated copy. > > Thanks, > > David Mathog > ma...@ca... > Manager, Sequence Analysis Facility, Biology Division, Caltech > > > ------------------------------------------------------------------------------ > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > Develop your own process in accordance with the BPMN 2 standard > Learn Process modeling best practices with Bonita BPM through live > exercises > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > event?utm_ > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > |
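A sketch of the two spec options and the manual "force it to recompute" edit described above (the sed command is only an illustration of the timing-file edit this message describes; back the file up and check the pattern against your own file first):

    # spec options from the reply above:
    #   cgwMergeFilterLevel = 2
    #   cgwMinMergeWeight   = 2

    # manual restart: kill cgw, rewrite the checkpoint tag, then restart the assembler
    cp 7-0-CGW/*timing /tmp/
    sed -i 's/(logical ckp[^)]*)/(logical ckp05-1SM)/' 7-0-CGW/*timing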
From: mathog <ma...@ca...> - 2015-04-13 19:49:31
Hi,

I have been running a bunch of MaSuRCA jobs, testing how well it does with different simulated library combinations on simulated homozygous and heterozygous (4% difference) genomes using only Illumina data. The homozygous ones finish in about 8 hours. The one heterozygous one to complete to date took 29 hours. Almost all of the extra time was spent with cgw running single threaded, or at least it hovers at just around 100% CPU, which is pretty low on a 46 core machine. So the issue seems to be in the (modified) wgs assembler, not in the steps that MaSuRCA runs before that. An earlier experiment with real(ly bad) data saw cgw run for 29 _days_ before MaSuRCA crashed - almost all of that as cgw running on one core.

Is there a way to speed cgw up? Some command line switch which has been omitted or set incorrectly? Here is the cgw command on the diploid test that is running now (which is currently showing 6 hours of cumulative CPU time, and has 8.4G resident):

/opt/MaSuRCA/CA/Linux-amd64/bin/cgw -j 1 -k 5 -r 5 -s 2 -z -P 2 -B 167006 -m 100 -g /home/mathog/wgs_project/do_masurca_diploid/CA/genome.gkpStore -t /home/mathog/wgs_project/do_masurca_diploid/CA/genome.tigStore -o /home/mathog/wgs_project/do_masurca_diploid/CA/7-0-CGW/genome

The "diploid" genome in this experiment is C elegans, with one wild type copy and one mutated copy.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
From: Serge K. <ser...@gm...> - 2015-04-01 15:16:49
Hi, As listed on the documentation for PBcR, there is no support for Illumina-based correction of Oxford data: “While PBcR supports high-identity data for correction, this mode is no longer being updated, is not supported for MinION data, and is significantly slower than self-correction due to its reliance on BLASR (or a built-in module) for overlap computation." You can only perform the self-correction using only oxford data. Just remove the frg file from your PBcR command and it should run. Sergey > On Apr 1, 2015, at 10:15 AM, Stephen Taylor <ste...@im...> wrote: > > Hi, > > I am trying to assemble some Oxford Nanopore reads with Illumina PE Sequences using PBcR (8.3rc2). > > So I prepare the Illumina sequences like this > > fastqToCA -libraryname illumina_paired_end -technology illumina -type sanger -mates Lamp_R1_001.fastq,Lamp_R2_001.fastq -insertsize 200 50 > illumina_paired_end.frg > > and then do the assembly thus: > > PBcR -length 500 -partitions 200 -libraryname lamp60 -s oxford.spec -fastq exp16.fastq -genomeSize 60000 illumina_paired_end.frg > > But I get > > Error: The PacBio library 0 must be the last library loaded but it preceedes 1. Please double-check your input files and try again. at /package/wgs/8.3rc2/Linux-amd64/bin/PBcR line 1470. > > I notice this error had been posted before but I think mine is different since ' illumina_paired_end' is the illumina library name and ' lamp60' is the assembled library name so there shouldn't be a conflict. > > http://sourceforge.net/p/wgs-assembler/mailman/wgs-assembler-users/?viewmonth=201307 > > Please can some give me some idea what I should do? > > Kind regards and thanks, > > Steve > > > ------------------------------------------------------------------------------ > Dive into the World of Parallel Programming The Go Parallel Website, sponsored > by Intel and developed in partnership with Slashdot Media, is your hub for all > things parallel software development, from weekly thought leadership blogs to > news, videos, case studies, tutorials and more. Take a look and join the > conversation now. http://goparallel.sourceforge.net/ > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
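Following the advice above, the self-correction run simply drops the Illumina frg file from the command in the original post — roughly:

    # original (fails: the PacBio library must be loaded last):
    #   PBcR -length 500 -partitions 200 -libraryname lamp60 -s oxford.spec \
    #        -fastq exp16.fastq -genomeSize 60000 illumina_paired_end.frg

    # self-correction using only the Oxford Nanopore reads:
    PBcR -length 500 -partitions 200 -libraryname lamp60 -s oxford.spec \
         -fastq exp16.fastq -genomeSize 60000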
From: Stephen T. <ste...@im...> - 2015-04-01 14:40:38
Hi,

I am trying to assemble some Oxford Nanopore reads with Illumina PE sequences using PBcR (8.3rc2).

So I prepare the Illumina sequences like this:

fastqToCA -libraryname illumina_paired_end -technology illumina -type sanger -mates Lamp_R1_001.fastq,Lamp_R2_001.fastq -insertsize 200 50 > illumina_paired_end.frg

and then do the assembly thus:

PBcR -length 500 -partitions 200 -libraryname lamp60 -s oxford.spec -fastq exp16.fastq -genomeSize 60000 illumina_paired_end.frg

But I get:

Error: The PacBio library 0 must be the last library loaded but it preceedes 1. Please double-check your input files and try again. at /package/wgs/8.3rc2/Linux-amd64/bin/PBcR line 1470.

I notice this error has been posted before, but I think mine is different since 'illumina_paired_end' is the Illumina library name and 'lamp60' is the assembled library name, so there shouldn't be a conflict.

http://sourceforge.net/p/wgs-assembler/mailman/wgs-assembler-users/?viewmonth=201307

Please can someone give me some idea what I should do?

Kind regards and thanks,

Steve
From: Serge K. <ser...@gm...> - 2015-03-26 20:51:44
Hi, This is because the program sets the ovlRefBlockLength by default. Just remove the ovlRefBlockSize from your spec file (or add ovlRefBlockLength=0) and it should run. Sergey > On Mar 25, 2015, at 8:55 PM, Adrian Pelin <ape...@gm...> wrote: > > Hello, > > My run ends with this bizarre error (see below). I ran: > ~/programs/wgs-8.3rc1/Linux-amd64/bin/pacBioToCA -length 500 -partitions > 200 -l ec_pacbio -t 16 -s pacbio.spec -fastq PacBio.fastq Lib3_PE.frg > > run.out 2>&1 > > and my pacbio.spec is: > stopAfter=overlapper > > # original asm settings > utgErrorRate = 0.25 > utgErrorLimit = 4.5 > > cnsErrorRate = 0.25 > cgwErrorRate = 0.25 > ovlErrorRate = 0.25 > > merSize=14 > > merylMemory = 128000 > merylThreads = 16 > > ovlStoreMemory = 8192 > > # grid info > useGrid = 0 > scriptOnGrid = 0 > frgCorrOnGrid = 0 > ovlCorrOnGrid = 0 > > sge = -A assembly > sgeScript = -pe threads 16 > sgeConsensus = -pe threads 1 > sgeOverlap = -pe threads 2 > sgeFragmentCorrection = -pe threads 2 > sgeOverlapCorrection = -pe threads 1 > > #ovlMemory=8GB --hashload 0.7 > ovlHashBits = 25 > ovlThreads = 2 > ovlHashBlockLength = 20000000 > ovlRefBlockSize = 50000000 > > # for mer overlapper > merCompression = 1 > merOverlapperSeedBatchSize = 500000 > merOverlapperExtendBatchSize = 250000 > > frgCorrThreads = 2 > frgCorrBatchSize = 100000 > > ovlCorrBatchSize = 100000 > > # non-Grid settings, if you set useGrid to 0 above these will be used > merylMemory = 128000 > merylThreads = 4 > > ovlStoreMemory = 8192 > > ovlConcurrency = 8 > > cnsConcurrency = 8 > > merOverlapperThreads = 3 > merOverlapperSeedConcurrency = 3 > merOverlapperExtendConcurrency = 3 > > frgCorrConcurrency = 2 > ovlCorrConcurrency = 4 > cnsConcurrency = 4 > > > LOG: > > tail run.out -f > ----------------------------------------START Wed Mar 25 18:47:55 2015 > mkdir tempec_pacbio > ----------------------------------------END Wed Mar 25 18:47:55 2015 (0 > seconds) > ----------------------------------------START Wed Mar 25 18:47:55 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/fastqToCA -libraryname > ec_pacbio -type sanger -technology none -feature doConsensusCorrection 1 > -reads /home/adrian/Rozella/reads/PacBio.fastq > > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg > ----------------------------------------END Wed Mar 25 18:47:55 2015 (0 > seconds) > ----------------------------------------START Wed Mar 25 18:47:55 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d > tempec_pacbio stopAfter=initialStoreBuilding > /home/adrian/Rozella/reads/Lib3_PE.frg > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg > ----------------------------------------START Wed Mar 25 18:47:55 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper -o > /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore.BUILDING -F > /home/adrian/Rozella/reads/Lib3_PE.frg > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg > > /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore.err 2>&1 > ----------------------------------------END Wed Mar 25 18:53:44 2015 > (349 seconds) > numFrags = 22786178 > Stop requested after 'initialstorebuilding'. 
> ----------------------------------------END Wed Mar 25 18:53:44 2015 > (349 seconds) > Will be correcting PacBio library 2 with librarie[s] 1 - 1 > ----------------------------------------START Wed Mar 25 18:53:53 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper > -dumpfragments -invert -tabular -longestovermin 2 500 -longestlength 2 > 412825510 /home/adrian/Rozella/reads//tempec_pacbio/asm.gkpStore 2> > /home/adrian/Rozella/reads//tempec_pacbio/asm.seedlength |awk '{if > (!(match($1, "UID") != 0 && length($1) == 3)) { print "frg uid "$1" > isdeleted 1"; } }' > > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.uid > ----------------------------------------END Wed Mar 25 18:54:00 2015 (7 > seconds) > ----------------------------------------START Wed Mar 25 18:54:00 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper --edit > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.uid > /home/adrian/Rozella/reads//tempec_pacbio/asm.gkpStore > > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.out 2> > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.err > ----------------------------------------END Wed Mar 25 18:54:05 2015 (5 > seconds) > Running with 370260486 bp for ec_pacbio. > Correcting with 3013291715 bp. > ----------------------------------------START Wed Mar 25 18:54:08 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d > tempec_pacbio ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 > obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 > doOverlapBasedTrimming=0 stopAfter=meryl > No need to run meryl for OBT (OBT is disabled). > ----------------------------------------START Wed Mar 25 18:54:08 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/meryl -B -C -v -m 14 > -memory 128000 -threads 16 -c 0 -L 2 -s > /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore:chain -o > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/meryl.err 2>&1 > ----------------------------------------END Wed Mar 25 19:17:17 2015 > (1389 seconds) > ----------------------------------------START Wed Mar 25 19:17:17 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/estimate-mer-threshold > -m /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0.estMerThresh.out > 2> > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0.estMerThresh.err > ----------------------------------------END Wed Mar 25 19:17:17 2015 (0 > seconds) > ----------------------------------------START Wed Mar 25 19:17:17 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/meryl -Dt -n 503 -s > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm.nmers.ovl.fasta > 2> > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm.nmers.ovl.fasta.err > > ----------------------------------------END Wed Mar 25 19:17:26 2015 (9 > seconds) > Reset OVL mer threshold from to 503. > Stop requested after 'meryl'. > ----------------------------------------END Wed Mar 25 19:17:26 2015 > (1398 seconds) > ----------------------------------------START Wed Mar 25 19:17:26 2015 > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d . 
> ovlMerThreshold=503 ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 > obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 > gridEnginePropagateHold="pBcR_asm" stopAfter=overlapper > No need to run meryl for OBT (OBT is disabled). > No need to run meryl for OVL (asm.nmers.ovl.fasta exists). > ================================================================================ > > runCA failed. > > ---------------------------------------- > Stack trace: > > at /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 1649. > main::caFailure('can\'t set both ovlRefBlockSize and > ovlRefBlockLength', undef) called at > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 3789 > main::createOverlapJobs('normal') called at > /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 6554 > > ---------------------------------------- > Failure message: > > can't set both ovlRefBlockSize and ovlRefBlockLength > > ----------------------------------------END Wed Mar 25 19:17:26 2015 (0 > seconds) > Failed to execute /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA > -s /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d . > ovlMerThreshold=503 ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 > obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 > gridEnginePropagateHold="pBcR_asm" stopAfter=overlapper > > > > ------------------------------------------------------------------------------ > Dive into the World of Parallel Programming The Go Parallel Website, sponsored > by Intel and developed in partnership with Slashdot Media, is your hub for all > things parallel software development, from weekly thought leadership blogs to > news, videos, case studies, tutorials and more. Take a look and join the > conversation now. http://goparallel.sourceforge.net/ > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
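The fix described above is a one-line change in pacbio.spec; either variant, taken directly from the reply, should clear the error:

    # remove this line from pacbio.spec:
    #   ovlRefBlockSize = 50000000
    # or keep it and explicitly disable the conflicting length-based partitioning instead:
    #   ovlRefBlockLength = 0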
From: Adrian P. <ape...@gm...> - 2015-03-26 00:55:14
Hello, My run ends with this bizarre error (see below). I ran: ~/programs/wgs-8.3rc1/Linux-amd64/bin/pacBioToCA -length 500 -partitions 200 -l ec_pacbio -t 16 -s pacbio.spec -fastq PacBio.fastq Lib3_PE.frg > run.out 2>&1 and my pacbio.spec is: stopAfter=overlapper # original asm settings utgErrorRate = 0.25 utgErrorLimit = 4.5 cnsErrorRate = 0.25 cgwErrorRate = 0.25 ovlErrorRate = 0.25 merSize=14 merylMemory = 128000 merylThreads = 16 ovlStoreMemory = 8192 # grid info useGrid = 0 scriptOnGrid = 0 frgCorrOnGrid = 0 ovlCorrOnGrid = 0 sge = -A assembly sgeScript = -pe threads 16 sgeConsensus = -pe threads 1 sgeOverlap = -pe threads 2 sgeFragmentCorrection = -pe threads 2 sgeOverlapCorrection = -pe threads 1 #ovlMemory=8GB --hashload 0.7 ovlHashBits = 25 ovlThreads = 2 ovlHashBlockLength = 20000000 ovlRefBlockSize = 50000000 # for mer overlapper merCompression = 1 merOverlapperSeedBatchSize = 500000 merOverlapperExtendBatchSize = 250000 frgCorrThreads = 2 frgCorrBatchSize = 100000 ovlCorrBatchSize = 100000 # non-Grid settings, if you set useGrid to 0 above these will be used merylMemory = 128000 merylThreads = 4 ovlStoreMemory = 8192 ovlConcurrency = 8 cnsConcurrency = 8 merOverlapperThreads = 3 merOverlapperSeedConcurrency = 3 merOverlapperExtendConcurrency = 3 frgCorrConcurrency = 2 ovlCorrConcurrency = 4 cnsConcurrency = 4 LOG: tail run.out -f ----------------------------------------START Wed Mar 25 18:47:55 2015 mkdir tempec_pacbio ----------------------------------------END Wed Mar 25 18:47:55 2015 (0 seconds) ----------------------------------------START Wed Mar 25 18:47:55 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/fastqToCA -libraryname ec_pacbio -type sanger -technology none -feature doConsensusCorrection 1 -reads /home/adrian/Rozella/reads/PacBio.fastq > /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg ----------------------------------------END Wed Mar 25 18:47:55 2015 (0 seconds) ----------------------------------------START Wed Mar 25 18:47:55 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d tempec_pacbio stopAfter=initialStoreBuilding /home/adrian/Rozella/reads/Lib3_PE.frg /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg ----------------------------------------START Wed Mar 25 18:47:55 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper -o /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore.BUILDING -F /home/adrian/Rozella/reads/Lib3_PE.frg /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.frg > /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore.err 2>&1 ----------------------------------------END Wed Mar 25 18:53:44 2015 (349 seconds) numFrags = 22786178 Stop requested after 'initialstorebuilding'. 
----------------------------------------END Wed Mar 25 18:53:44 2015 (349 seconds) Will be correcting PacBio library 2 with librarie[s] 1 - 1 ----------------------------------------START Wed Mar 25 18:53:53 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper -dumpfragments -invert -tabular -longestovermin 2 500 -longestlength 2 412825510 /home/adrian/Rozella/reads//tempec_pacbio/asm.gkpStore 2> /home/adrian/Rozella/reads//tempec_pacbio/asm.seedlength |awk '{if (!(match($1, "UID") != 0 && length($1) == 3)) { print "frg uid "$1" isdeleted 1"; } }' > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.uid ----------------------------------------END Wed Mar 25 18:54:00 2015 (7 seconds) ----------------------------------------START Wed Mar 25 18:54:00 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/gatekeeper --edit /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.uid /home/adrian/Rozella/reads//tempec_pacbio/asm.gkpStore > /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.out 2> /home/adrian/Rozella/reads//tempec_pacbio/asm.toerase.err ----------------------------------------END Wed Mar 25 18:54:05 2015 (5 seconds) Running with 370260486 bp for ec_pacbio. Correcting with 3013291715 bp. ----------------------------------------START Wed Mar 25 18:54:08 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d tempec_pacbio ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 doOverlapBasedTrimming=0 stopAfter=meryl No need to run meryl for OBT (OBT is disabled). ----------------------------------------START Wed Mar 25 18:54:08 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/meryl -B -C -v -m 14 -memory 128000 -threads 16 -c 0 -L 2 -s /home/adrian/Rozella/reads/tempec_pacbio/asm.gkpStore:chain -o /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/meryl.err 2>&1 ----------------------------------------END Wed Mar 25 19:17:17 2015 (1389 seconds) ----------------------------------------START Wed Mar 25 19:17:17 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/estimate-mer-threshold -m /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0.estMerThresh.out 2> /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0.estMerThresh.err ----------------------------------------END Wed Mar 25 19:17:17 2015 (0 seconds) ----------------------------------------START Wed Mar 25 19:17:17 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/meryl -Dt -n 503 -s /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm-C-ms14-cm0 > /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm.nmers.ovl.fasta 2> /home/adrian/Rozella/reads/tempec_pacbio/0-mercounts/asm.nmers.ovl.fasta.err ----------------------------------------END Wed Mar 25 19:17:26 2015 (9 seconds) Reset OVL mer threshold from to 503. Stop requested after 'meryl'. ----------------------------------------END Wed Mar 25 19:17:26 2015 (1398 seconds) ----------------------------------------START Wed Mar 25 19:17:26 2015 /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d . 
ovlMerThreshold=503 ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 gridEnginePropagateHold="pBcR_asm" stopAfter=overlapper No need to run meryl for OBT (OBT is disabled). No need to run meryl for OVL (asm.nmers.ovl.fasta exists). ================================================================================ runCA failed. ---------------------------------------- Stack trace: at /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 1649. main::caFailure('can\'t set both ovlRefBlockSize and ovlRefBlockLength', undef) called at /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 3789 main::createOverlapJobs('normal') called at /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA line 6554 ---------------------------------------- Failure message: can't set both ovlRefBlockSize and ovlRefBlockLength ----------------------------------------END Wed Mar 25 19:17:26 2015 (0 seconds) Failed to execute /home/adrian/programs/wgs-8.3rc1/Linux-amd64/bin/runCA -s /home/adrian/Rozella/reads//tempec_pacbio/ec_pacbio.spec -p asm -d . ovlMerThreshold=503 ovlHashLibrary=2 ovlRefLibrary=1-1 ovlCheckLibrary=1 obtHashLibrary=1-1 obtRefLibrary=1-1 obtCheckLibrary=0 gridEnginePropagateHold="pBcR_asm" stopAfter=overlapper |
From: Walenz, B. <wa...@nb...> - 2015-03-25 20:31:19
No luck on running the 4 billion read test assembly. It's bigger than my development cluster can handle, and all the big machines are busy.

I think there is still something fishy going on in gatekeeper; it shouldn't be using as much memory as it is. I was up to ~100gb after only 1 billion reads.

From: Walenz, Brian [mailto:wa...@nb...]
Sent: Monday, March 23, 2015 4:32 PM
To: Langhorst, Brad; wgs...@li...
Subject: Re: [wgs-assembler-users] last fragment id < 0 ... runCA can't match the output

Congratulations! You're the first to try (or announce trying) more than 2 billion reads. We've run assemblies with up to 2 billion reads, but never more. I forget why we didn't allow up to 4 billion reads. Aside from silly printing errors like this, there are probably places where -1 is used to indicate an invalid ID, or where exactly 31 bits were available to store an ID. I think I can spend a little time running a mock assembly with ~4 billion reads. This should catch most of the obvious issues.

The fix for this issue is simple. The easiest for you would be to change that perl function to always return 3902080154 (the number of reads you actually have). You can try continuing with that hack, or wait until I can run the mock assembly. The only risk is lost compute; every step of the assembler can be undone.

b

________________________________
From: Langhorst, Brad [Lan...@ne...]
Sent: Sunday, March 22, 2015 5:09 PM
To: wgs...@li...
Subject: [wgs-assembler-users] last fragment id < 0 ... runCA can't match the output

Hi:

My frag store has finally been built, but runCA can't continue because it can't figure out the number of frags in the store. Is it normal for the endIID to be < 0?

gatekeeper -lastfragiid deer.gkpStore
Last frag in store is iid = -392887142

runCA can't handle that. In getNumberOfFragsInStore, this regex fails to match due to the "-":

$numFrags = $1 if (m/^Last frag in store is iid = (\d+)$/);

I could fix that, but it doesn't seem very reasonable to report a negative value from a function that is supposed to be counting the number of frags in a store. Should I fix the getNumberOfFragsInStore function to just parse this file and return the "active" column? Or is this an indication of some deeper problem? Where should I look next?

Thanks,

Brad

Here is the .info file:

libIID  bgnIID      endIID       active      deleted  mated       totLen        clrLen        libName
0       1           -392887142   3902080154  0        3902080154  589214103254  589214103254  GLOBAL
0       0           0            0           0        0           0             0             LegacyUnmatedReads
1       1           1321836050   1321836050  0        1321836050  199597243550  199597243550  run15
2       1321836051  2564548948   1242712898  0        1242712898  187649647598  187649647598  run16
3       2564548949  3902080154   1337531206  0        1337531206  201967212106  201967212106  run17

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
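Brian's suggested stopgap above is a one-line change inside that perl function in runCA; a sketch (the surrounding function is not shown, and the hard-coded count is the poster's actual read total from the .info "active" column):

    # inside getNumberOfFragsInStore: skip the broken regex match and
    # return the known read count until the 32-bit IID overflow is fixed
    $numFrags = 3902080154;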
From: Langhorst, B. <Lan...@ne...> - 2015-03-24 00:55:02
|
Hi Brian:

Looks like this overflow may cause more trouble. Tried setting the num reads… meryl starts, but dies quickly. I’ll try to start with merged reads (from PEAR) only… should drop the read count significantly. Let me know if you want the core file.

brad

/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl -B -C -v -m 22 -memory 128000 -threads 16 -c 0 -L 2 -s /mnt/galaxy/data/langhorst/deer_unitigs_ca83//deer.gkpStore:chain -o /mnt/galaxy/data/langhorst/deer_unitigs_ca83//0-mercounts/deer-C-ms22-cm0 > /mnt/galaxy/data/langhorst/deer_unitigs_ca83//0-mercounts/meryl.err 2>&1

----------------------------------------END Mon Mar 23 20:43:51 2015 (1364 seconds)

ERROR: Failed with signal ABRT (6)

================================================================================
runCA failed.

----------------------------------------
Stack trace:

 at /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/runCA line 1653.
    main::caFailure('meryl failed', '/mnt/galaxy/data/langhorst/deer_unitigs_ca83//0-mercounts/mer...') called at /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/runCA line 2523
    main::runMeryl(22, 0, '-C', 'auto', undef, undef, 'mbt', 0) called at /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/runCA line 3088
    main::merTrim() called at /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/runCA line 6556

----------------------------------------
Last few lines of the relevant log file (/mnt/galaxy/data/langhorst/deer_unitigs_ca83//0-mercounts/meryl.err):

meryl: AS_MER_gkpStoreChain.C:69: gkpStoreChain::gkpStoreChain(const char*, uint32): Assertion `_numberOfSequences < _maxChains' failed.

Failed with 'Aborted'

Backtrace (mangled):
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_Z17AS_UTL_catchCrashiP7siginfoPv+0x2a)[0x40d48a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fb3ff878340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7fb3feaaacc9]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fb3feaae0d8]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fb86)[0x7fb3feaa3b86]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fc32)[0x7fb3feaa3c32]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_ZN13gkpStoreChainC2EPKcj+0x4d5)[0x40bd35]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_ZN13gkpStoreChain8openFileEPKc+0x62)[0x40be02]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_ZN10seqFactory8openFileEPKc+0x3c)[0x42d61c]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_ZN9seqStreamC2EPKc+0x2c)[0x42d9cc]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_Z12prepareBatchP9merylArgs+0xc9)[0x41e479]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(_Z5buildP9merylArgs+0x685)[0x421775]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(main+0xe7)[0x409cb7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb3fea95ec5]
/home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl(__gxx_personality_v0+0x141)[0x409a99]

Backtrace (demangled):
[0] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::AS_UTL_catchCrash(int, siginfo*, void*) + 0x2a [0x40d48a]
[1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0x10340 [0x7fb3ff878340]
[2] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x39 [0x7fb3feaaacc9]
[3] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x148 [0x7fb3feaae0d8]
[4] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fb86 [0x7fb3feaa3b86]
[5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fc32 [0x7fb3feaa3c32]
[6] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::gkpStoreChain::gkpStoreChain(char const*, unsigned int) + 0x4d5 [0x40bd35]
[7] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::gkpStoreChain::openFile(char const*) + 0x62 [0x40be02]
[8] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::seqFactory::openFile(char const*) + 0x3c [0x42d61c]
[9] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::seqStream::seqStream(char const*) + 0x2c [0x42d9cc]
[10] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::prepareBatch(merylArgs*) + 0xc9 [0x41e479]
[11] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::build(merylArgs*) + 0x685 [0x421775]
[12] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::(null) + 0xe7 [0x409cb7]
[13] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xf5 [0x7fb3fea95ec5]
[14] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/meryl::(null) + 0x141 [0x409a99]

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist

On Mar 23, 2015, at 4:32 PM, Walenz, Brian <wa...@nb...> wrote:

Congratulations! You're the first to try (or announce trying) more than 2 billion reads. We've run assemblies with up to 2 billion reads, but never more. I forget why we didn't allow up to 4 billion reads. Aside from silly printing errors like this, there are probably places where -1 is used to indicate an invalid ID, or where exactly 31 bits were available to store an ID.

I think I can spend a little time running a mock assembly with ~4 billion reads. This should catch most of the obvious issues.

The fix for this issue is simple. The easiest for you would be to change that perl function to always return 3902080154 (the number of reads you actually have). You can try continuing with that hack, or wait until I can run the mock assembly. The only risk is lost compute; every step of the assembler can be undone.

b

________________________________
From: Langhorst, Brad [Lan...@ne...]
Sent: Sunday, March 22, 2015 5:09 PM
To: wgs...@li...
Subject: [wgs-assembler-users] last fragment id < 0 ... runCA can't match the output

Hi:

My frag store has finally been built, but runCA can’t continue because it can’t figure out the number of frags in the store. Is it normal for the endIID to be < 0?

  gatekeeper -lastfragiid deer.gkpStore
  Last frag in store is iid = -392887142

runCA can’t handle that. In getNumberOfFragsInStore, this regex fails to match due to the "-":

  $numFrags = $1 if (m/^Last frag in store is iid = (\d+)$/);

I could fix that, but it doesn’t seem very reasonable to report a negative value from a function that is supposed to be counting the number of frags in a store. Should I fix the getNumberOfFragsInStore function to just parse this file and return the "active" column? Or is this an indication of some deeper problem? Where should I look next?

Thanks,

Brad

Here is the .info file:

  libIID  bgnIID      endIID      active      deleted  mated       totLen        clrLen        libName
  0       1           -392887142  3902080154  0        3902080154  589214103254  589214103254  GLOBAL
  0       0           0           0           0        0           0             0             LegacyUnmatedReads
  1       1           1321836050  1321836050  0        1321836050  199597243550  199597243550  run15
  2       1321836051  2564548948  1242712898  0        1242712898  187649647598  187649647598  run16
  3       2564548949  3902080154  1337531206  0        1337531206  201967212106  201967212106  run17

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
|
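Since the plan above is to restart from the PEAR-merged reads only, a quick sanity check on the reduced read count before rebuilding the gkpStore can save another long false start. A hedged one-liner, assuming standard 4-line FASTQ; the filename is a placeholder in the style of PEAR's default output names:

  perl -ne 'END { printf "%d reads\n", $. / 4 }' deer.assembled.fastq

If that number is comfortably below 2^31 (about 2.1 billion), the signed 32-bit iid overflow seen earlier in this thread should not recur; whether it also clears the meryl gkpStoreChain assertion is a separate question.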
From: Walenz, B. <wa...@nb...> - 2015-03-23 20:32:11
|
Congratulations! You're the first to try (or announce trying) more than 2 billion reads. We've run assemblies with up to 2 billion reads, but never more. I forget why we didn't allow up to 4 billion reads. Aside from silly printing errors like this, there are probably places where -1 is used to indicate an invalid ID, or where exactly 31 bits were available to store an ID.

I think I can spend a little time running a mock assembly with ~4 billion reads. This should catch most of the obvious issues.

The fix for this issue is simple. The easiest for you would be to change that perl function to always return 3902080154 (the number of reads you actually have). You can try continuing with that hack, or wait until I can run the mock assembly. The only risk is lost compute; every step of the assembler can be undone.

b

________________________________
From: Langhorst, Brad [Lan...@ne...]
Sent: Sunday, March 22, 2015 5:09 PM
To: wgs...@li...
Subject: [wgs-assembler-users] last fragment id < 0 ... runCA can't match the output

Hi:

My frag store has finally been built, but runCA can’t continue because it can’t figure out the number of frags in the store. Is it normal for the endIID to be < 0?

  gatekeeper -lastfragiid deer.gkpStore
  Last frag in store is iid = -392887142

runCA can’t handle that. In getNumberOfFragsInStore, this regex fails to match due to the "-":

  $numFrags = $1 if (m/^Last frag in store is iid = (\d+)$/);

I could fix that, but it doesn’t seem very reasonable to report a negative value from a function that is supposed to be counting the number of frags in a store. Should I fix the getNumberOfFragsInStore function to just parse this file and return the "active" column? Or is this an indication of some deeper problem? Where should I look next?

Thanks,

Brad

Here is the .info file:

  libIID  bgnIID      endIID      active      deleted  mated       totLen        clrLen        libName
  0       1           -392887142  3902080154  0        3902080154  589214103254  589214103254  GLOBAL
  0       0           0           0           0        0           0             0             LegacyUnmatedReads
  1       1           1321836050  1321836050  0        1321836050  199597243550  199597243550  run15
  2       1321836051  2564548948  1242712898  0        1242712898  187649647598  187649647598  run16
  3       2564548949  3902080154  1337531206  0        1337531206  201967212106  201967212106  run17

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
|
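Another way to paper over the symptom while a real fix lands: let the regex in getNumberOfFragsInStore (shown above) accept the negative value and undo the 32-bit wrap. A sketch only, not from the thread; the proper fix belongs in the code that prints the iid:

  if (m/^Last frag in store is iid = (-?\d+)$/) {
      $numFrags = $1;
      $numFrags += 2**32 if ($numFrags < 0);   # -392887142 -> 3902080154
  }

This yields the same 3902080154 as hard-coding the count, but keeps working if the store is rebuilt with a different number of reads, as long as the true count stays under 2^32.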
From: Langhorst, B. <Lan...@ne...> - 2015-03-22 21:41:26
|
Hi:

My frag store has finally been built, but runCA can’t continue because it can’t figure out the number of frags in the store. Is it normal for the endIID to be < 0?

  gatekeeper -lastfragiid deer.gkpStore
  Last frag in store is iid = -392887142

runCA can’t handle that. In getNumberOfFragsInStore, this regex fails to match due to the "-":

  $numFrags = $1 if (m/^Last frag in store is iid = (\d+)$/);

I could fix that, but it doesn’t seem very reasonable to report a negative value from a function that is supposed to be counting the number of frags in a store. Should I fix the getNumberOfFragsInStore function to just parse this file and return the "active" column? Or is this an indication of some deeper problem? Where should I look next?

Thanks,

Brad

Here is the .info file:

  libIID  bgnIID      endIID      active      deleted  mated       totLen        clrLen        libName
  0       1           -392887142  3902080154  0        3902080154  589214103254  589214103254  GLOBAL
  0       0           0           0           0        0           0             0             LegacyUnmatedReads
  1       1           1321836050  1321836050  0        1321836050  199597243550  199597243550  run15
  2       1321836051  2564548948  1242712898  0        1242712898  187649647598  187649647598  run16
  3       2564548949  3902080154  1337531206  0        1337531206  201967212106  201967212106  run17

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
|
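The negative iid is consistent with a plain signed 32-bit wraparound rather than a corrupted store: the 'active' count on the GLOBAL line, 3902080154, is above 2^31, and reinterpreting it as a signed 32-bit integer gives exactly the value gatekeeper printed. A small check, for illustration only (not part of the assembler):

  perl -e 'my $n = 3902080154; printf "%d\n", unpack("l", pack("L", $n));'
  # prints -392887142

So the count itself appears intact; only code that pushes the iid through a signed 32-bit type (most likely the -lastfragiid report) misreports it.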
From: Langhorst, B. <Lan...@ne...> - 2015-03-22 21:26:16
|
whups forgot to cc the mail list on my earlier reply

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist

> On Mar 20, 2015, at 4:41 PM, Langhorst, Brad <Lan...@ne...> wrote:
>
> Hi Brian:
>
>> Do you have any details on the out of memory error? It shouldn't be using much memory for fastq files (opposed to the 'frg' format which WILL use gobs of memory to detect duplicate IDs). The read metadata is kept in core, but just about every process in the assembler will need to do that too.
>
> The system was loaded when I tried to run the first time:
>   Processing INNIE SANGER QV encoding reads from:
>     '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.1.fastq'
>     and '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.2.fastq'
>   Could not calloc memory (536870912 * 32 bytes = 17179869184)
>   gatekeeper: AS_UTL_alloc.C:49: void* safe_calloc(size_t, size_t): Assertion `p != __null' failed.
>
>> 1.5 billion reads will be difficult to get assembled with overlaps. Have you looked into masurca? This will build 'super reads' from the PE, effectively compressing the data before assembling.
>
> I’m trying to build unitigs to feed to eautils.
>
> 80% of these PE reads overlap (based on the result of a stitch with PEAR), but I thought maybe CA would work better with the PE information and all reads.
>
> I’m running again now that the other jobs have finished, maybe it will complete.
> It looks like I’m enabling duplicate removal in the frg file (see below). Didn’t mean to do that. Should I disable that feature by just flipping the 1 to a 0?
>
> Should I give up on CA and switch to masurca?
>
> Brad
>
> Here’s one of the frg files:
> {VER
> ver:2
> }
> {LIB
> act:A
> acc:run15
> ori:I
> mea:280.000
> std:100.000
> src:
> .
> nft:20
> fea:
> forceBOGunitigger=1
> isNotRandom=0
> doNotTrustHomopolymerRuns=0
> doTrim_initialNone=0
> doTrim_initialMerBased=1
> doTrim_initialFlowBased=0
> doTrim_initialQualityBased=0
> doRemoveDuplicateReads=1
> doTrim_finalLargestCovered=1
> doTrim_finalEvidenceBased=0
> doTrim_finalBestEdge=0
> doRemoveSpurReads=1
> doRemoveChimericReads=1
> doCheckForSubReads=0
> doConsensusCorrection=0
> forceShortReadFormat=1
> constantInsertSize=0
> fastqQualityValues=sanger
> fastqOrientation=innie
> fastqMates=/mnt/ngswork/langhorst/deer_assembly/ovi_run15.1.fastq,/mnt/ngswork/langhorst/deer_assembly/ovi_run15.2.fastq
> .
> }
> {VER
> ver:1
> }
>
>> On Mar 20, 2015, at 4:28 PM, Walenz, Brian <wa...@nb...> wrote:
>>
>> Unfortunately, appending broke, and we never had a need to fix it.
>>
>> b
>>
>> ________________________________________
>> From: Langhorst, Brad [Lan...@ne...]
>> Sent: Friday, March 20, 2015 1:43 PM
>> To: wgs...@li...
>> Subject: [wgs-assembler-users] appending to a gkpStore?
>>
>> Hi:
>>
>> I ran into a problem creating a gkpStore from 3 frg files (pointing to PE fastq files, each about 500M reads).
>> It failed due to a memory allocation error when I tried to import all 3 at once, so I thought I’d try to append to the store like this:
>>   $gatekeeper -o $store -T -F $frg_path/run15.frg
>>   $gatekeeper -a -o $store -T -F $frg_path/run16.frg
>>   $gatekeeper -a -o $store -T -F $frg_path/run17.frg
>>
>> The first one succeeds, but the append fails immediately.
>>
>> Seems like the store is somehow marked read-only. I didn’t expect that since the first command succeeded.
>> Should appending to a store work?
>> Should I try an older gatekeeper? Will that cause trouble later if I try to use 8.3 for following steps?
>>
>> Here’s the log:
>>
>> Starting file '/mnt/galaxy/data/langhorst/deer_unitigs/run15.frg'.
>>
>> Processing INNIE SANGER QV encoding reads from:
>>   '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.1.fastq'
>>   and '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.2.fastq'
>>
>> GKP finished with 1 alerts or errors:
>> 1 # LIB Alert: stddev too big for mean; reset stddev to 0.1 * mean.
>>
>> Starting file '/mnt/galaxy/data/langhorst/deer_unitigs/run16.frg'.
>> gatekeeper: AS_PER_genericStore.C:425: int64 appendStringStore(StoreStruct*, char*, uint32): Assertion `s->readOnly == false' failed.
>>
>> …
>>
>> [0] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::AS_UTL_catchCrash(int, siginfo*, void*) + 0x2a [0x42587a]
>> [1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0x10340 [0x7ffb82749340]
>> [2] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x39 [0x7ffb823aacc9]
>> [3] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x148 [0x7ffb823ae0d8]
>> [4] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fb86 [0x7ffb823a3b86]
>> [5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fc32 [0x7ffb823a3c32]
>> [6] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper() [0x43129d]
>> [7] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::gkStore::gkStore_addUID(char*) + 0x13f [0x436d6f]
>> [8] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::AS_UID_load(char*) + 0x196 [0x4254b6]
>> [9] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::GetUID(char*, _IO_FILE*) + 0x11 [0x4264d1]
>> [10] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper() [0x42f902]
>> [11] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::ReadProtoMesg_AS(_IO_FILE*, GenericMesg**) + 0x4aa [0x42719a]
>> [12] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::(null) + 0x5c1 [0x4087b1]
>> [13] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xf5 [0x7ffb82395ec5]
>> [14] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::(null) + 0xf1 [0x406949]
|
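On the doRemoveDuplicateReads question above: if the gkpStore is going to be rebuilt anyway, one low-tech option (a sketch, not from the thread) is to flip the flag in the frg files before the next gatekeeper run; the file names are the ones used in the gatekeeper commands quoted above:

  perl -i.bak -pe 's/doRemoveDuplicateReads=1/doRemoveDuplicateReads=0/' run15.frg run16.frg run17.frg

This only affects stores built after the edit; it does not change a library already loaded into an existing gkpStore.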
From: Walenz, B. <wa...@nb...> - 2015-03-20 21:03:18
|
Unfortunately, appending broke, and we never had a need to fix it.

Do you have any details on the out of memory error? It shouldn't be using much memory for fastq files (opposed to the 'frg' format which WILL use gobs of memory to detect duplicate IDs). The read metadata is kept in core, but just about every process in the assembler will need to do that too.

1.5 billion reads will be difficult to get assembled with overlaps. Have you looked into masurca? This will build 'super reads' from the PE, effectively compressing the data before assembling.

b

________________________________________
From: Langhorst, Brad [Lan...@ne...]
Sent: Friday, March 20, 2015 1:43 PM
To: wgs...@li...
Subject: [wgs-assembler-users] appending to a gkpStore?

Hi:

I ran into a problem creating a gkpStore from 3 frg files (pointing to PE fastq files, each about 500M reads).
It failed due to a memory allocation error when I tried to import all 3 at once, so I thought I’d try to append to the store like this:

  $gatekeeper -o $store -T -F $frg_path/run15.frg
  $gatekeeper -a -o $store -T -F $frg_path/run16.frg
  $gatekeeper -a -o $store -T -F $frg_path/run17.frg

The first one succeeds, but the append fails immediately.

Seems like the store is somehow marked read-only. I didn’t expect that since the first command succeeded.
Should appending to a store work?
Should I try an older gatekeeper? Will that cause trouble later if I try to use 8.3 for following steps?

Here’s the log:

Starting file '/mnt/galaxy/data/langhorst/deer_unitigs/run15.frg'.

Processing INNIE SANGER QV encoding reads from:
  '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.1.fastq'
  and '/mnt/ngswork/langhorst/deer_assembly/ovi_run15.2.fastq'

GKP finished with 1 alerts or errors:
1 # LIB Alert: stddev too big for mean; reset stddev to 0.1 * mean.

Starting file '/mnt/galaxy/data/langhorst/deer_unitigs/run16.frg'.
gatekeeper: AS_PER_genericStore.C:425: int64 appendStringStore(StoreStruct*, char*, uint32): Assertion `s->readOnly == false' failed.

…

[0] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::AS_UTL_catchCrash(int, siginfo*, void*) + 0x2a [0x42587a]
[1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0x10340 [0x7ffb82749340]
[2] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x39 [0x7ffb823aacc9]
[3] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x148 [0x7ffb823ae0d8]
[4] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fb86 [0x7ffb823a3b86]
[5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fc32 [0x7ffb823a3c32]
[6] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper() [0x43129d]
[7] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::gkStore::gkStore_addUID(char*) + 0x13f [0x436d6f]
[8] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::AS_UID_load(char*) + 0x196 [0x4254b6]
[9] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::GetUID(char*, _IO_FILE*) + 0x11 [0x4264d1]
[10] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper() [0x42f902]
[11] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::ReadProtoMesg_AS(_IO_FILE*, GenericMesg**) + 0x4aa [0x42719a]
[12] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::(null) + 0x5c1 [0x4087b1]
[13] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xf5 [0x7ffb82395ec5]
[14] /home/NEB/langhorst/wgs-8.3rc1/Linux-amd64/bin/gatekeeper::(null) + 0xf1 [0x406949]
|
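For scale, the failed allocation reported in the log above, "Could not calloc memory (536870912 * 32 bytes = 17179869184)", is a single 16 GiB request (2^29 entries of 32 bytes each), which is easy to confirm (illustration only):

  perl -e 'printf "%.1f GiB\n", 536870912 * 32 / 2**30'
  # 16.0 GiB

Which gatekeeper data structure is behind that request is not pinned down in this thread, so treat any attribution (for example, to the duplicate-ID bookkeeping mentioned above) as an assumption.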