From: Christian D. <chr...@gm...> - 2015-07-23 14:29:23
Hi,

I ran wgs-8.2beta until the assembler went idle on one of the overlap correction steps (frgcorr.sh). Apparently one of the early fragments didn't finish: frgcorr.sh for this fragment had been running for 15h, and its log file contained only the first few rows, up to "### Using 20 pthreads". The assembler stopped the correction only a few fragments after the idle one.

I killed the idle process and executed the frgcorr.sh command for this fragment manually. After that, I ran runCA again with the original command and immediately got the failure message:

"gatekeeper failed to add fragments"

As this didn't work, I renamed the folder 3-overlapcorrection and ran runCA again, which led to the same error message. I thought that starting from the step before the error correction could work, and ran:

/software/wgs-8.2beta/Linux-amd64/bin/overlapStoreBuild -o /cabog/CA/genome.ovlStore.BUILDING -g /cabog/CA/genome.gkpStore -M 8192 -L /cabog/CA/genome.ovlStore.list > /cabog/CA/genome.ovlStore.err 2>&1

The log file genome.ovlStore.err contains the following:

gkStore_open()-- ERROR! Incorrect element sizes; code and store are incompatible.
gkLibrary: store 216 code 216 bytes
gkPackedFragment: store 24 code 24 bytes
gkNormalFragment: store 48 code 48 bytes
gkStrobeFragment: store 48 code 48 bytes
AS_READ_MAX_NORMAL_LEN_BITS: store 16 code 18

Is it possible to restart the assembly at this point? What steps do I have to take to "rescue" the assembly results up to this point (>20 days of calculation time)?

Thanks
Chris

From: Brian W. <th...@gm...> - 2015-07-24 14:42:28
Yes, you can safely remove the 3- directory and restart.

You've got FAR too many jobs created here. Increase frgCorrBatchSize and ovlCorrBatchSize.

The 'frgCorr' step will use 13 bytes of memory per base in the batch PLUS any overlaps loaded. Assuming 250bp reads, try frgCorrBatchSize=50000000 (50 million). That should use about 160GB of memory for data, leaving lots of space for overlaps. Also set frgCorrConcurrency=1 and frgCorrThreads=64. This will run one job at a time, using 64 compute threads.

The ovlCorr step isn't as demanding on memory, so let's try ovlCorrBatchSize=10000000 (10 million), ovlCorrConcurrency=8 and ovlCorrThreads=8.

b

On Thu, Jul 23, 2015 at 11:50 AM, Christian Dreischer <chr...@gm...> wrote:

> Good hint. After 8.2beta didn't work, I switched to 8.3rc2. Running runCA
> again with 8.2beta continued the assembly with "cat-corrects". As I
> mentioned before, frgcorr.sh seems to have run only for a fraction of the
> fragments (judging from the message "Created 36656 overlap jobs. Last
> batch '037', last job '036656'" in the log). Is it safe to delete the
> 3-overlapper directory and to start runCA again?
>
> Here's my config file:
>
> overlapper = ovl
> unitigger = bogart
> utgBubblePopping = 1
> merSize = 14
> merylMemory = 128000
> merylThreads = 16
> ovlStoreMemory = 8192
> # grid info
> useGrid = 0
> scriptOnGrid = 0
> frgCorrOnGrid = 0
> ovlCorrOnGrid = 0
> #ovlMemory=8GB --hashload 0.7
> ovlHashBits = 25
> ovlThreads = 6
> ovlHashBlockLength = 20000000
> ovlRefBlockSize = 5000000
> # for mer overlapper
> merCompression = 1
> merOverlapperSeedBatchSize = 500000
> merOverlapperExtendBatchSize = 250000
> frgCorrThreads = 20
> frgCorrBatchSize = 500000
> ovlCorrBatchSize = 100000
> # non-Grid settings, if you set useGrid to 0 above these will be used
> merylMemory = 128000
> merylThreads = 12
> ovlStoreMemory = 8192
> ovlConcurrency = 8
> merOverlapperThreads = 6
> merOverlapperSeedConcurrency = 2
> merOverlapperExtendConcurrency = 2
> frgCorrConcurrency = 8
> ovlCorrConcurrency = 16
> cnsConcurrency = 16
> doToggle=0
> toggleNumInstances = 0
> toggleUnitigLength = 2000
> doOverlapBasedTrimming = 1
> doExtendClearRanges = 2
>
> I'm running the assembly on an Ubuntu 14.04.2 LTS, 64-core (AMD Opteron)
> server with 512GB memory.
>
> Chris

>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users

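The sizing advice in the reply above (13 bytes per base, 250bp reads, a batch of 50 million) can be checked with a few lines of arithmetic. The following is only a minimal sketch: it assumes the batch size counts reads, which is what the quoted 160GB figure implies, and the function names are hypothetical helpers, not part of wgs-assembler. It also multiplies concurrency by threads per job to compare the posted and the suggested settings against the 64 cores mentioned in the thread.

```python
# Rough frgCorr sizing check using the figures quoted in the thread.
# Assumption: frgCorrBatchSize counts reads, and overlaps need extra
# memory on top of the per-base figure.
BYTES_PER_BASE = 13  # per-base memory figure given in the reply


def frgcorr_data_bytes(batch_size_reads: int, read_length_bp: int) -> int:
    """Estimate memory for the read data of one batch, excluding overlaps."""
    return BYTES_PER_BASE * batch_size_reads * read_length_bp


def total_threads(concurrency: int, threads_per_job: int) -> int:
    """Threads in flight when `concurrency` jobs each use `threads_per_job`."""
    return concurrency * threads_per_job


if __name__ == "__main__":
    gb = frgcorr_data_bytes(50_000_000, 250) / 1e9
    print(f"frgCorr data per batch: {gb:.1f} GB")
    print("posted frgCorr settings:   ", total_threads(8, 20), "threads")
    print("suggested frgCorr settings:", total_threads(1, 64), "threads")
    print("suggested ovlCorr settings:", total_threads(8, 8), "threads")
```

Under these assumptions the read data alone comes to roughly 160GB per batch, and both suggested settings keep the thread count at the number of cores, whereas the posted frgCorr settings ask for more threads than the machine has.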
From: Brian W. <th...@gm...> - 2015-07-23 15:21:54
Did you do a code update between starting the assembly and now? If you have the source code, change AS_READ_MAX_NORMAL_LEN_BITS in file AS_global.H from 18 to 16.

The ovlStore is likely OK. The issue is hopefully the configuration of the frgCorr (and ovlCorr) stages. These are both I/O-intensive and memory-hungry, and finding a tradeoff is sometimes hard. Post your config, please (and describe the hardware you're running on).

Failing that, you can turn this step off: doFragmentCorrection=0 (IIRC). I'd suggest a slight increase in the unitigger (bogart, the bat* parameters) error rates to adjust for the uncorrected overlaps.

b
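The gkStore_open() error lists, for each element, the value recorded in the store next to the value compiled into the binary, so the field that differs is the one to look at. As a minimal sketch, assuming the log lines have exactly the "name: store N code M" shape quoted in this thread, the mismatch can be picked out like this (the regular expression and the function name are illustrative, not part of wgs-assembler):

```python
import re

# Matches lines such as "gkLibrary: store 216 code 216 bytes" or
# "AS_READ_MAX_NORMAL_LEN_BITS: store 16 code 18".
FIELD_RE = re.compile(r"(\w+):\s+store\s+(\d+)\s+code\s+(\d+)")


def find_mismatches(log_text: str) -> dict[str, tuple[int, int]]:
    """Return {field: (store_value, code_value)} for fields that differ."""
    mismatches = {}
    for name, store, code in FIELD_RE.findall(log_text):
        if int(store) != int(code):
            mismatches[name] = (int(store), int(code))
    return mismatches


# Log text copied from the original report in this thread.
SAMPLE_LOG = """\
gkStore_open()-- ERROR! Incorrect element sizes; code and store are incompatible.
gkLibrary: store 216 code 216 bytes
gkPackedFragment: store 24 code 24 bytes
gkNormalFragment: store 48 code 48 bytes
gkStrobeFragment: store 48 code 48 bytes
AS_READ_MAX_NORMAL_LEN_BITS: store 16 code 18
"""

if __name__ == "__main__":
    for field, (store, code) in find_mismatches(SAMPLE_LOG).items():
        print(f"{field}: store has {store}, running binary has {code}")
```

On the quoted log only AS_READ_MAX_NORMAL_LEN_BITS differs, which matches the advice to rebuild with that constant set back to 16, or to keep using the binaries that created the store.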