Hello,
I am Ted Kalbfleisch with the University of Louisville.
I am running wgs-8.2. My command line inputs were
/home/tskalb01/wgs-8.2/Linux-amd64/bin/runCA \
  ovlMerThreshold=75 gkpFixInsertSizes=0 ovlMerSize=30 cgwErrorRate=0.15 \
  ovlHashBits=24 ovlHashBlockLength=110000000 ovlCorrBatchSize=145904 \
  unitigger=bog -p genome -d CA merylThreads=32 merylMemory=668467 \
  frgCorrThreads=1 frgCorrConcurrency=32 cnsConcurrency=9 \
  ovlCorrConcurrency=32 ovlConcurrency=32 ovlThreads=1 \
  doFragmentCorrection=1 doOverlapBasedTrimming=1 doExtendClearRanges=1 \
  ovlMerSize=22 \
  /scratch/large/tskalb01/Twilight/TwilightSanger/10k.frg \
  /scratch/large/tskalb01/Twilight/TwilightSanger/40k.frg \
  /scratch/large/tskalb01/Twilight/TwilightSanger/4k.frg \
  /scratch/large/tskalb01/Twilight/TwilightSanger/singletons.frg \
  /scratch/large/tskalb01/Twilight/assembly/2015_02_10/superReadSequences_shr.frg
I am running it on a fat node with 32 cores and 720 GB of RAM. For the last week, the run has been in the extendClearRanges step. The tail of the genome.tigStore directory listing is below.
tskalb01@public$ ls -lrt /scratch/large/tskalb01/Twilight/assembly/2015_02_13/CA/genome.tigStore/ | tail
-rw------- 1 tskalb01 unixuser 156402308 Feb 27 17:50 seqDB.v014.utg
-rw------- 1 tskalb01 unixuser 157446068 Feb 27 17:50 seqDB.v014.ctg
-rw------- 1 tskalb01 unixuser 223433774 Feb 27 17:50 seqDB.v014.dat
-rw------- 1 tskalb01 unixuser 156402308 Mar 3 05:20 seqDB.v015.utg
-rw------- 1 tskalb01 unixuser 157482116 Mar 3 05:20 seqDB.v015.ctg
-rw------- 1 tskalb01 unixuser 1549260086 Mar 3 05:20 seqDB.v015.dat
-rw------- 1 tskalb01 unixuser 156402308 Mar 6 15:11 seqDB.v016.utg
-rw------- 1 tskalb01 unixuser 157513268 Mar 6 15:11 seqDB.v016.ctg
-rw------- 1 tskalb01 unixuser 1842914932 Mar 6 15:11 seqDB.v016.dat
-rw------- 1 tskalb01 unixuser 442769720 Mar 8 12:02 seqDB.v017.dat
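Each seqDB.vNNN set above is one completed iteration's output, so the timestamps show how fast the step is actually moving. A small helper for watching that (my own sketch, not part of runCA; the example path is the tigStore from this run):

```shell
# Hypothetical helper: list a tigStore's seqDB data files oldest-first,
# so you can see how often a new version (i.e. a finished iteration)
# appears.
list_tig_versions() {
    # $1 = path to the genome.tigStore directory
    ls -rt "$1"/seqDB.v*.dat 2>/dev/null | sed 's|.*/||'
}
# Example for this run:
# list_tig_versions /scratch/large/tskalb01/Twilight/assembly/2015_02_13/CA/genome.tigStore
```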
What I see in top on the fat node is
top - 12:09:46 up 78 days, 16:02, 1 user, load average: 1.00, 1.00, 1.00
Tasks: 486 total, 1 running, 485 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.5%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 794025324k total, 17937868k used, 776087456k free, 1968k buffers
Swap: 0k total, 0k used, 0k free, 2053244k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20830 tskalb01 20 0 9.9g 9.7g 2088 D 6.6 1.3 293:03.07 extendClearRang
26303 tskalb01 20 0 17384 1664 1016 R 0.3 0.0 0:00.05 top
4959 tskalb01 20 0 13396 1556 1272 S 0.0 0.0 0:00.00 bash
4988 tskalb01 20 0 9196 1248 1044 S 0.0 0.0 0:00.00 12393601.queues
4989 tskalb01 20 0 42400 9048 2264 S 0.0 0.0 0:00.06 perl
20828 tskalb01 20 0 9196 1232 1040 S 0.0 0.0 0:00.00 sh
20829 tskalb01 20 0 9196 1232 1032 S 0.0 0.0 0:00.00 extendClearRang
This seems like very low CPU usage. Any suggestions you could provide would be greatly appreciated.
Best regards,
Ted
Ted Kalbfleisch
Assistant Professor
School of Medicine
University of Louisville
221 F Baxter II
580 South Preston Street
Louisville, KY
40202
ted.kalbfleisch@louisville.edu
502-852-7495
Hi-
It seems to be suffering from a poorly cached gkpStore. To load it into cache:
% cat gkpStore/s?? > /dev/null
% cat gkpStore/q?? > /dev/null
It should get up to one CPU - sadly, this isn't multi-threaded.
Is this equivalent?
cat genome.gkpStore/s* > /dev/null
cat genome.gkpStore/q* > /dev/null
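The same idea can be written more generally: read every file in the store once so its pages land in the Linux page cache, after which extendClearRanges' random reads are served from memory. A sketch only; the helper name is mine and the path is the gkpStore from this run:

```shell
# Hypothetical helper: pre-read every regular file under a store
# directory to pull it into the page cache.
cache_warm() {
    # $1 = store directory (e.g. CA/genome.gkpStore)
    find "$1" -type f -exec cat {} + > /dev/null
}
# cache_warm /scratch/large/tskalb01/Twilight/assembly/2015_02_13/CA/genome.gkpStore
```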
This still isn't working (see the top listing below). How much bang for the buck am I getting out of the extendClearRanges step anyway? I would certainly prefer that it run through to completion, but is it reasonable for step seven to take on the order of two weeks?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20830 tskalb01 20 0 10.5g 10g 2088 D 2.7 1.4 407:31.74 extendClearRang
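For what it's worth, the 'D' state shown for extendClearRang in both top listings means uninterruptible sleep, which on Linux almost always means the process is blocked on disk I/O rather than starved for CPU. One way to confirm (a sketch using only /proc, so Linux-specific; the helper name is mine):

```shell
# Print a process's scheduler state and, where the kernel exposes it,
# its cumulative I/O counters. State 'D' together with growing
# read_bytes indicates the process is disk-bound.
proc_io_snapshot() {
    # $1 = PID to inspect (e.g. 20830 from the listing above)
    grep '^State:' "/proc/$1/status"
    cat "/proc/$1/io" 2>/dev/null
}
# proc_io_snapshot 20830
```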