From: mathog <ma...@ca...> - 2015-01-14 17:33:14
It looks like I set the wgs spec file parameters wrong, because both mer
and mertrim run single threaded (most of the time) on a machine with 48
CPUs and 530 GB of RAM. The data is 4 sets of Illumina data. The data was
loaded into gatekeeper OK, and that was single threaded too, but I can see
why that might be. However, after that it began running these very, very
slowly:

  /home/wgs_project/do_illumina_wgs/./0-mertrim/mertrim.sh 48 \
    > /home/wgs_project/do_illumina_wgs/./0-mertrim/..0048.err 2>&1

Most of the time when I have checked, either mer or mertrim is running at
99% CPU. On some occasions merTrim has gone a bit higher:

  14097 wgsuser 39 19 18.7g 16g 1304 S 401.7 3.3 49:43.93 merTrim

That is still poor use of this machine, leaving it mostly idle.

Here is the spec file, minus all comments:

  utgErrorRate=0.03
  utgErrorLimit=2.5
  ovlErrorRate=0.06
  cnsErrorRate=0.10
  cgwErrorRate=0.10
  merSize = 22
  overlapper=ovl
  unitigger = bog
  utgBubblePopping = 1
  merylMemory = 128000
  merylThreads = 25
  ovlHashBits=25
  ovlHashBlockLength=180000000
  ovlThreads = 2
  ovlConcurrency = 20
  ovlRefBlockSize = 32000000
  ovlStoreMemory = 8192   # Mbp
  frgCorrThreads = 10
  frgCorrConcurrency = 3
  ovlCorrBatchSize = 1000000
  ovlCorrConcurrency = 25
  cnsConcurrency = 16
  useGrid = 0
  scriptOnGrid = 0
  s_300_qseq.frg
  s_1000_qseq.frg
  s_3000_qseq.frg
  s_5000_qseq.frg

What needs to be tweaked to get mer and mertrim to use more of the
machine's resources? This spec file was based on a couple found here and
there on the web, and I'm sure that many of the memory/threads parameters
are not optimal.

Thank you,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
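[A stock way to confirm whether a single process is really multi-threaded
(standard procps usage, added here for reference; 14097 is the merTrim PID
shown above):

  top -H -p 14097    # -H shows one row per thread, with per-thread %CPU

If only one thread row ever shows meaningful %CPU, the program is
effectively single threaded at that stage, regardless of how many threads
it was asked to start.]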
From: Brian W. <th...@gm...> - 2015-01-15 17:03:02
The option you're looking for is mbtThreads, which has a default of 4.

Also look into the option mbtBatchSize, which sets how many reads to
process per job. The default is 1 million, and you've already got at
least 48 jobs, so this is probably not an issue.

You can increase the number of jobs running at once with mbtConcurrency.
You should be able to run 20 with the current job size. Dropping the
batch size should decrease the memory used per job, and so you can then
run more jobs.

On the current jobs, are the WORKING files non-zero size? If so, then the
compute should be in the multi-threaded stage, and it should be using 4
CPUs. Check the mertrim.sh (or similar) script in the 0-mertrim directory
to verify that it has "-t 4". Adding "-v" will make it report the number
of reads processed during the compute, but it won't tell you the number
of threads. Both of these are to check that the job is done with the data
structure building -- after two days, it definitely should be.

So, in summary, I don't know why you're not getting multiple CPUs on
these. You can work around the problem by dropping the batch size to make
jobs with about 8 GB of memory each (smaller than 512/48), then run 48
jobs in parallel.

b
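[For concreteness, here is roughly how those suggestions would look as
spec-file lines. This is a sketch with illustrative values, not settings
given in the thread: the option names (mbtThreads, mbtConcurrency,
mbtBatchSize) are the ones Brian cites, but the numbers below are
assumptions for a 48-CPU, 530 GB machine:

  # illustrative values only -- tune to the machine
  mbtThreads     = 4        # threads per merTrim job (the default)
  mbtConcurrency = 12       # 12 jobs x 4 threads = 48 CPUs
  mbtBatchSize   = 500000   # smaller batches reduce memory per job
]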
From: mathog <ma...@ca...> - 2015-01-15 18:00:08
On 15-Jan-2015 09:02, Brian Walenz wrote:
> The option you're looking for is mbtThreads, with a default of 4.
>
> Also look into option mbtBatchSize, which sets how many reads to process
> per job. The default is 1 million, and you've already got at least 48
> jobs, so this is probably not an issue.

(snip)

> So, in summary, I don't know why you're not getting multiple CPUs on
> these. You can work around the problem by dropping the batch size to
> make jobs with about 8gb memory (smaller than 512/48), then run 48 jobs
> in parallel.

So many options, so little time. I don't suppose anybody has put together
a script that asks for the relevant system and data information and then
emits a SPEC file to run at something approximating optimal speed on the
equipment at hand? The input would be something like this (no doubt I'm
leaving out key information):

  primary node:
    RAM=, CPU=, DISK=    # fill in the max to use; actual could be more
  cluster: Y             # N if none
    type=older N=10, RAM=, CPU=, DISK=
    type=newer N=20, RAM=, CPU=, DISK=
    queue_system=SGE
  FRG types: 2           # at least 1
    Illumina N=3, totalreads=
    Sanger   N=2, totalreads=

As it is now, there are a lot of parameters to fiddle with:

  runCA -options | wc
    184    <- !!!!

which probably all make perfect sense to people experienced with this
software, but which are fairly mysterious when first encountered.

In any case, I did try modifying the -t parameter in 0-mertrim/mertrim.sh
while the jobs were running, and the new settings "took" as each new job
started. The run times were:

  -t    ~minutes
   4    22
  16    14
  40    12-13

So there isn't much to be gained by pushing that parameter up.

> You can increase the number of jobs running at once with mbtConcurrency.

Kind of my point about the script: I overlooked that one. I did use
merylThreads, but didn't realize that trim and count use different
parameters. Concurrency x Threads, that is, simultaneous jobs x CPUs per
job? There are 7 of the former parameters and 6 of the latter. Presumably
if I spent a couple of hours reading all the documentation (which for
some reason has been loading really, really slowly from SourceForge) I
could make a guess at what would probably work best. The hypothetical
script I alluded to would be a lot more convenient!

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
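[Assuming the Concurrency x Threads reading is correct, which the option
names suggest but the thread does not explicitly confirm, the CPU
budgeting works out as in this minimal shell sketch:

  # assumption: peak CPU use ~= concurrency * threads-per-job
  CPUS=48
  THREADS=4                           # e.g. mbtThreads
  CONCURRENCY=$(( CPUS / THREADS ))   # 48 / 4 = 12 simultaneous jobs
  echo "budget: $CONCURRENCY jobs x $THREADS threads = $(( CONCURRENCY * THREADS )) CPUs"
]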
From: Brian W. <th...@gm...> - 2015-01-15 23:35:16
I can't argue with the option bloat in CA. There are a lot of options
that should be removed, or that shouldn't have been exposed in the first
place.

This is the first time I've seen merTrim be a bottleneck. I suspect it
might be spending lots of time building data structures. I'll admit that
runCA support for this part is weak; on large assemblies, I run the
trimming by hand. The merTrim binary has a '-enablecache' option that
will build, dump, and reuse the data structures between jobs. There isn't
runCA support for it, though.

Ah! If that is your bottleneck, then we are moving the wrong way by
making jobs smaller. We want to be generating one job with 48 threads
enabled: build the data structures once, then let 48 threads process all
the reads in the same job. I had been thinking that you weren't getting
multiple threads for some other reason.

I'm also none too pleased with SourceForge performance. They killed off
support for MediaWiki, forcing everyone either to rewrite pages for their
inferior wiki (no tables in the markup!) or to install individual
MediaWiki instances. It's free, so I can't really complain too much.
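[The most direct way to act on the one-big-job idea is to edit the
generated script by hand, which the thread already shows works for -t.
A sketch, assuming the flag appears literally as '-t 4' inside
0-mertrim/mertrim.sh (the thread confirms the flag is there, but not the
script's exact layout), and following the job-invocation and log-naming
pattern quoted earlier:

  # sketch: raise merTrim threads in the generated job script
  sed -i 's/-t 4/-t 48/' 0-mertrim/mertrim.sh
  # then rerun a single job (log name mirrors the ..0048.err pattern above)
  ./0-mertrim/mertrim.sh 1 > ./0-mertrim/..0001.err 2>&1
]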
From: mathog <ma...@ca...> - 2015-01-20 20:37:18
(This is a followup to: Re: [wgs-assembler-users] mer, mertrim running
single threaded on large SMP machine)

On 19-Jan-2015 18:52, Brian Walenz wrote:
> I didn't poke through the data much, just enough to see it was Illumina.
> My immediate reaction is to suggest trying masurca. It handles illumina
> much much better than plain CA, but does probably require more reads
> because more crap gets filtered out.

Will look into that. Also found Meraculous, also for Illumina. (So many
assemblers, so little time...)

> With your current assembly, I see two things I don't like: 1) bog
> instead of bogart, 2) 3% error rate.
>
> You can do some experiments with the current assembly without too much
> pain. All we're going to do is run bogart a few times, and look at the
> resulting unitigs. No consensus generation, just unitig layouts.
>
> On a COPY of the gkpStore, run
>
>   gatekeeper --revertclear OBTCHIMERA *gkpStore

Did this:

  cp -r ..gkpStore copygkpStore
  cp ..gkpStore.err copygkpStore.err
  cp ..gkpStore.errorLog copygkpStore.errorLog
  cp ..gkpStore.fastqUIDmap copygkpStore.fastqUIDmap
  cp ..gkpStore.info copygkpStore.info
  export PATH=$PATH:/home/wgs_project/wgs/Linux-amd64/bin
  gatekeeper --revertclear OBTCHIMERA copygkpStore

> This will restore the clear ranges to the state they had just after
> trimming, and just before unitigging.
>
> Then a bunch of iterations of bogart:
>
>   bogart -G *.gkpStore -O *.ovlStore -T e10.tigStore -o test.bogart \
>     -eg 0.10 -Eg 2.5 -em 0.10 -Em 2.5
>
> Where the eg and em parameter is varied between 2 and 6 (percent error).
> By default, overlaps are generated to only 6% error, not that higher
> would be feasible with short reads. The Eg and Em parameters measure
> overlap error as 'number of errors', to get around the problem of a
> 50-base overlap with one error resulting in 2% error. You can mostly
> ignore this for the higher error rates.

Sorry, the wild card in that line is throwing me. Also, I'm confused
about whether you mean big Eg/Em (where 2.5 is in the range specified) or
little eg/em (where the values are not in that range). Given what I
called the copy, is this what you want me to run?

  VAL=2.5   # 2.5 percent
  bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
  tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

  VAL=3.0   # 3.0 percent
  bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
  tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

  # etc.

The bogart command fails because "'copyovlStore' is not an ovelrapStore"
(note the typo in the error message; that's what it says). Should I use
the overlapStore from the first run in that command? Erase the
e10.tigStore between runs? Do something to the overlapStore between runs?
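[Folding those questions into one script, a minimal loop form of the
experiment above. Two assumptions, neither confirmed at this point in the
thread: the original ..ovlStore is reused read-only (the follow-up
message below does show bogart accepting it), and each iteration starts
with a fresh tigStore:

  # sketch: one bogart + tigStore pass per error-rate value
  for VAL in 2.0 3.0 4.0 5.0 6.0 ; do
      rm -rf e10.tigStore    # fresh tigStore per run (assumption)
      bogart -G copygkpStore -O ..ovlStore -T e10.tigStore -o test.bogart \
          -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL 2>&1 | tee bogart_$VAL.log
      tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000 \
          > sizes_$VAL.txt
  done
]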
Running tigStore on the original (not so useful) run gave this:

  tigStore -g ..gkpStore -t ..tigStore 1 -U -d sizes -s copygkpStore.info

  utgLenUnassigned n10 siz 528 sum  304316578 idx   479977
  utgLenUnassigned n20 siz 400 sum  608633078 idx  1148939
  utgLenUnassigned n30 siz 291 sum  912949618 idx  2026098
  utgLenUnassigned n40 siz 179 sum 1217266213 idx  3353557
  utgLenUnassigned n50 siz 150 sum 1521582630 idx  5307416
  utgLenUnassigned n60 siz 145 sum 1825899170 idx  7367619
  utgLenUnassigned n70 siz 126 sum 2130215760 idx  9584603
  utgLenUnassigned n80 siz 122 sum 2434532234 idx 12033900
  utgLenUnassigned n90 siz 102 sum 2738848751 idx 14689647
  utgLenUnassigned sum 3043165239 (genomeSize 0)
  utgLenUnassigned num 18384123
  utgLenUnassigned ave 165

  tigLenSingleton n10 siz 150 sum  142617831 idx  907450
  tigLenSingleton n20 siz 148 sum  285235697 idx 1865321
  tigLenSingleton n30 siz 145 sum  427853436 idx 2837943
  tigLenSingleton n40 siz 134 sum  570471289 idx 3850926
  tigLenSingleton n50 siz 125 sum  713089018 idx 4969720
  tigLenSingleton n60 siz 123 sum  855706883 idx 6116341
  tigLenSingleton n70 siz 121 sum  998324590 idx 7282617
  tigLenSingleton n80 siz 108 sum 1140942414 idx 8518814
  tigLenSingleton n90 siz  87 sum 1283560221 idx 9981733
  tigLenSingleton sum 1426177984 (genomeSize 0)
  tigLenSingleton num 11893391
  tigLenSingleton ave 119

  tigLenAssembled n10 siz 630 sum  161699171 idx  231237
  tigLenAssembled n20 siz 517 sum  323397821 idx  516513
  tigLenAssembled n30 siz 443 sum  485096301 idx  855316
  tigLenAssembled n40 siz 389 sum  646795227 idx 1245703
  tigLenAssembled n50 siz 335 sum  808493956 idx 1690952
  tigLenAssembled n60 siz 266 sum  970192570 idx 2232349
  tigLenAssembled n70 siz 205 sum 1131891234 idx 2921817
  tigLenAssembled n80 siz 157 sum 1293589836 idx 3836637
  tigLenAssembled n90 siz 136 sum 1455288608 idx 4933675
  tigLenAssembled sum 1616987255 (genomeSize 0)
  tigLenAssembled num 6490732
  tigLenAssembled ave 249

Presumably we want to see many more of the tigLenAssembled and fewer of
the utgLenUnassigned and tigLenSingleton.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
From: mathog <ma...@ca...> - 2015-01-21 00:03:14
On 20-Jan-2015 12:37, mathog wrote:
>   VAL=2.5   # 2.5 percent
>   bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
>     -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
>   tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

Tried this:

  VAL=2.5
  bogart -G copygkpStore -O ..ovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL 2>&1 | tee bogart_25.log

and it ran along happily until dropping dead here:

  ...
  OverlapCache()-- Loading overlap information: overlaps processed 4128333921 (098.08%) loaded 4128333921 (098.08%) (at read iid 152548896)
  OverlapCache()-- Loading overlap information: overlaps processed 4158431504 (098.79%) loaded 4158431504 (098.79%) (at read iid 153676157)
  OverlapCache()-- Loading overlap information: overlaps processed 4188318291 (099.50%) loaded 4188318291 (099.50%) (at read iid 154804535)
  OverlapCache()-- Loading overlap information: overlaps processed 4209225138 (100.00%) loaded 4209225138 (100.00%)
  setLogFile()-- Now logging to 'test.bogart.002.bestOverlapGraph'
  setLogFile()-- Now logging to 'test.bogart.004.ChunkGraph'
  setLogFile()-- Now logging to 'test.bogart.005.buildUnitigs'
  setLogFile()-- Now logging to 'test.bogart.006.placeContains'
  setLogFile()-- Now logging to 'test.bogart.007.placeZombies'
  setLogFile()-- Now logging to 'test.bogart.008.mergeSplitJoin'
  setLogFile()-- Now logging to 'test.bogart.009.popBubbles'
  setLogFile()-- Now logging to 'test.bogart.010.mergeSplitJoin'
  setLogFile()-- Now logging to 'test.bogart.011.cleanup'
  setLogFile()-- Now logging to 'test.bogart.012.setParentAndHang'
  setLogFile()-- Now logging to 'test.bogart.013.output'
  MultiAlignStore::openDB()-- Failed to open 'e10.tigStore/seqDB.v001.p1010.dat': Too many open files
  MultiAlignStore::openDB()-- Trying again.
  MultiAlignStore::openDB()-- Failed to open 'e10.tigStore/seqDB.v001.p1010.dat': Too many open files
  WARNING: open file 'test.bogart.013.output.thr000'

Not surprisingly, tigStore wouldn't work with what was left:

  % tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000
  MultiAlignStore::MultiAlignStore()-- ERROR, didn't find any unitigs or contigs in the store.
  MultiAlignStore::MultiAlignStore()--   asked for store 'e10.tigStore', correct?
  MultiAlignStore::MultiAlignStore()--   asked for version '1', correct?
  MultiAlignStore::MultiAlignStore()--   asked for partition unitig=0 contig=0, correct?
  MultiAlignStore::MultiAlignStore()--   asked for writable=0 inplace=0 append=0, correct?

System information:

  % cat /etc/centos-release
  CentOS release 6.6 (Final)
  % ulimit
  unlimited
  % ulimit -n
  1024
  % cat /proc/sys/fs/file-max
  52605611

The version of wgs is trunk, downloaded and built on July 3, 2014.

Suggestions?

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
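[The failing open() is limited per process, so the number that matters is
the limit bogart itself inherited, not the system-wide file-max. Standard
checks (stock Linux interfaces, not commands from the thread):

  ulimit -Sn    # soft per-process limit of this shell (1024 here)
  ulimit -Hn    # hard ceiling a non-root user may raise it to
  grep 'open files' /proc/<pid>/limits    # limits of a running process

Raising the soft limit in the launching shell, e.g. 'ulimit -n 4096',
applies to everything started from that shell, up to the hard limit.]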
From: Ludovic M. <lud...@un...> - 2015-01-21 08:40:56
On debian-like systems at least, as root:

  # set the maximum number of open files
  sed -i 's/#<domain> <type> <item> <value>/#<domain> <type> <item> <value>\n\* soft nofile 65536\n#/' /etc/security/limits.conf

though it might need tweaking for RH flavors. It should be run on every
node.

best,
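[Spelled out, that sed call injects one pam_limits rule, the first line
below, into /etc/security/limits.conf. The matching hard line is an
addition here, not part of Ludovic's command: a non-root user can never
raise the soft limit above the prevailing hard limit, so a hard rule is
usually needed as well:

  # /etc/security/limits.conf -- format: <domain> <type> <item> <value>
  *    soft    nofile    65536
  *    hard    nofile    65536

The new limits apply to new login sessions, not to already-open shells.]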
From: mathog <ma...@ca...> - 2015-01-21 18:51:55
On 21-Jan-2015 00:39, Ludovic Mallet wrote:
> on debian-like at least, be root:
>
>   # set the maximum number of open files
>   sed -i 's/#<domain> <type> <item> <value>/#<domain> <type> <item> <value>\n\* soft nofile 65536\n#/' /etc/security/limits.conf
>
> though it might be tweaked for RH flavors.
> Should be run on every node.

Added to limits.conf:

  mathog hard nofiles 60000
  mathog soft nofiles 60000

logged out, logged back in, and saw:

  % ulimit -Sn
  1024
  % ulimit -Hn
  4096
  % ulimit -n
  1024
  % ulimit -n 60000
  bash: ulimit: open files: cannot modify limit: Operation not permitted
  % ulimit -n 4096
  % ulimit -n
  4096

The 4096 limit seems to be coming from /proc/1/limits, which has:

  Max open files    1024    4096    files

root can set ulimit -n as high as it wants, while also running in bash.
Not sure where the 4096 being applied to normal processes is coming from.

Regards,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
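[One detail worth checking against the entries above: the pam_limits item
name is 'nofile', singular. An unrecognized item such as 'nofiles' is not
applied, which would leave the defaults inherited from init, the
1024/4096 pair visible in /proc/1/limits, in force, exactly as observed.
A quick verification using standard interfaces (not commands from the
thread):

  # what open-file limits did this login shell actually inherit?
  grep 'open files' /proc/$$/limits

  # canonical limits.conf spelling (note: 'nofile', no trailing 's'):
  #   mathog  soft  nofile  60000
  #   mathog  hard  nofile  60000
]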