From: mathog <ma...@ca...> - 2015-01-14 17:33:14
It looks like I set the wgs spec file parameters wrong, because both mer
and mertrim run single threaded (most of the time) on a machine with 48
CPUs and 530 GB of RAM. The data is 4 sets of Illumina data. The data was
loaded into gatekeeper OK, and that was single threaded too, but I can see
why that might be. However, after that it began running these very, very
slowly:

  /home/wgs_project/do_illumina_wgs/./0-mertrim/mertrim.sh 48 \
    > /home/wgs_project/do_illumina_wgs/./0-mertrim/..0048.err 2>&1

Most of the time when I have checked, either mer or mertrim is running at
99% CPU. On some occasions merTrim has gone a bit higher:

  14097 wgsuser 39 19 18.7g 16g 1304 S 401.7 3.3 49:43.93 merTrim

That is still poor use of this machine, leaving it mostly idle.

Here is the spec file, minus all comments:

  utgErrorRate=0.03
  utgErrorLimit=2.5
  ovlErrorRate=0.06
  cnsErrorRate=0.10
  cgwErrorRate=0.10
  merSize = 22
  overlapper=ovl
  unitigger = bog
  utgBubblePopping = 1
  merylMemory = 128000
  merylThreads = 25
  ovlHashBits=25
  ovlHashBlockLength=180000000
  ovlThreads = 2
  ovlConcurrency = 20
  ovlRefBlockSize = 32000000
  ovlStoreMemory = 8192   # Mbp
  frgCorrThreads = 10
  frgCorrConcurrency = 3
  ovlCorrBatchSize = 1000000
  ovlCorrConcurrency = 25
  cnsConcurrency = 16
  useGrid = 0
  scriptOnGrid = 0
  s_300_qseq.frg
  s_1000_qseq.frg
  s_3000_qseq.frg
  s_5000_qseq.frg

What needs to be tweaked to get mer and mertrim to use more of the
machine's resources? This spec file was based on a couple found here and
there on the web, and I'm sure that many of the memory/threads parameters
are not optimal.

Thank you,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
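[A stock way to confirm whether a single process is really multi-threaded
(standard procps usage, added here for reference; 14097 is the merTrim PID
shown above):

  top -H -p 14097    # -H shows one row per thread, with per-thread %CPU

If only one thread row ever shows meaningful %CPU, the program is
effectively single threaded at that stage, regardless of how many threads
it was asked to start.]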
From: Brian W. <th...@gm...> - 2015-01-15 17:03:02
The option you're looking for is mbtThreads, which has a default of 4.

Also look into the option mbtBatchSize, which sets how many reads to
process per job. The default is 1 million, and you've already got at
least 48 jobs, so this is probably not an issue.

You can increase the number of jobs running at once with mbtConcurrency.
You should be able to run 20 with the current job size. Dropping the
batch size should decrease the memory used per job, and so you can then
run more jobs.

On the current jobs, are the WORKING files non-zero size? If so, then the
compute should be in the multi-threaded stage, and it should be using 4
CPUs. Check the mertrim.sh (or similar) script in the 0-mertrim directory
to verify that it has "-t 4". Adding "-v" will make it report the number
of reads processed during the compute, but it won't tell you the number
of threads. Both of these are to check that the job is done with the data
structure building -- after two days, it definitely should be.

So, in summary, I don't know why you're not getting multiple CPUs on
these. You can work around the problem by dropping the batch size to make
jobs with about 8 GB of memory each (smaller than 512/48), then run 48
jobs in parallel.

b
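[For concreteness, here is roughly how those suggestions would look as
spec-file lines. This is a sketch with illustrative values, not settings
given in the thread: the option names (mbtThreads, mbtConcurrency,
mbtBatchSize) are the ones Brian cites, but the numbers below are
assumptions for a 48-CPU, 530 GB machine:

  # illustrative values only -- tune to the machine
  mbtThreads     = 4        # threads per merTrim job (the default)
  mbtConcurrency = 12       # 12 jobs x 4 threads = 48 CPUs
  mbtBatchSize   = 500000   # smaller batches reduce memory per job
]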
From: mathog <ma...@ca...> - 2015-01-15 18:00:08
On 15-Jan-2015 09:02, Brian Walenz wrote:
> The option you're looking for is mbtThreads, with a default of 4.
>
> Also look into option mbtBatchSize, which sets how many reads to process
> per job. The default is 1 million, and you've already got at least 48
> jobs, so this is probably not an issue.

(snip)

> So, in summary, I don't know why you're not getting multiple CPUs on
> these. You can work around the problem by dropping the batch size to
> make jobs with about 8gb memory (smaller than 512/48), then run 48 jobs
> in parallel.

So many options, so little time. I don't suppose anybody has put together
a script that asks for the relevant system and data information and then
emits a SPEC file to run at something approximating optimal speed on the
equipment at hand? The input would be something like this (no doubt I'm
leaving out key information):

  primary node:
    RAM=, CPU=, DISK=    # fill in the max to use; actual could be more
  cluster: Y             # N if none
    type=older N=10, RAM=, CPU=, DISK=
    type=newer N=20, RAM=, CPU=, DISK=
    queue_system=SGE
  FRG types: 2           # at least 1
    Illumina N=3, totalreads=
    Sanger   N=2, totalreads=

As it is now, there are a lot of parameters to fiddle with:

  runCA -options | wc
    184    <- !!!!

which probably all make perfect sense to people experienced with this
software, but which are fairly mysterious when first encountered.

In any case, I did try modifying the -t parameter in 0-mertrim/mertrim.sh
while the jobs were running, and the new settings "took" as each new job
started. The run times were:

  -t    ~minutes
   4    22
  16    14
  40    12-13

So there isn't much to be gained by pushing that parameter up.

> You can increase the number of jobs running at once with mbtConcurrency.

Kind of my point about the script: I overlooked that one. I did use
merylThreads, but didn't realize that trim and count use different
parameters. Concurrency x Threads, that is, simultaneous jobs x CPUs per
job? There are 7 of the former parameters and 6 of the latter. Presumably
if I spent a couple of hours reading all the documentation (which for
some reason has been loading really, really slowly from SourceForge) I
could make a guess at what would probably work best. The hypothetical
script I alluded to would be a lot more convenient!

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
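[Assuming the Concurrency x Threads reading is correct, which the option
names suggest but the thread does not explicitly confirm, the CPU
budgeting works out as in this minimal shell sketch:

  # assumption: peak CPU use ~= concurrency * threads-per-job
  CPUS=48
  THREADS=4                           # e.g. mbtThreads
  CONCURRENCY=$(( CPUS / THREADS ))   # 48 / 4 = 12 simultaneous jobs
  echo "budget: $CONCURRENCY jobs x $THREADS threads = $(( CONCURRENCY * THREADS )) CPUs"
]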
From: Brian W. <th...@gm...> - 2015-01-15 23:35:16
I can't argue with the option bloat in CA. There are a lot of options
that should be removed, or that shouldn't have been exposed in the first
place.

This is the first time I've seen merTrim be a bottleneck. I suspect it
might be spending lots of time building data structures. I'll admit that
runCA support for this part is weak; on large assemblies, I run the
trimming by hand. The merTrim binary has a '-enablecache' option that
will build, dump, and reuse the data structures between jobs. There isn't
runCA support for it, though.

Ah! If that is your bottleneck, then we are moving the wrong way by
making jobs smaller. We want to be generating one job with 48 threads
enabled: build the data structures once, then let 48 threads process all
the reads in the same job. I had been thinking that you weren't getting
multiple threads for some other reason.

I'm also none too pleased with SourceForge performance. They killed off
support for MediaWiki, forcing everyone either to rewrite pages for their
inferior wiki (no tables in the markup!) or to install individual
MediaWiki instances. It's free, so I can't really complain too much.
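[The most direct way to act on the one-big-job idea is to edit the
generated script by hand, which the thread already shows works for -t.
A sketch, assuming the flag appears literally as '-t 4' inside
0-mertrim/mertrim.sh (the thread confirms the flag is there, but not the
script's exact layout), and following the job-invocation and log-naming
pattern quoted earlier:

  # sketch: raise merTrim threads in the generated job script
  sed -i 's/-t 4/-t 48/' 0-mertrim/mertrim.sh
  # then rerun a single job (log name mirrors the ..0048.err pattern above)
  ./0-mertrim/mertrim.sh 1 > ./0-mertrim/..0001.err 2>&1
]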
From: mathog <ma...@ca...> - 2015-01-20 20:37:18
(This is a followup to: Re: [wgs-assembler-users] mer, mertrim running
single threaded on large SMP machine)

On 19-Jan-2015 18:52, Brian Walenz wrote:
> I didn't poke through the data much, just enough to see it was Illumina.
> My immediate reaction is to suggest trying masurca. It handles illumina
> much much better than plain CA, but does probably require more reads
> because more crap gets filtered out.

Will look into that. Also found Meraculous, also for Illumina. (So many
assemblers, so little time...)

> With your current assembly, I see two things I don't like: 1) bog
> instead of bogart, 2) 3% error rate.
>
> You can do some experiments with the current assembly without too much
> pain. All we're going to do is run bogart a few times, and look at the
> resulting unitigs. No consensus generation, just unitig layouts.
>
> On a COPY of the gkpStore, run
>
>   gatekeeper --revertclear OBTCHIMERA *gkpStore

Did this:

  cp -r ..gkpStore copygkpStore
  cp ..gkpStore.err copygkpStore.err
  cp ..gkpStore.errorLog copygkpStore.errorLog
  cp ..gkpStore.fastqUIDmap copygkpStore.fastqUIDmap
  cp ..gkpStore.info copygkpStore.info
  export PATH=$PATH:/home/wgs_project/wgs/Linux-amd64/bin
  gatekeeper --revertclear OBTCHIMERA copygkpStore

> This will restore the clear ranges to the state they had just after
> trimming, and just before unitigging.
>
> Then a bunch of iterations of bogart:
>
>   bogart -G *.gkpStore -O *.ovlStore -T e10.tigStore -o test.bogart \
>     -eg 0.10 -Eg 2.5 -em 0.10 -Em 2.5
>
> Where the eg and em parameter is varied between 2 and 6 (percent error).
> By default, overlaps are generated to only 6% error, not that higher
> would be feasible with short reads. The Eg and Em parameters measure
> overlap error as 'number of errors', to get around the problem of a
> 50-base overlap with one error resulting in 2% error. You can mostly
> ignore this for the higher error rates.

Sorry, the wild card in that line is throwing me. Also, I'm confused
about whether you mean big Eg/Em (where 2.5 is in the range specified) or
little eg/em (where the values are not in that range). Given what I
called the copy, is this what you want me to run?

  VAL=2.5   # 2.5 percent
  bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
  tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

  VAL=3.0   # 3.0 percent
  bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
  tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

  # etc.

The bogart command fails because "'copyovlStore' is not an ovelrapStore"
(note the typo in the error message; that's what it says). Should I use
the overlapStore from the first run in that command? Erase the
e10.tigStore between runs? Do something to the overlapStore between runs?
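[Folding those questions into one script, a minimal loop form of the
experiment above. Two assumptions, neither confirmed at this point in the
thread: the original ..ovlStore is reused read-only (the follow-up
message below does show bogart accepting it), and each iteration starts
with a fresh tigStore:

  # sketch: one bogart + tigStore pass per error-rate value
  for VAL in 2.0 3.0 4.0 5.0 6.0 ; do
      rm -rf e10.tigStore    # fresh tigStore per run (assumption)
      bogart -G copygkpStore -O ..ovlStore -T e10.tigStore -o test.bogart \
          -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL 2>&1 | tee bogart_$VAL.log
      tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000 \
          > sizes_$VAL.txt
  done
]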
Running tigStore on the original (not so useful) run gave this:

  tigStore -g ..gkpStore -t ..tigStore 1 -U -d sizes -s copygkpStore.info

  utgLenUnassigned n10 siz 528 sum  304316578 idx   479977
  utgLenUnassigned n20 siz 400 sum  608633078 idx  1148939
  utgLenUnassigned n30 siz 291 sum  912949618 idx  2026098
  utgLenUnassigned n40 siz 179 sum 1217266213 idx  3353557
  utgLenUnassigned n50 siz 150 sum 1521582630 idx  5307416
  utgLenUnassigned n60 siz 145 sum 1825899170 idx  7367619
  utgLenUnassigned n70 siz 126 sum 2130215760 idx  9584603
  utgLenUnassigned n80 siz 122 sum 2434532234 idx 12033900
  utgLenUnassigned n90 siz 102 sum 2738848751 idx 14689647
  utgLenUnassigned sum 3043165239 (genomeSize 0)
  utgLenUnassigned num 18384123
  utgLenUnassigned ave 165

  tigLenSingleton n10 siz 150 sum  142617831 idx  907450
  tigLenSingleton n20 siz 148 sum  285235697 idx 1865321
  tigLenSingleton n30 siz 145 sum  427853436 idx 2837943
  tigLenSingleton n40 siz 134 sum  570471289 idx 3850926
  tigLenSingleton n50 siz 125 sum  713089018 idx 4969720
  tigLenSingleton n60 siz 123 sum  855706883 idx 6116341
  tigLenSingleton n70 siz 121 sum  998324590 idx 7282617
  tigLenSingleton n80 siz 108 sum 1140942414 idx 8518814
  tigLenSingleton n90 siz  87 sum 1283560221 idx 9981733
  tigLenSingleton sum 1426177984 (genomeSize 0)
  tigLenSingleton num 11893391
  tigLenSingleton ave 119

  tigLenAssembled n10 siz 630 sum  161699171 idx  231237
  tigLenAssembled n20 siz 517 sum  323397821 idx  516513
  tigLenAssembled n30 siz 443 sum  485096301 idx  855316
  tigLenAssembled n40 siz 389 sum  646795227 idx 1245703
  tigLenAssembled n50 siz 335 sum  808493956 idx 1690952
  tigLenAssembled n60 siz 266 sum  970192570 idx 2232349
  tigLenAssembled n70 siz 205 sum 1131891234 idx 2921817
  tigLenAssembled n80 siz 157 sum 1293589836 idx 3836637
  tigLenAssembled n90 siz 136 sum 1455288608 idx 4933675
  tigLenAssembled sum 1616987255 (genomeSize 0)
  tigLenAssembled num 6490732
  tigLenAssembled ave 249

Presumably we want to see many more of the tigLenAssembled and fewer of
the utgLenUnassigned and tigLenSingleton.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
From: mathog <ma...@ca...> - 2015-01-21 00:03:14
On 20-Jan-2015 12:37, mathog wrote:
>   VAL=2.5   # 2.5 percent
>   bogart -G copygkpStore -O copyovlStore -T e10.tigStore -o test.bogart \
>     -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL
>   tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000

Tried this:

  VAL=2.5
  bogart -G copygkpStore -O ..ovlStore -T e10.tigStore -o test.bogart \
    -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL 2>&1 | tee bogart_25.log

and it ran along happily until dropping dead here:

  ...
  OverlapCache()-- Loading overlap information: overlaps processed 4128333921 (098.08%) loaded 4128333921 (098.08%) (at read iid 152548896)
  OverlapCache()-- Loading overlap information: overlaps processed 4158431504 (098.79%) loaded 4158431504 (098.79%) (at read iid 153676157)
  OverlapCache()-- Loading overlap information: overlaps processed 4188318291 (099.50%) loaded 4188318291 (099.50%) (at read iid 154804535)
  OverlapCache()-- Loading overlap information: overlaps processed 4209225138 (100.00%) loaded 4209225138 (100.00%)
  setLogFile()-- Now logging to 'test.bogart.002.bestOverlapGraph'
  setLogFile()-- Now logging to 'test.bogart.004.ChunkGraph'
  setLogFile()-- Now logging to 'test.bogart.005.buildUnitigs'
  setLogFile()-- Now logging to 'test.bogart.006.placeContains'
  setLogFile()-- Now logging to 'test.bogart.007.placeZombies'
  setLogFile()-- Now logging to 'test.bogart.008.mergeSplitJoin'
  setLogFile()-- Now logging to 'test.bogart.009.popBubbles'
  setLogFile()-- Now logging to 'test.bogart.010.mergeSplitJoin'
  setLogFile()-- Now logging to 'test.bogart.011.cleanup'
  setLogFile()-- Now logging to 'test.bogart.012.setParentAndHang'
  setLogFile()-- Now logging to 'test.bogart.013.output'
  MultiAlignStore::openDB()-- Failed to open 'e10.tigStore/seqDB.v001.p1010.dat': Too many open files
  MultiAlignStore::openDB()-- Trying again.
  MultiAlignStore::openDB()-- Failed to open 'e10.tigStore/seqDB.v001.p1010.dat': Too many open files
  WARNING: open file 'test.bogart.013.output.thr000'

Not surprisingly, tigStore wouldn't work with what was left:

  % tigStore -g copygkpStore -t e10.tigStore 1 -U -d sizes -s 800000000
  MultiAlignStore::MultiAlignStore()-- ERROR, didn't find any unitigs or contigs in the store.
  MultiAlignStore::MultiAlignStore()--   asked for store 'e10.tigStore', correct?
  MultiAlignStore::MultiAlignStore()--   asked for version '1', correct?
  MultiAlignStore::MultiAlignStore()--   asked for partition unitig=0 contig=0, correct?
  MultiAlignStore::MultiAlignStore()--   asked for writable=0 inplace=0 append=0, correct?

System information:

  % cat /etc/centos-release
  CentOS release 6.6 (Final)
  % ulimit
  unlimited
  % ulimit -n
  1024
  % cat /proc/sys/fs/file-max
  52605611

The version of wgs is trunk, downloaded and built on July 3, 2014.

Suggestions?

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
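[The failing open() is limited per process, so the number that matters is
the limit bogart itself inherited, not the system-wide file-max. Standard
checks (stock Linux interfaces, not commands from the thread):

  ulimit -Sn    # soft per-process limit of this shell (1024 here)
  ulimit -Hn    # hard ceiling a non-root user may raise it to
  grep 'open files' /proc/<pid>/limits    # limits of a running process

Raising the soft limit in the launching shell, e.g. 'ulimit -n 4096',
applies to everything started from that shell, up to the hard limit.]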
From: Ludovic M. <lud...@un...> - 2015-01-21 08:40:56
On debian-like systems at least, as root:

  # set the maximum number of open files
  sed -i 's/#<domain> <type> <item> <value>/#<domain> <type> <item> <value>\n\* soft nofile 65536\n#/' /etc/security/limits.conf

though it might need tweaking for RH flavors. It should be run on every
node.

best,
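[Spelled out, that sed call injects one pam_limits rule, the first line
below, into /etc/security/limits.conf. The matching hard line is an
addition here, not part of Ludovic's command: a non-root user can never
raise the soft limit above the prevailing hard limit, so a hard rule is
usually needed as well:

  # /etc/security/limits.conf -- format: <domain> <type> <item> <value>
  *    soft    nofile    65536
  *    hard    nofile    65536

The new limits apply to new login sessions, not to already-open shells.]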
From: mathog <ma...@ca...> - 2015-01-21 18:51:55
On 21-Jan-2015 00:39, Ludovic Mallet wrote:
> on debian-like at least, be root:
>
>   # set the maximum number of open files
>   sed -i 's/#<domain> <type> <item> <value>/#<domain> <type> <item> <value>\n\* soft nofile 65536\n#/' /etc/security/limits.conf
>
> though it might be tweaked for RH flavors.
> Should be run on every node.

Added to limits.conf:

  mathog hard nofiles 60000
  mathog soft nofiles 60000

logged out, logged back in, and saw:

  % ulimit -Sn
  1024
  % ulimit -Hn
  4096
  % ulimit -n
  1024
  % ulimit -n 60000
  bash: ulimit: open files: cannot modify limit: Operation not permitted
  % ulimit -n 4096
  % ulimit -n
  4096

The 4096 limit seems to be coming from /proc/1/limits, which has:

  Max open files    1024    4096    files

root can set ulimit -n as high as it wants, while also running in bash.
Not sure where the 4096 being applied to normal processes is coming from.

Regards,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech
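[One detail worth checking against the entries above: the pam_limits item
name is 'nofile', singular. An unrecognized item such as 'nofiles' is not
applied, which would leave the defaults inherited from init, the
1024/4096 pair visible in /proc/1/limits, in force, exactly as observed.
A quick verification using standard interfaces (not commands from the
thread):

  # what open-file limits did this login shell actually inherit?
  grep 'open files' /proc/$$/limits

  # canonical limits.conf spelling (note: 'nofile', no trailing 's'):
  #   mathog  soft  nofile  60000
  #   mathog  hard  nofile  60000
]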