I do not know whether the way JDFTx is running right now is efficient. I have a total of 48 processors available, of which 40 are currently in use and 8 are idle. When I run JDFTx on 16 processors, a few processes always consume the idle processors' resources (here the 8 = 48 - 40 idle cores); is this a result of hyper-threading? Is this an optimal run? Can I instruct JDFTx to explicitly use ~100% CPU so that the idle processors are not exploited, and how would that affect the present way of execution?
top - 01:06:33 up 76 days, 16:02, 3 users, load average: 57.76, 52.38, 45.51
Tasks: 533 total, 41 running, 492 sleeping, 0 stopped, 0 zombie
Cpu(s): 61.1%us, 2.3%sy, 0.0%ni, 36.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132023948k total, 70955632k used, 61068316k free, 445668k buffers
Swap: 101775356k total, 38280k used, 101737076k free, 35611156k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28730 satish 20 0 1170m 323m 11m R 261 0.3 9:07.46 jdftx
28733 satish 20 0 1169m 315m 12m R 242 0.2 9:00.96 jdftx
28743 satish 20 0 1174m 327m 11m R 208 0.3 9:09.87 jdftx
28731 satish 20 0 1169m 322m 12m R 194 0.3 9:06.87 jdftx
28736 satish 20 0 1169m 322m 11m R 192 0.3 9:12.82 jdftx
28734 satish 20 0 1168m 318m 12m R 177 0.2 9:10.45 jdftx
28738 satish 20 0 1169m 320m 11m R 168 0.2 9:06.96 jdftx
28728 satish 20 0 1169m 319m 11m R 166 0.2 9:10.23 jdftx
28741 satish 20 0 1169m 322m 12m R 164 0.3 9:11.74 jdftx
28739 satish 20 0 1169m 319m 12m R 151 0.2 9:13.06 jdftx
34360 user1111 20 0 1493m 1.2g 44m R 101 0.9 22947:17 cp2k
34365 user1111 20 0 1575m 1.3g 51m R 101 1.0 22947:48 cp2k
34367 user1111 20 0 1511m 1.2g 48m R 101 1.0 22947:43 cp2k
34368 user1111 20 0 1549m 1.2g 46m R 101 1.0 22945:57 cp2k
34375 user1111 20 0 1462m 1.2g 47m R 101 0.9 22947:01 cp2k
34377 user1111 20 0 1549m 1.2g 46m R 101 1.0 22947:03 cp2k
28737 satish 20 0 925m 53m 11m R 99 0.0 3:46.54 jdftx
34354 user1111 20 0 1470m 1.2g 50m R 99 0.9 22948:43 cp2k
34355 user1111 20 0 1511m 1.2g 53m R 99 1.0 22947:20 cp2k
34356 user1111 20 0 1542m 1.2g 51m R 99 1.0 22947:22 cp2k
34358 user1111 20 0 1496m 1.2g 53m R 99 1.0 22948:08 cp2k
34359 user1111 20 0 1477m 1.2g 49m R 99 0.9 22948:21 cp2k
34361 user1111 20 0 1557m 1.3g 44m R 99 1.0 22947:02 cp2k
34362 user1111 20 0 1535m 1.2g 47m R 99 1.0 22948:25 cp2k
34363 user1111 20 0 1523m 1.2g 53m R 99 1.0 22947:51 cp2k
34364 user1111 20 0 1460m 1.2g 47m R 99 0.9 22948:05 cp2k
34366 user1111 20 0 1543m 1.3g 51m R 99 1.0 22949:08 cp2k
34369 user1111 20 0 1460m 1.2g 47m R 99 0.9 22947:15 cp2k
34370 user1111 20 0 1553m 1.3g 49m R 99 1.0 22945:46 cp2k
34371 user1111 20 0 1582m 1.3g 43m R 99 1.0 22946:50 cp2k
34372 user1111 20 0 1534m 1.2g 52m R 99 1.0 22946:35 cp2k
34373 user1111 20 0 1535m 1.2g 51m R 99 1.0 22948:29 cp2k
34374 user1111 20 0 1473m 1.2g 43m R 99 0.9 22947:20 cp2k
34376 user1111 20 0 1552m 1.3g 55m R 99 1.0 22918:07 cp2k
34357 user1111 20 0 1523m 1.2g 54m R 97 1.0 22947:52 cp2k
28742 satish 20 0 925m 53m 11m R 84 0.0 3:45.22 jdftx
28735 satish 20 0 925m 54m 11m R 82 0.0 3:44.55 jdftx
28732 satish 20 0 925m 53m 11m R 72 0.0 3:47.06 jdftx
28729 satish 20 0 907m 35m 11m R 63 0.0 3:47.02 jdftx
28740 satish 20 0 916m 43m 11m R 50 0.0 3:47.09 jdftx
29280 satish 20 0 17600 1632 932 R 6 0.0 0:00.04 top
Following is the cpuinfo for one of the 48 processors.
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6168
stepping : 1
microcode : 0x10000c4
cpu MHz : 1900.100
cache size : 512 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 12
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips : 3800.20
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
Thank you.
JDFTx will try to use all cores on the system unless otherwise specified. If you run in MPI mode, it divides the cores equally among all the processes (for example, 16 MPI processes on a 48-core node would get 3 threads each), so it will not overcommit if it is the only job running on the node (which is usually the case on HPC clusters; hence the behaviour).
However, it will not (and cannot in any easy way) account for other jobs running on the node. So if you want it to share resources with other programs, you need to instruct it explicitly to use a certain number of threads per process with the -c option on the command line.
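As a minimal sketch of what that could look like for your 16-process case (the input and output file names in.in and out.out are hypothetical, and the exact launcher flags depend on your MPI implementation):

mpirun -np 16 jdftx -c 1 -i in.in -o out.out

With -c 1, each MPI process is limited to a single thread, so each jdftx entry in top should stay near 100% CPU and the run should not spill onto the cores used by other jobs; the total core count used is roughly np times c (here 16 x 1 = 16).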
Shankar