I recently brought up a new head node and installed the Slurm Rocks roll (release-7.0.0-19.05.02). Things were working, but the node configs seemed out of sync. The nodes are dual AMD EPYC 7451 24-core, with SMT on.
Originally they were configured with 96 CPUs.
I ran
rocks report slurm_hwinfo | sh
rocks sync slurm
This configured them as 8 socket, 6 core, 2 thread.
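If it helps to see where 8:6:2 could come from: the EPYC 7451 is a Naples part with four dies (NUMA nodes) per package, six cores per die, and tools like hwloc/slurmd can count each die as a "socket". A quick sanity check of that arithmetic (my assumption about the mapping, not verified on these nodes):

```python
# Physical layout of a dual EPYC 7451 node: 2 packages,
# 24 cores each, SMT on -> 96 logical CPUs.
sockets, cores_per_socket, threads_per_core = 2, 24, 2
logical_cpus = sockets * cores_per_socket * threads_per_core
print(logical_cpus)  # 96

# Naples packages contain 4 dies (NUMA nodes); if each die is
# counted as a "socket", the same 96 CPUs appear as 8:6:2.
numa_per_package = 4
numa_sockets = sockets * numa_per_package              # 8
cores_per_numa = cores_per_socket // numa_per_package  # 6
assert numa_sockets * cores_per_numa * threads_per_core == logical_cpus
print(numa_sockets, cores_per_numa, threads_per_core)  # 8 6 2
```

So both 2:24:2 and 8:6:2 describe the same 96 logical CPUs; the drain comes from slurm.conf and the detected topology disagreeing, not from a wrong CPU total.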
epyc-compute-1-1 1 CLUSTER drained 96 8:6:2 128587 197999 205757 rack-1,9 Low socket*core coun
They all show up with the reason "Low socket*core count" and I can no longer run jobs.
$ sinfo -R
REASON USER TIMESTAMP NODELIST
Low socket*core coun root 2019-11-11T21:52:11 epyc-compute-1-[1-5]
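In case it is useful to anyone hitting the same thing: the usual fix is to make the NodeName line in slurm.conf match what slurmd actually detects (compare `slurmd -C` output on a compute node against the NodeName entry), then clear the drain with scontrol. A sketch, assuming the 8:6:2 topology reported by slurm_hwinfo is the one being kept, with RealMemory taken from the sinfo output above:

```
# slurm.conf node definition matching the hwinfo-detected topology (sketch)
NodeName=epyc-compute-1-[1-5] Sockets=8 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=128587

# then, on the head node, clear the drained state:
#   scontrol update NodeName=epyc-compute-1-[1-5] State=RESUME
```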
Last edit: Raymond Muno 2019-11-12