Raymond Muno - 2019-11-12

I recently brought up a new head node and installed the Slurm Rocks roll (release-7.0.0-19.05.02). Things were working, but the node configs seemed out of sync. The nodes are dual AMD EPYC 7451 24-core, with SMT on.
Originally they were configured as 96 CPUs.

I ran

rocks report slurm_hwinfo | sh
rocks sync slurm

This configured them as 8 sockets, 6 cores per socket, 2 threads per core.

epyc-compute-1-1 1 CLUSTER drained 96 8:6:2 128587 197999 205757 rack-1,9 Low socket*core coun

They all show up as drained with the reason "Low socket*core count", and I can no longer run jobs.

$ sinfo -R
REASON USER TIMESTAMP NODELIST
Low socket*core coun root 2019-11-11T21:52:11 epyc-compute-1-[1-5]
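
As a sanity check on the 8:6:2 layout, here is a minimal Python sketch of the arithmetic. The "8:6:2" string comes from the sinfo line above. The "2:24:2" layout is only my assumption of what two 24-core sockets with SMT would look like, not something Slurm reported, and the helper names are made up for illustration.

```python
def parse_sct(sct):
    """Split a sockets:cores:threads string such as '8:6:2' into integers."""
    sockets, cores, threads = (int(part) for part in sct.split(":"))
    return sockets, cores, threads


def summarize(sct):
    """Return the socket*core count and total CPU count for an S:C:T string."""
    sockets, cores, threads = parse_sct(sct)
    return {
        "sockets": sockets,
        "cores_per_socket": cores,
        "threads_per_core": threads,
        "socket_core_count": sockets * cores,
        "cpus": sockets * cores * threads,
    }


# Layout shown by sinfo after "rocks sync slurm" (taken from this post).
reported = summarize("8:6:2")

# Assumed physical layout: 2 sockets x 24 cores x 2 SMT threads.
physical = summarize("2:24:2")

print("reported:", reported["socket_core_count"], "socket*core,", reported["cpus"], "CPUs")
print("physical:", physical["socket_core_count"], "socket*core,", physical["cpus"], "CPUs")
```

By this arithmetic both layouts come to 48 socket*core and 96 CPUs, so the totals by themselves do not seem to explain the drain reason.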

 

Last edit: Raymond Muno 2019-11-12