GPU Hot is an open-source, lightweight dashboard for monitoring NVIDIA GPU performance in real time, on a single machine or across an entire cluster. Its self-hosted web interface streams hardware metrics directly from GPU servers, so developers, ML engineers, and system administrators can watch GPU utilization and system behavior from a browser. The dashboard collects and displays temperature, memory usage, power consumption, clock speeds, fan speed, and active processes, among other metrics. It scales from a single GPU workstation to distributed environments with dozens or even hundreds of GPUs: a lightweight container runs on each node, and the data is aggregated centrally.
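To make the kind of per-GPU metrics listed above concrete, here is a minimal Python sketch that reads them from the `nvidia-smi` command-line tool. This is an illustration only, not GPU Hot's actual collection code (which is not described here and may use a different mechanism). The `--query-gpu` and `--format` flags are real `nvidia-smi` options; the function names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total", "power.draw", "fan.speed"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for raw in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not raw:
            continue  # skip blank lines
        row = dict(zip(FIELDS, (value.strip() for value in raw)))
        row["index"] = int(row["index"])
        for key in FIELDS[2:]:
            try:
                row[key] = float(row[key])
            except ValueError:
                # Unsupported sensors are reported as text such as "[N/A]".
                row[key] = None
        gpus.append(row)
    return gpus


def query_gpus():
    """Run nvidia-smi once and return the parsed metrics (needs an NVIDIA driver)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

The parser is kept separate from the subprocess call so it can be exercised on captured output from a machine without a GPU. Calling `query_gpus()` in a loop with a short sleep would approximate a polling collector.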
Features
- Real-time GPU monitoring with sub-second metric updates
- Automatic detection and visualization of multiple GPUs on a system
- Cluster-wide aggregation to monitor dozens or hundreds of GPU nodes from one dashboard
- Historical charts for utilization, temperature, power draw, and clock speeds
- Process-level monitoring showing active GPU workloads and memory consumption
- Integrated system metrics including CPU usage, RAM consumption, and network statistics
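
As a rough illustration of the cluster-wide aggregation feature, the Python sketch below merges per-node GPU snapshots into a single summary. It is a hedged example, not GPU Hot's implementation: the snapshot layout (a mapping of node name to a list of GPU dicts with `index`, `utilization`, `temperature`, and `power` keys) and the function name `summarize_cluster` are assumptions made for this example.

```python
def summarize_cluster(snapshots):
    """Combine per-node GPU snapshots into one cluster-level summary.

    `snapshots` maps a node name to a list of GPU dicts (hypothetical layout).
    """
    # Flatten to one list, tagging each GPU with the node it came from.
    gpus = [dict(gpu, node=node)
            for node, node_gpus in snapshots.items()
            for gpu in node_gpus]
    utils = [gpu["utilization"] for gpu in gpus]
    hottest = max(gpus, key=lambda gpu: gpu["temperature"], default=None)
    return {
        "nodes": len(snapshots),
        "gpus": len(gpus),
        "mean_utilization": sum(utils) / len(utils) if utils else 0.0,
        "total_power_w": sum(gpu["power"] for gpu in gpus),
        "hottest": (hottest["node"], hottest["index"]) if hottest else None,
    }


# Example input: two nodes reporting three GPUs in total.
snapshots = {
    "node-a": [
        {"index": 0, "utilization": 90, "temperature": 60, "power": 200},
        {"index": 1, "utilization": 70, "temperature": 75, "power": 250},
    ],
    "node-b": [
        {"index": 0, "utilization": 20, "temperature": 40, "power": 50},
    ],
}
summary = summarize_cluster(snapshots)
```

In a real deployment the snapshots would arrive over the network from the per-node containers; here they are plain in-memory dicts so the aggregation step can be read on its own.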