cpu high monitor free download

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models is frequently activated, allowing the system to preload those frequently activated neurons into GPU memory while computing the less frequently activated ones on the CPU.

Downloads: 0 This Week

Last Update: 2026-05-11


llama.cpp

LLM inference in C/C++

llama.cpp is a high-performance C and C++ project for running large language models locally and in the cloud with minimal setup. It is built around efficient inference, broad hardware support, and the GGUF model format. The project supports many model families and has become a major foundation for local AI tools, model serving, and embedded inference workflows.

Downloads: 12 This Week

Last Update: 11 hours ago


Mosec

A high-performance ML model serving framework that offers dynamic batching

Mosec is a high-performance and flexible model-serving framework for building ML model-enabled backends and microservices. It bridges the gap between the machine learning models you have just trained and an efficient online service API.

Downloads: 0 This Week

Last Update: 2026-04-15


node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama

...The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.

Downloads: 8 This Week

Last Update: 2026-03-17


GPU Hot

Real-time NVIDIA GPU dashboard

GPU Hot is an open-source, lightweight monitoring dashboard designed to provide real-time visibility into NVIDIA GPU performance across single machines or entire clusters. The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics...

Downloads: 1 This Week

Last Update: 2026-05-28


Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under the MIT License. Infinity powers inference behind Gradient.ai and other embedding API providers.

Downloads: 0 This Week

Last Update: 2025-08-22


Chinese-LLaMA-Alpaca-3

Chinese Llama-3 LLMs developed from Meta Llama 3

...It includes scripts and tooling that let researchers or developers run training, fine-tuning, quantization, and deployment on local machines (CPU or GPU), making experimentation and testing accessible without requiring large clusters.

Downloads: 0 This Week

Last Update: 2026-01-15


Search Results for "cpu high monitor"

Showing 7 open source projects for "cpu high monitor"

PowerInfer

llama.cpp

Mosec

node-llama-cpp

GPU Hot

Infinity

Chinese-LLaMA-Alpaca-3
