cpu memory usage free download

Showing 140 open source projects for "cpu memory usage"

Filters: Artificial Intelligence, Windows

1

FastSD CPU

Fast stable diffusion on CPU and AI PC

...The repository contains multiple interfaces including a desktop GUI for simple generation, an advanced web-based UI with support for extensions like LoRA and ControlNet, and a command-line interface for scripted usage or server deployments. With support for performance-oriented libraries such as OpenVINO and hardware acceleration on platforms like Intel AI PCs, FastSD CPU aims to shrink generation times dramatically compared with naive CPU implementations.

Downloads: 43 This Week

Last Update: 2026-05-02
See Project
2

CTranslate2

Fast inference engine for Transformer models

...The project implements a custom runtime that applies performance optimizations such as weight quantization, layer fusion, batch reordering, padding removal, in-place operations, and caching to accelerate Transformer models and reduce their memory usage on CPU and GPU. On supported models and tasks, execution is significantly faster and requires fewer resources than general-purpose deep learning frameworks. Model serialization and computation support weights with reduced precision: 16-bit floating point (FP16), 16-bit integers (INT16), and 8-bit integers (INT8). ...

Downloads: 9 This Week

Last Update: 2026-05-19
See Project
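
The reduced-precision weights mentioned in the CTranslate2 entry can be illustrated with a generic symmetric INT8 scheme. This is a minimal sketch of the general technique, not CTranslate2's own implementation; the function names `quantize_int8` and `dequantize_int8` and the per-tensor scale are assumptions made for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the range [-127, 127].

    Illustrative only: a single scale for the whole tensor is an assumption.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, restored)
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight is at most half a quantization step.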
3

whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples....

Downloads: 450 This Week

Last Update: 2026-03-19
See Project
4

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models is frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed. ...

Downloads: 0 This Week

Last Update: 2026-05-11
See Project
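
The hot/cold neuron split described in the PowerInfer entry can be pictured with a toy placement routine. This is a hedged sketch of the idea only: the function `split_neurons`, the `gpu_slots` budget, and the activation counts are hypothetical and do not come from the PowerInfer code base.

```python
def split_neurons(activation_counts, gpu_slots):
    """Place the most frequently activated neurons on the GPU, the rest on the CPU.

    activation_counts[i] is how often neuron i fired in a profiling run
    (hypothetical data); gpu_slots is how many neurons fit in GPU memory.
    """
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: activation_counts[i], reverse=True)
    gpu_ids = sorted(ranked[:gpu_slots])
    cpu_ids = sorted(ranked[gpu_slots:])
    return gpu_ids, cpu_ids


counts = [120, 3, 95, 0, 40, 7]
gpu_ids, cpu_ids = split_neurons(counts, gpu_slots=2)
# Share of all observed activations that the GPU-resident neurons handle.
coverage = sum(counts[i] for i in gpu_ids) / sum(counts)
print(gpu_ids, cpu_ids, round(coverage, 3))
```

In this made-up profile, two of six neurons account for most activations, which is the kind of skew the description relies on.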
5

MCP Monitor

A system monitoring tool that exposes system metrics

The MCP System Monitor is a tool that exposes system metrics via the Model Context Protocol (MCP), allowing Large Language Models (LLMs) to retrieve real-time system information through an MCP-compatible interface.

Downloads: 0 This Week

Last Update: 2025-08-02
See Project
6

MegEngine

Easy-to-use deep learning framework with 3 key features

...After training, just put everything into your model and run inference on any platform with ease. Speed and precision problems won't bother you anymore, because training and inference share the same core. In training, GPU memory usage can drop to one-third at the cost of only one additional line of code, which enables the DTR algorithm. Achieve the lowest memory usage when running inference by leveraging our unique pushdown memory planner. NOTE: MegEngine now supports Python installation on Linux-64bit/Windows-64bit/macOS(CPU-Only)-10.14+/Android 7+(CPU-Only) platforms with Python from 3.5 to 3.8. ...

Downloads: 3 This Week

Last Update: 2024-04-30
See Project
7

FlexLLMGen

Running large language models on a single GPU

...The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
8

PicoClaw

Ultra-Efficient AI Assistant in Go

PicoClaw is an ultra-lightweight, open-source personal AI assistant written in Go, architected from the ground up to operate with extremely low memory usage (under 10 MB) and fast boot times, making it suitable for inexpensive hardware platforms and embedded devices. Inspired by earlier AI assistant projects like “nanobot,” it was refactored to emphasize resource efficiency while still supporting meaningful AI-driven interactions such as conversational workflows, planning tasks, and automation. ...

Downloads: 11 This Week

Last Update: 2026-04-30
See Project
9

GPU Hot

Real-time NVIDIA GPU dashboard

...The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.

Downloads: 3 This Week

Last Update: 2026-04-11
See Project
10

shimmy

Python-free Rust inference server

...This compatibility enables developers to replace remote AI services with locally hosted models while keeping their existing software architecture intact. Shimmy focuses on performance and simplicity, using efficient runtime components to minimize memory usage and startup time compared to heavier inference frameworks. It supports modern model formats such as GGUF and SafeTensors and can automatically discover models stored locally or in common directories used by other AI tools. Advanced capabilities include CPU offloading for Mixture-of-Experts models and GPU acceleration, enabling large models to run on consumer hardware with limited VRAM.

Downloads: 5 This Week

Last Update: 1 day ago
See Project
11

LuxTTS

A high-quality rapid TTS voice cloning model

...Its design emphasizes efficiency and practicality, fitting within modest GPU memory footprints.

Downloads: 4 This Week

Last Update: 2026-03-12
See Project
12

Claude-Mem

Claude Code plugin that automatically captures everything Claude does

Claude-Mem is a persistent memory compression system built specifically for Claude Code to preserve context across coding sessions. It automatically captures Claude’s tool usage, observations, and decisions, then compresses them into semantic memories that carry forward into future sessions. By enabling long-term continuity, Claude-Mem helps Claude “remember” project history, past fixes, and prior reasoning even after restarts or reconnects.

Downloads: 7 This Week

Last Update: 7 days ago
See Project
13

AirLLM

AirLLM 70B inference with single 4GB GPU

AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of...

Downloads: 6 This Week

Last Update: 2026-03-10
See Project
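
The layer-wise inference approach in the AirLLM entry can be sketched with a toy model whose layers are loaded, applied, and released one at a time. This is an illustration under stated assumptions, not AirLLM's API: `LayerStore`, `run_layerwise`, and the affine "layers" are hypothetical stand-ins for per-layer weight files and real Transformer blocks.

```python
class LayerStore:
    """Hypothetical stand-in for per-layer weight files kept on disk."""

    def __init__(self, layers):
        self._layers = layers
        self.resident = set()      # layers currently held in memory
        self.peak_resident = 0     # largest number held at the same time

    def load(self, index):
        self.resident.add(index)
        self.peak_resident = max(self.peak_resident, len(self.resident))
        return self._layers[index]

    def release(self, index):
        self.resident.discard(index)


def run_layerwise(x, store, num_layers):
    """Apply layers sequentially so only one layer is resident at a time."""
    for index in range(num_layers):
        scale, bias = store.load(index)   # bring in one layer
        x = scale * x + bias              # toy layer: an affine map
        store.release(index)              # free it before loading the next
    return x


store = LayerStore([(2.0, 1.0), (0.5, 0.0), (3.0, -1.0)])
result = run_layerwise(4.0, store, num_layers=3)
print(result, store.peak_resident)
```

Peak memory then scales with the largest single layer rather than with the whole model, at the cost of reloading weights on every pass.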
14

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

...The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. FP8 model weights have been released to reduce GPU memory usage and improve efficiency. Parallel inference code is provided to speed up sampling, and utilities and tests are included.

1 Review

Downloads: 4 This Week

Last Update: 2025-09-23
See Project
15

Model Explorer

A modern model graph visualizer and debugger

Model Explorer is a visual tool for exploring, debugging, and optimizing ML models deployed on edge devices. Developed by Google AI Edge, it offers a browser-based interface to inspect layer-wise performance, memory usage, and inference timing of TensorFlow Lite and other supported models. It’s a powerful utility for developers optimizing models for constrained environments.

Downloads: 0 This Week

Last Update: 2026-02-09
See Project
16

Hello-Agents

Building an Intelligent Agent from Scratch

Hello Agents is an open educational project designed to teach developers how to understand, design, and build AI-native agents from the ground up through structured tutorials and practical examples. The project focuses on guiding learners beyond superficial framework usage toward deeper comprehension of agent architecture, reasoning loops, and real-world implementation patterns. It walks users through core concepts such as ReAct-style reasoning, tool usage, memory handling, and multi-step task execution, enabling hands-on experimentation with modern LLM-powered agent systems. The repository is structured as a progressive learning path, combining theory, exercises, and runnable code so users can incrementally build more capable agents. ...

Downloads: 2 This Week

Last Update: 2026-02-25
See Project
17

Hermes Web UI

The best way to use Hermes Agent from the web or from your phone

...It offers a clean, multi-panel layout that includes chat interaction, session management, and workspace file browsing. The interface allows users to manage agent sessions, configure models, and interact with persistent memory systems directly from a web environment. It is built using simple technologies like Python and vanilla JavaScript, avoiding complex frontend frameworks. The UI supports real-time interaction, context tracking, and visualization of token usage. It connects to a self-hosted agent that continuously learns and evolves over time. The project emphasizes usability, accessibility, and seamless integration with existing workflows.

Downloads: 12 This Week

Last Update: 5 hours ago
See Project
18

Pocket TTS

A TTS that fits in your CPU (and pocket)

Pocket TTS is a lightweight text-to-speech project designed to run efficiently on CPUs, targeting developers who want local speech generation without depending on GPUs or hosted web APIs. It is built to feel practical in everyday applications, where installation and usage should be as simple as adding a dependency and calling a function. The project focuses on keeping the runtime footprint manageable while still producing natural-sounding speech, which makes it attractive for offline tools, prototypes, and privacy-sensitive workflows. Because it is CPU-oriented, it fits well in server environments where GPU access is limited, in desktop apps, or in edge deployments where simplicity matters more than maximum throughput. ...

Downloads: 13 This Week

Last Update: 2026-05-04
See Project
19

NullClaw

Fastest, smallest, and fully autonomous AI assistant infrastructure

NullClaw is the smallest fully autonomous AI assistant infrastructure, built entirely in Zig as a single static binary with zero runtime dependencies. At just 678 KB with ~1 MB peak RAM usage, it boots in under 2 milliseconds and runs on virtually any hardware, including low-cost ARM boards. Despite its size, it delivers a complete AI stack with 22+ model providers, 18+ communication channels, integrated tools, hybrid memory, and sandboxed runtime support. Its architecture is fully modular, using vtable interfaces that allow providers, channels, tools, memory backends, and runtimes to be swapped without code changes. ...

Downloads: 5 This Week

Last Update: 2026-05-04
See Project
20

Kitten TTS

State-of-the-art TTS model under 25MB

KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Key features: ultra-lightweight (model size under 25 MB); CPU-optimized (runs without a GPU on any device); high-quality voices (several premium voice options available); fast inference (optimized for real-time speech synthesis).

Downloads: 10 This Week

Last Update: 2026-02-24
See Project
21

Magika

Fast and accurate AI powered file content types detection

...It also emphasizes reproducibility and developer ergonomics with clear install and usage instructions for common platforms. A public site complements the repo with background, examples, and guidance for integrating Magika into existing workflows.

Downloads: 1 This Week

Last Update: 2026-04-24
See Project
22

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps

Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more...

Downloads: 1 This Week

Last Update: 2025-09-09
See Project
23

bitsandbytes

Accessible large language models via k-bit quantization for PyTorch

bitsandbytes is an open-source library designed to make training and inference of large neural networks more efficient by dramatically reducing memory usage. Built primarily for the PyTorch ecosystem, the library introduces advanced quantization techniques that allow models to operate using reduced numerical precision while maintaining high accuracy. These optimizations enable large language models and other deep learning architectures to run on hardware with limited memory resources, including consumer-grade GPUs. ...

Downloads: 2 This Week

Last Update: 2026-03-04
See Project
24

MemU

MemU is an open-source memory framework for AI companions

MemU is an agentic memory layer for LLM applications, specifically designed for AI companions. Transform your memory into an intelligent file system that automatically organizes, connects, and evolves with your memories. Simple, fast, and reliable memory infrastructure for AI applications. Powerful tools and dedicated support to scale your AI applications with confidence.

Downloads: 1 This Week

Last Update: 2026-03-23
See Project
25

bitnet.cpp

Official inference framework for 1-bit LLMs

bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous...

Downloads: 8 This Week

Last Update: 2026-03-10
See Project
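
The ternary values (-1, 0, +1) mentioned in the bitnet.cpp entry can be illustrated with a small quantization routine. This is a sketch assuming an absmean-style scale (the mean absolute weight); it is not taken from bitnet.cpp, and the function name `quantize_ternary` and the sample weights are hypothetical.

```python
def quantize_ternary(weights, eps=1e-8):
    """Quantize floats to {-1, 0, +1} using a mean-absolute-value scale.

    Assumption: one scale per tensor; real frameworks may differ.
    """
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Divide by the scale, round to the nearest integer, clamp to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


weights = [0.9, -0.05, 0.4, -1.2, 0.1]
ternary, scale = quantize_ternary(weights)
print(ternary, scale)
```

Small weights collapse to 0 and the rest keep only their sign, so each weight carries one of three states (about 1.58 bits of information).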