Showing 142 open source projects for "cpu memory usage"

View related business solutions
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1
    Pocket TTS

    Pocket TTS

    A TTS that fits in your CPU (and pocket)

    Pocket TTS is a lightweight text-to-speech project designed to run efficiently on CPUs, targeting developers who want local speech generation without depending on GPUs or hosted web APIs. It is built to feel practical in everyday applications, where installation and usage should be as simple as adding a dependency and calling a function. The project focuses on keeping the runtime footprint manageable while still producing natural-sounding speech, which makes it attractive for offline tools, prototypes, and privacy-sensitive workflows. Because it is CPU-oriented, it fits well in server environments where GPU access is limited, in desktop apps, or in edge deployments where simplicity matters more than maximum throughput. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    xLSTM

    xLSTM

    Neural Network architecture based on ideas of the original LSTM

    ...The architecture aims to provide competitive performance with transformer-based models while maintaining advantages such as linear computational scaling and efficient memory usage for long sequences. Researchers have demonstrated that xLSTM models can scale to billions of parameters and large training datasets while maintaining efficient inference speeds.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    llmfit

    llmfit

    157 models, 30 providers, one command to find what runs on hardware

    llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 4
    ZeroClaw

    ZeroClaw

    Fast, small, and fully autonomous AI assistant infrastructure

    ZeroClaw is a Rust-native autonomous AI agent framework engineered for teams and developers who need highly efficient, secure, and modular AI automation infrastructure that can run reliably in both production and self-hosted environments. It is designed around a trait-based architecture so that model providers, communication channels, memory systems, and tooling integrations can be swapped or extended without rewriting core components, giving engineers flexibility and long-term...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    MemU

    MemU

    MemU is an open-source memory framework for AI companions

    MemU is an agentic memory layer for LLM applications, specifically designed for AI companions. Transform your memory into an intelligent file system that automatically organizes, connects, and evolves with your memories. Simple, fast, and reliable memory infrastructure for AI applications. Powerful tools and dedicated support to scale your AI applications with confidence.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    PicoLM

    PicoLM

    Run a 1-billion parameter LLM on a $10 board with 256MB RAM

    PicoLM is an open-source inference framework designed to run large language models on extremely constrained hardware environments such as inexpensive single-board computers and embedded systems. The project focuses on enabling efficient local inference by optimizing memory usage, computation, and system dependencies so that relatively large models can operate on devices with minimal RAM. It is written primarily in C and designed with a minimalist architecture that removes unnecessary dependencies and external libraries. The runtime is capable of running language models with billions of parameters on devices with only a few hundred megabytes of memory, which is significantly lower than typical LLM infrastructure requirements. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Hermes Desktop

    Hermes Desktop

    Desktop Companion for Hermes Agent

    Hermes Desktop is a native desktop application that simplifies the installation, configuration, and daily use of Hermes Agent, an AI assistant with advanced tool usage and self-improving capabilities. It provides an intuitive graphical interface that eliminates the need to manage Hermes through command-line tools. Users can connect to local or remote Hermes instances while accessing chat, memory management, skills, tools, profiles, and scheduling from a single workspace. The platform supports multiple AI providers, including OpenAI, Anthropic, Google Gemini, Grok, OpenRouter, and local model endpoints. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    OpenAI CS Agents Demo

    OpenAI CS Agents Demo

    Demo of a customer service use case implemented with the OpenAI Agents

    ...It also demonstrates guardrails to validate or constrain responses, memory usage to maintain context, and tracing to help debugging of workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Tencent-Hunyuan-Large

    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    ...It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage (~50%) while maintaining precision. High benchmarking performance on tasks like MMLU, MATH, CMMLU, C-Eval, etc.
    Downloads: 2 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    bitsandbytes

    bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch

    bitsandbytes is an open-source library designed to make training and inference of large neural networks more efficient by dramatically reducing memory usage. Built primarily for the PyTorch ecosystem, the library introduces advanced quantization techniques that allow models to operate using reduced numerical precision while maintaining high accuracy. These optimizations enable large language models and other deep learning architectures to run on hardware with limited memory resources, including consumer-grade GPUs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    R-KV

    R-KV

    Redundancy-aware KV Cache Compression for Reasoning Models

    R-KV is an open-source research project that focuses on improving the efficiency of large language model inference through key-value cache compression techniques. Modern transformer models rely heavily on KV caches during autoregressive decoding, which store intermediate attention states to accelerate generation. However, these caches can consume large amounts of memory, especially in reasoning-oriented models with long context windows. R-KV introduces a method for compressing the KV cache...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    AI Agent Deep Dive

    AI Agent Deep Dive

    AI Agent Source Code Deep Research Report

    AI Agent Deep Dive is a comprehensive educational repository designed to provide a deep and structured understanding of how modern AI agents work, focusing on architecture, workflows, and real-world implementation patterns. It breaks down complex concepts such as planning, tool usage, memory management, and multi-step reasoning into digestible explanations and practical examples. The project is organized as a learning resource rather than a standalone framework, making it particularly useful for developers who want to move beyond surface-level prompt engineering into full agent system design. It explores how agents interact with environments, execute tasks, and maintain context over time, highlighting both strengths and limitations of current approaches. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Build Your Own OpenClaw

    Build Your Own OpenClaw

    A step-by-step guide to build your own AI agent

    Build Your Own OpenClaw is a step-by-step educational framework that teaches developers how to construct a fully functional AI agent system from scratch, gradually evolving from a simple chat loop into a multi-agent, production-ready architecture. The project is structured into 18 progressive stages, each introducing a new concept such as tool usage, memory persistence, event-driven design, and multi-agent coordination, with each step including both explanatory documentation and runnable code. It begins with foundational concepts like conversational loops and tool integration, then expands into more advanced capabilities such as dynamic skill loading, web interaction, and context management. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    LMCache

    LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    LMCache is an extension layer for LLM serving engines that accelerates inference, especially with long contexts, by storing and reusing key-value (KV) attention caches across requests. Instead of rebuilding KV states for repeated or shared text segments, LMCache persists and retrieves them from multiple tiers—GPU memory, CPU DRAM, and local disk—then injects them into subsequent requests to reduce TTFT and increase throughput. Its design supports reuse beyond strict prefix matching and enables sharing across serving instances, improving efficiency under real multi-tenant traffic. The broader project includes examples, tests, a server component, and public posts describing cross-engine sharing and inter-GPU KV transfers. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    ChatGLM.cpp

    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    Mistral Finetune

    Mistral Finetune

    Memory-efficient and performant finetuning of Mistral's models

    mistral-finetune is an official lightweight codebase designed for memory-efficient and performant finetuning of Mistral’s open models (e.g. 7B, instruct variants). It builds on techniques like LoRA (Low-Rank Adaptation) to allow customizing models without full parameter updates, which reduces GPU memory footprint and training cost. The repo includes utilities for data preprocessing (e.g. reformat_data.py), validation scripts, and example YAML configs for training variants like 7B base or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Pedalboard

    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow. Internally at Spotify, pedalboard...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Faster Whisper

    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 19
    Unsloth Studio

    Unsloth Studio

    Unified web UI for training and running open models locally

    Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 20
    Wink-NLP

    Wink-NLP

    Developer friendly Natural Language Processing

    Wink-NLP is a lightweight and fast natural language processing library for JavaScript, optimized for browser and Node.js environments.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    ncnn

    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It brings artificial intelligence right at your fingertips with no third-party dependencies, and speeds faster than all other known open source frameworks for mobile phone cpu. ncnn allows developers to easily deploy deep learning algorithm models to the mobile platform and create intelligent APPs. It is cross-platform and supports most commonly used CNN networks, including...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 22
    bitnet.cpp

    bitnet.cpp

    Official inference framework for 1-bit LLMs

    bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Qwen-Agent

    Qwen-Agent

    Agent framework and applications built upon Qwen>=3.0

    Qwen-Agent is a framework for building applications / agents using Qwen models (version 3.0+). It provides components for instruction following, tool usage (function calling), planning, memory, RAG (retrieval augmented generation), code interpreter, etc. It ships with example applications (Browser Assistant, Code Interpreter, Custom Assistant), supports GUI front-ends, backends, server setups. Agent workflow can maintain context / memory to perform multi-turn or more complex logic over time. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Claude Cognitive

    Claude Cognitive

    Persistent context and multi-instance coordination

    Claude Cognitive is an advanced memory and context-management extension designed to address the stateless limitations of Claude Code by giving the model a form of persistent “working memory” and multi-instance coordination. It introduces an attention-based context router that prioritizes files and content relevant to the current development discussion — tagging them as HOT, WARM, or COLD based on recency and keyword activation — so Claude Code doesn’t waste token budget rereading irrelevant code. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MNN

    MNN

    MNN is a blazing fast, lightweight deep learning framework

    ...Android platform, core so size is about 400KB, OpenCL so is about 400KB, Vulkan so is about 400KB. Supports hybrid computing on multiple devices. Currently supports CPU and GPU.
    Downloads: 26 This Week
    Last Update:
    See Project