Showing 156 open source projects for "cpu memory usage"

  • 1
    whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more...
    Downloads: 1 This Week
  • 2
    xLSTM

    Neural Network architecture based on ideas of the original LSTM

    ...The architecture aims to provide competitive performance with transformer-based models while maintaining advantages such as linear computational scaling and efficient memory usage for long sequences. Researchers have demonstrated that xLSTM models can scale to billions of parameters and large training datasets while maintaining efficient inference speeds.
    Downloads: 0 This Week
  • 3
    OpenAI Agents (Python)

    A lightweight, powerful framework for multi-agent workflows

    openai-agents-python is a library developed by OpenAI to simplify the process of creating and running agents that interact with tools and APIs using OpenAI models. It provides abstractions for tool usage, memory management, and agent workflows, enabling developers to define function-calling agents that reason through multi-step tasks. Ideal for building custom AI workflows, the library supports dynamic tool definitions and contextual memory handling.
    Downloads: 1 This Week
  • 4
    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    ...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...
    Downloads: 0 This Week
  • 5
    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, high-quality text-to-speech model with just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms: it is CPU-optimized and runs without a GPU on any device, offers several premium voice options, and is tuned for fast inference in real-time speech synthesis.
    Downloads: 2 This Week
  • 6
    ZeroClaw

    Fast, small, and fully autonomous AI assistant infrastructure

    ZeroClaw is a Rust-native autonomous AI agent framework engineered for teams and developers who need highly efficient, secure, and modular AI automation infrastructure that can run reliably in both production and self-hosted environments. It is designed around a trait-based architecture so that model providers, communication channels, memory systems, and tooling integrations can be swapped or extended without rewriting core components, giving engineers flexibility and long-term...
    Downloads: 13 This Week
  • 7
    PicoLM

    Run a 1-billion parameter LLM on a $10 board with 256MB RAM

    PicoLM is an open-source inference framework designed to run large language models on extremely constrained hardware environments such as inexpensive single-board computers and embedded systems. The project focuses on enabling efficient local inference by optimizing memory usage, computation, and system dependencies so that relatively large models can operate on devices with minimal RAM. It is written primarily in C and designed with a minimalist architecture that removes unnecessary dependencies and external libraries. The runtime is capable of running language models with billions of parameters on devices with only a few hundred megabytes of memory, which is significantly lower than typical LLM infrastructure requirements. ...
    Downloads: 3 This Week
  • 8
    Dayflow

    Automatic AI-powered timeline of your daily work activity logs

    ...It continuously captures lightweight snapshots of the screen and processes them at intervals using AI to produce contextual summaries of what the user was actually doing. Unlike traditional time trackers that only log application usage, it focuses on understanding the intent behind activities, distinguishing productive work from distractions. It is built as a native SwiftUI application and emphasizes efficiency, using minimal CPU and memory while running in the background. A strong focus is placed on privacy, as all captured data remains local by default and users can choose their preferred AI provider, including local models or external services. ...
    Downloads: 0 This Week
  • 9
    OpenAI CS Agents Demo

    Demo of a customer service use case implemented with the OpenAI Agents

    ...It also demonstrates guardrails to validate or constrain responses, memory usage to maintain context, and tracing to help debug workflows.
    Downloads: 0 This Week
  • 10
    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    ...It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support reduces memory usage by roughly 50% while maintaining precision, and the model scores well on benchmarks such as MMLU, MATH, CMMLU, and C-Eval.
    Downloads: 2 This Week
  • 11
    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 12 This Week
  • 12
    R-KV

    Redundancy-aware KV Cache Compression for Reasoning Models

    R-KV is an open-source research project that focuses on improving the efficiency of large language model inference through key-value cache compression techniques. Modern transformer models rely heavily on KV caches during autoregressive decoding, which store intermediate attention states to accelerate generation. However, these caches can consume large amounts of memory, especially in reasoning-oriented models with long context windows. R-KV introduces a method for compressing the KV cache...
    Downloads: 0 This Week
  • 13
    Cactus

    Low-latency AI inference engine optimized for mobile devices

    ...It provides a full-stack architecture composed of an inference engine, a computation graph system, and highly optimized hardware kernels tailored for ARM-based processors. Cactus emphasizes efficient memory usage through techniques such as zero-copy computation graphs and quantized model formats, allowing large models to run within the constraints of mobile hardware. It supports a wide range of AI tasks including text generation, speech-to-text, vision processing, and retrieval-augmented workflows through a unified API interface. A notable feature of Cactus is its hybrid execution model, which can dynamically route tasks between on-device processing and cloud services when additional compute is required.
    Downloads: 1 This Week
  • 14
    AI Agent Deep Dive

    AI Agent Source Code Deep Research Report

    AI Agent Deep Dive is a comprehensive educational repository designed to provide a deep and structured understanding of how modern AI agents work, focusing on architecture, workflows, and real-world implementation patterns. It breaks down complex concepts such as planning, tool usage, memory management, and multi-step reasoning into digestible explanations and practical examples. The project is organized as a learning resource rather than a standalone framework, making it particularly useful for developers who want to move beyond surface-level prompt engineering into full agent system design. It explores how agents interact with environments, execute tasks, and maintain context over time, highlighting both strengths and limitations of current approaches. ...
    Downloads: 0 This Week
  • 15
    ChatGLM2-6B

    ChatGLM2-6B: An Open Bilingual Chat LLM

    ...It upgrades the base model with GLM's hybrid pretraining objective, 1.4 TB bilingual data, and preference alignment, delivering big gains on MMLU, C-Eval, GSM8K, and BBH. The context window extends up to 32K (FlashAttention), and Multi-Query Attention improves speed and memory use. The repo includes Python APIs, CLI and web demos, OpenAI-style FastAPI servers, and quantized checkpoints for lightweight local deployment on GPUs or CPU/MPS.
    Downloads: 3 This Week
  • 16
    llmfit

    157 models, 30 providers, one command to find what runs on your hardware

    llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...
    Downloads: 11 This Week
  • 17
    Build Your Own OpenClaw

    A step-by-step guide to build your own AI agent

    Build Your Own OpenClaw is a step-by-step educational framework that teaches developers how to construct a fully functional AI agent system from scratch, gradually evolving from a simple chat loop into a multi-agent, production-ready architecture. The project is structured into 18 progressive stages, each introducing a new concept such as tool usage, memory persistence, event-driven design, and multi-agent coordination, with each step including both explanatory documentation and runnable code. It begins with foundational concepts like conversational loops and tool integration, then expands into more advanced capabilities such as dynamic skill loading, web interaction, and context management. ...
    Downloads: 7 This Week
  • 18
    Vellum

    A personal AI assistant that evolves with you

    Vellum is an open-source personal AI assistant platform designed to function as a persistent, autonomous digital companion across desktop and messaging environments. Unlike traditional chatbot interfaces, the project focuses on long-term memory, identity, proactive behavior, and real-world tool usage, enabling assistants to evolve alongside the user over time. The system integrates with macOS, Telegram, Slack, SMS, and additional communication channels while maintaining shared memory and context across platforms. Its architecture combines local-first storage, tool orchestration, sandboxed execution, and extensible workflow automation to allow assistants to read files, manage schedules, send messages, browse the web, and control applications. ...
    Downloads: 1 This Week
  • 19
    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 34 This Week
  • 20
    Mistral Finetune

    Memory-efficient and performant finetuning of Mistral's models

    mistral-finetune is an official lightweight codebase designed for memory-efficient and performant finetuning of Mistral’s open models (e.g. 7B, instruct variants). It builds on techniques like LoRA (Low-Rank Adaptation) to allow customizing models without full parameter updates, which reduces GPU memory footprint and training cost. The repo includes utilities for data preprocessing (e.g. reformat_data.py), validation scripts, and example YAML configs for training variants like 7B base or...
    Downloads: 0 This Week
  • 21
    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference framework designed specifically for mobile platforms. It puts artificial intelligence at your fingertips with no third-party dependencies, and it claims to run faster than all other known open-source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and create intelligent apps. It is cross-platform and supports most commonly used CNN networks, including...
    Downloads: 34 This Week
  • 22
    LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    LMCache is an extension layer for LLM serving engines that accelerates inference, especially with long contexts, by storing and reusing key-value (KV) attention caches across requests. Instead of rebuilding KV states for repeated or shared text segments, LMCache persists and retrieves them from multiple tiers—GPU memory, CPU DRAM, and local disk—then injects them into subsequent requests to reduce TTFT and increase throughput. Its design supports reuse beyond strict prefix matching and enables sharing across serving instances, improving efficiency under real multi-tenant traffic. The broader project includes examples, tests, a server component, and public posts describing cross-engine sharing and inter-GPU KV transfers. ...
    Downloads: 3 This Week
  • 23
    Claude Cognitive

    Persistent context and multi-instance coordination

    Claude Cognitive is an advanced memory and context-management extension designed to address the stateless limitations of Claude Code by giving the model a form of persistent “working memory” and multi-instance coordination. It introduces an attention-based context router that prioritizes files and content relevant to the current development discussion — tagging them as HOT, WARM, or COLD based on recency and keyword activation — so Claude Code doesn’t waste token budget rereading irrelevant code. ...
    Downloads: 0 This Week
  • 24
    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow. Internally at Spotify, pedalboard...
    Downloads: 0 This Week
  • 25
    Wink-NLP

    Developer friendly Natural Language Processing

    Wink-NLP is a lightweight and fast natural language processing library for JavaScript, optimized for browser and Node.js environments.
    Downloads: 0 This Week