Showing 88 open source projects for "cpu memory usage"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 1
    Continuous Claude v3

    Continuous Claude v3

    Context management for Claude Code. Hooks maintain state via ledgers

    ...It also includes a layered code analysis pipeline to reduce token usage and maintain relevant context efficiently. This continuous learning environment enables workflows such as bug fixing, refactoring, planning, and exploratory investigation while minimizing the need to re-explain context manually.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    OpenViking

    OpenViking

    Context database designed specifically for AI Agents

    OpenViking is an open-source context database engineered for efficient indexing and retrieval of large amounts of unstructured or semi-structured context data used by AI applications. It’s primarily designed to serve as a high-performance, scalable backend for storing app context, embeddings, conversational histories, and other textual artifacts that need rapid lookup and semantic search, which makes it especially useful for systems like chatbots or memory-augmented agents. The project is...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Qwen

    Qwen

    The official repo of Qwen chat & pretrained large language model

    Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    nanobot

    nanobot

    🐈 nanobot: The Ultra-Lightweight Clawdbot / OpenClaw

    nanobot is an ultra-lightweight personal AI assistant designed to deliver powerful agent capabilities without unnecessary complexity. Built in just ~4,000 lines of clean, readable code, it offers a minimalist alternative to heavyweight agent frameworks while retaining core intelligence and extensibility. nanobot is optimized for speed and efficiency, enabling fast startup times and low resource usage across environments. Its research-ready architecture makes it easy for developers to...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 5
    Z80-μLM

    Z80-μLM

    Z80-μLM is a 2-bit quantized language model

    ...The project sits at the intersection of machine learning and systems constraints, showing how model architecture, quantization, and inference code generation can be adapted to extreme memory and compute limits. It also functions as an educational reference for how to reduce inference to operations that fit an old-school instruction set and runtime environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Core ML Tools

    Core ML Tools

    Core ML tools contain supporting tools for Core ML model conversion

    ...Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device. Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Open-LLM-VTuber

    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality....
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    NVIDIA Model Optimizer

    NVIDIA Model Optimizer

    A unified library of SOTA model optimization techniques

    Model Optimizer is a unified library that provides state-of-the-art techniques for compressing and optimizing deep learning models to improve inference efficiency and deployment performance. It brings together multiple optimization strategies such as quantization, pruning, distillation, and speculative decoding into a single cohesive framework. The library is designed to reduce model size and computational requirements while maintaining accuracy, making it particularly valuable for deploying...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Chitu

    Chitu

    High-performance inference framework for large language models

    ...Chitu is designed to scale from small single-machine deployments to large distributed clusters that handle high volumes of concurrent inference requests. The system also includes performance optimizations for large models, including support for quantized formats and efficient computation operators that reduce memory usage and latency. Its architecture aims to support enterprise adoption by ensuring stable long-term operation under production workloads.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    LLM-Pruner

    LLM-Pruner

    On the Structural Pruning of Large Language Models

    LLM-Pruner is an open-source framework designed to compress large language models through structured pruning techniques while maintaining their general capabilities. Large language models often require enormous computational resources, making them expensive to deploy and inefficient for many practical applications. LLM-Pruner addresses this issue by identifying and removing non-essential components within transformer architectures, such as redundant attention heads or feed-forward...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    TensorFlow Model Optimization Toolkit

    TensorFlow Model Optimization Toolkit

    A toolkit to optimize ML models for deployment for Keras & TensorFlow

    ...Among many uses, the toolkit supports techniques used to reduce latency and inference costs for cloud and edge devices (e.g. mobile, IoT). Deploy models to edge devices with restrictions on processing, memory, power consumption, network usage, and model storage space. Enable execution on and optimize for existing hardware or new special purpose accelerators. Choose the model and optimization tool depending on your task. In many cases, pre-optimized models can improve the efficiency of your application. Try the post-training tools to optimize an already-trained TensorFlow model. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Torch Pruning

    Torch Pruning

    DepGraph: Towards Any Structural Pruning

    Torch-Pruning is an open-source toolkit designed to optimize deep neural networks by performing structural pruning directly within PyTorch models. The library focuses on reducing the size and computational cost of neural networks by removing redundant parameters and channels while maintaining model performance. It introduces a graph-based algorithm called DepGraph that automatically identifies dependencies between layers, allowing parameters to be pruned safely across complex architectures....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    tsai

    tsai

    Time series Timeseries Deep Learning Machine Learning Pytorch fastai

    ...If you require any of the dependencies that is not installed, tsai will ask you to install it when necessary) We've also added a new PredictionDynamics callback that will display the predictions during training. This is the type of output you would get in a classification task. New tutorial notebook on how to train your model with larger-than-memory datasets in less time achieving up to 100% GPU usage! See our new tutorial notebook on how to track your experiments with Weights & Biases
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    marqo

    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    DocArray

    DocArray

    The data structure for multimodal data

    ...Data science powerhouse: greatly accelerate data scientists’ work on embedding, k-NN matching, querying, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU. Data in transit: optimized for network communication, ready-to-wire at anytime with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data. One-stop k-NN: Unified and consistent API for mainstream vector databases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    AIMET

    AIMET

    AIMET is a library that provides advanced quantization and compression

    ...Quantized inference is significantly faster than floating point inference. For example, models that we’ve run on the Qualcomm® Hexagon™ DSP rather than on the Qualcomm® Kryo™ CPU have resulted in a 5x to 15x speedup. Plus, an 8-bit model also has a 4x smaller memory footprint relative to a 32-bit model. However, often when quantizing a machine learning model (e.g., from 32-bit floating point to an 8-bit fixed point value), the model accuracy is sacrificed.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    FastChat

    FastChat

    Open platform for training, serving, and evaluating language models

    FastChat is an open platform for training, serving, and evaluating large language model-based chatbots. If you do not have enough memory, you can enable 8-bit compression by adding --load-8bit to the commands above. This can reduce memory usage by around half with slightly degraded model quality. It is compatible with the CPU, GPU, and Metal backend. Vicuna-13B with 8-bit compression can run on a single NVIDIA 3090/4080/T4/V100(16GB) GPU. In addition to that, you can add --cpu-offloading to commands above to offload weights that don't fit on your GPU onto the CPU memory. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Mini Agent

    Mini Agent

    A minimal yet professional single agent demo project

    Mini-Agent is a minimal yet production-minded demo project that shows how to build a serious command-line AI agent around the MiniMax-M2 model. It is designed both as a reference implementation and as a usable agent, demonstrating a full execution loop that includes planning, tool calls, and iterative refinement. The project exposes an Anthropic-compatible API interface and fully supports interleaved thinking, letting the agent alternate between reasoning steps and tool invocations during...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    KoboldCpp

    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Leader badge
    Downloads: 285 This Week
    Last Update:
    See Project
  • 21
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes a LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 22
    Warlock-Studio

    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 23
    AutoGPTQ

    AutoGPTQ

    An easy-to-use LLMs quantization package with user-friendly apis

    AutoGPTQ is an implementation of GPTQ (Quantized GPT) that optimizes large language models (LLMs) for faster inference by reducing their computational footprint while maintaining accuracy.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Mixtral offloading

    Mixtral offloading

    Run Mixtral-8x7B models in Colab or consumer desktops

    Mixtral-Offloading is an open-source project designed to enable efficient inference of large Mixture-of-Experts language models such as Mixtral-8x7B on hardware with limited GPU memory. The project implements techniques that allow model components to be dynamically moved between CPU memory and GPU memory during inference, significantly reducing the amount of GPU VRAM required to run the model. This approach takes advantage of the sparse activation properties of mixture-of-experts architectures, where only a subset of expert networks are used for each token during generation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    SuperAGI

    SuperAGI

    A dev-first open source autonomous AI agent framework

    ...Connect to multiple Vector DBs to enhance your agent’s performance. Each agent is unique, use different models of your choice. Get insights into your agent’s performance and optimize accordingly. Control token usage to manage costs effectively. Enable your agents to learn and adapt by storing their memory. Get notified when agents get stuck in the loop, and provide proactive resolution. Read and store files generated by Agents.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB