Showing 25 open source projects for "cpu memory usage"

  • 1
    FlexLLMGen

    Running large language models on a single GPU

    ...The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. ...
    Downloads: 0 This Week
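
    The idea of spreading a model across GPU, CPU, and disk can be illustrated with a minimal sketch. It assumes a simple greedy rule (each layer goes to the fastest tier that still has room), which is an illustration only and not FlexLLMGen's actual placement policy. The function name `place_layers`, the tier names, and all sizes are hypothetical.

```python
def place_layers(layer_sizes, capacities):
    """Assign each layer to the fastest tier that still has room.

    layer_sizes: sizes in GB, in execution order (hypothetical values).
    capacities: list of (tier_name, capacity_gb), fastest tier first.
    Returns one tier name per layer.
    """
    remaining = [cap for _, cap in capacities]
    placement = []
    for size in layer_sizes:
        for i, (tier, _) in enumerate(capacities):
            if remaining[i] >= size:
                remaining[i] -= size
                placement.append(tier)
                break
        else:
            # No tier had room for this layer.
            raise ValueError("model does not fit in the combined capacity")
    return placement


# Assumed hardware budget: 4 GB of GPU memory, 8 GB of CPU RAM, 100 GB of disk.
TIERS = [("gpu", 4), ("cpu", 8), ("disk", 100)]
plan = place_layers([2, 2, 2, 2, 2, 2, 2, 2], TIERS)
print(plan)
```

    A real system would also weigh transfer bandwidth and batch size when choosing where each tensor lives; the sketch only shows the capacity side of the trade-off.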
  • 2
    MegEngine

    Easy-to-use deep learning framework with 3 key features

    ...After training, just put everything into your model and run inference on any platform with ease. Speed and precision problems won't bother you anymore, because training and inference share the same core. In training, GPU memory usage can drop to one-third at the cost of only one additional line of code, which enables the DTR algorithm. Get the lowest memory usage when running inference by leveraging our unique pushdown memory planner. NOTE: MegEngine now supports Python installation on Linux-64bit/Windows-64bit/MacOS(CPU-Only)-10.14+/Android 7+(CPU-Only) platforms with Python from 3.5 to 3.8. ...
    Downloads: 3 This Week
  • 3
    shimmy

    Python-free Rust inference server

    ...This compatibility enables developers to replace remote AI services with locally hosted models while keeping their existing software architecture intact. Shimmy focuses on performance and simplicity, using efficient runtime components to minimize memory usage and startup time compared to heavier inference frameworks. It supports modern model formats such as GGUF and SafeTensors and can automatically discover models stored locally or in common directories used by other AI tools. Advanced capabilities include CPU offloading for Mixture-of-Experts models and GPU acceleration, enabling large models to run on consumer hardware with limited VRAM.
    Downloads: 5 This Week
  • 4
    whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more...
    Downloads: 1 This Week
  • 5
    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow. Internally at Spotify, pedalboard...
    Downloads: 0 This Week
  • 6
    MNN

    MNN is a blazing fast, lightweight deep learning framework

    ...Android platform: the core .so is about 400KB, the OpenCL .so is about 400KB, and the Vulkan .so is about 400KB. Supports hybrid computing on multiple devices. Currently supports CPU and GPU.
    Downloads: 8 This Week
  • 7
    LightGBM

    Gradient boosting framework based on decision tree algorithms

    LightGBM or Light Gradient Boosting Machine is a high-performance, open source gradient boosting framework based on decision tree algorithms. Compared to other boosting frameworks, LightGBM offers several advantages in terms of speed, efficiency and accuracy. Parallel experiments have shown that LightGBM can attain linear speed-up through multiple machines for training in specific settings, all while consuming less memory. LightGBM supports parallel and GPU learning, and can handle...
    Downloads: 2 This Week
  • 8
    BitNet

    BitNet: Scaling 1-bit Transformers for Large Language Models

    ...The project implements the BitNet architecture described in research on scaling transformer models using extremely low-bit quantization techniques. In this approach, neural network weights are quantized to approximately one bit per parameter, allowing models to operate with far lower memory usage than traditional 16-bit or 32-bit neural networks. The architecture introduces specialized layers such as BitLinear, which replace standard linear projections in transformer networks with quantized operations. By limiting weight precision while maintaining efficient scaling and normalization strategies, the architecture aims to retain competitive performance while significantly reducing hardware requirements.
    Downloads: 1 This Week
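
    To make the memory claim concrete, here is a minimal sketch of sign-based 1-bit weight storage. It assumes a generic scheme (one sign bit per weight plus a single shared scale equal to the mean absolute value); it is not the BitLinear layer itself, and the helper names `binarize`, `pack_signs`, `unpack_signs`, and `weight_bytes` are hypothetical.

```python
def binarize(weights):
    """Reduce floats to signs (+1/-1) plus one shared scale.

    The scale is the mean absolute value, so scale * sign is a rough
    stand-in for each original weight (illustrative scheme only).
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def pack_signs(signs):
    """Pack +1/-1 signs into bytes, eight weights per byte (+1 -> bit set)."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, count):
    """Recover the first `count` signs from the packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(count)]


def weight_bytes(num_params, bits_per_weight):
    """Bytes needed to store num_params weights at a given bit width."""
    return (num_params * bits_per_weight + 7) // 8


weights = [0.4, -0.2, 0.1, -0.7, 0.3, -0.5, 0.6, -0.4, 0.2]
signs, scale = binarize(weights)
packed = pack_signs(signs)

# Storage for a hypothetical 7-billion-parameter model at two bit widths.
print(weight_bytes(7_000_000_000, 16), weight_bytes(7_000_000_000, 1))
```

    Nine weights fit in two bytes here, and the same arithmetic gives a 16x reduction relative to 16-bit storage, before accounting for scales and any layers kept at higher precision.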
  • 9
    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...
    Downloads: 36 This Week
  • 10
    DGL

    Python package built to ease deep learning on graph

    Build your models with PyTorch, TensorFlow or Apache MXNet. Fast and memory-efficient message passing primitives for training Graph Neural Networks. Scale to giant graphs via multi-GPU acceleration and distributed training infrastructure. DGL empowers a variety of domain-specific projects including DGL-KE for learning large-scale knowledge graph embeddings, DGL-LifeSci for bioinformatics and cheminformatics, and many others.
    Downloads: 0 This Week
  • 11
    Core ML Tools

    Core ML tools contain supporting tools for Core ML model conversion

    ...Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device. Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.
    Downloads: 0 This Week
  • 12
    TensorFlow Model Optimization Toolkit

    A toolkit to optimize ML models for deployment for Keras & TensorFlow

    ...Among many uses, the toolkit supports techniques used to reduce latency and inference costs for cloud and edge devices (e.g. mobile, IoT). Deploy models to edge devices with restrictions on processing, memory, power consumption, network usage, and model storage space. Enable execution on and optimize for existing hardware or new special purpose accelerators. Choose the model and optimization tool depending on your task. In many cases, pre-optimized models can improve the efficiency of your application. Try the post-training tools to optimize an already-trained TensorFlow model. ...
    Downloads: 0 This Week
  • 13
    tsai

    Time series Timeseries Deep Learning Machine Learning Pytorch fastai

    ...If you require any dependency that is not installed, tsai will ask you to install it when necessary. We've also added a new PredictionDynamics callback that will display the predictions during training. This is the type of output you would get in a classification task. New tutorial notebook on how to train your model with larger-than-memory datasets in less time, achieving up to 100% GPU usage! See our new tutorial notebook on how to track your experiments with Weights & Biases.
    Downloads: 0 This Week
  • 14
    Smile

    Statistical machine intelligence and learning engine

    ...Compared to this third-party benchmark, Smile outperforms R, Python, Spark, H2O, and xgboost significantly. Smile is a couple of times faster than the closest competitor. The memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM language. Data scientists and developers can speak the same language now! Smile provides hundreds of advanced algorithms with a clean interface. The Scala API also offers high-level operators that make it easy to build machine learning apps. ...
    Downloads: 0 This Week
  • 15
    AIMET

    AIMET is a library that provides advanced quantization and compression

    ...Quantized inference is significantly faster than floating point inference. For example, models that we’ve run on the Qualcomm® Hexagon™ DSP rather than on the Qualcomm® Kryo™ CPU have resulted in a 5x to 15x speedup. Plus, an 8-bit model also has a 4x smaller memory footprint relative to a 32-bit model. However, often when quantizing a machine learning model (e.g., from 32-bit floating point to an 8-bit fixed point value), the model accuracy is sacrificed.
    Downloads: 0 This Week
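
    The 4x footprint figure and the accuracy cost can both be seen in a minimal sketch. It assumes plain symmetric per-tensor quantization using only the Python standard library; it does not use AIMET's API, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
from array import array


def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit codes."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return codes, scale


def dequantize(codes, scale):
    """Map 8-bit codes back to approximate float values."""
    return [c * scale for c in codes]


fp32 = array("f", [0.12, -1.5, 0.73, 2.0, -0.004, 1.1])  # 32-bit floats
codes, scale = quantize_int8(fp32)
restored = dequantize(codes, scale)

fp32_bytes = len(fp32) * fp32.itemsize
int8_bytes = len(codes) * codes.itemsize
max_error = max(abs(a - b) for a, b in zip(fp32, restored))

print(fp32_bytes, int8_bytes, max_error)
```

    The rounding error is bounded by half the scale per value, which is the accuracy loss that quantization-aware techniques try to recover.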
  • 16
    OnnxStream

    Lightweight inference library for ONNX files, written in C++

    ...Generally, major machine learning frameworks and libraries are focused on minimizing inference latency and/or maximizing throughput, all at the cost of RAM usage. So I decided to write a super small and hackable inference library specifically focused on minimizing memory consumption: OnnxStream. OnnxStream is based on the idea of decoupling the inference engine from the component responsible for providing the model weights, which is a class derived from WeightsProvider. A WeightsProvider specialization can implement any type of loading, caching, and prefetching of the model parameters.
    Downloads: 9 This Week
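
    The decoupling described above can be sketched in a few lines. OnnxStream itself is written in C++; the Python below is only an analogy, and everything except the name `WeightsProvider` (the `get` method, `CachingProvider`, the toy `run` engine, and the `STORAGE` dictionary standing in for disk) is a hypothetical illustration.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class WeightsProvider(ABC):
    """Supplies a layer's weights on request; the engine never owns them."""

    @abstractmethod
    def get(self, name):
        ...


class CachingProvider(WeightsProvider):
    """Loads weights lazily and keeps at most `capacity` tensors resident."""

    def __init__(self, loader, capacity):
        self._loader = loader      # hypothetical callable: name -> list of floats
        self._capacity = capacity
        self._cache = OrderedDict()
        self.loads = 0             # number of times storage was actually read

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)      # mark as most recently used
            return self._cache[name]
        weights = self._loader(name)
        self.loads += 1
        self._cache[name] = weights
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)    # evict least recently used
        return weights


def run(layers, provider, x):
    """Toy engine: each 'layer' scales the input by the sum of its weights."""
    for name in layers:
        x = x * sum(provider.get(name))
    return x


# In-memory stand-in for weights stored on disk.
STORAGE = {"l0": [0.5, 0.5], "l1": [1.0, 1.0], "l2": [0.25, 0.25]}
provider = CachingProvider(STORAGE.__getitem__, capacity=1)
result = run(["l0", "l1", "l2"], provider, 4.0)
print(result, provider.loads)
```

    Because the engine only calls `get`, swapping in a provider with a different caching or prefetching strategy changes peak memory without touching the inference loop.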
  • 17
    NanoDet-Plus

    Lightweight anchor-free object detection model

    Super fast and high accuracy lightweight anchor-free object detection model. Real-time on mobile devices. NanoDet is an FCOS-style one-stage anchor-free object detection model that uses Generalized Focal Loss as its classification and regression loss. In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a...
    Downloads: 10 This Week
  • 18
    KoboldAI

    Your gateway to GPT writing

    ...No matter if you want to use the free, fast power of Google Colab, your own high-end graphics card, an online service you have an API key for (like OpenAI or Inferkit), or if you would rather just run it slower on your CPU, you will be able to find a way to use KoboldAI that works for you.
    Downloads: 123 This Week
  • 19
    MACE

    Deep learning inference framework optimized for mobile platforms

    Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices. The runtime is optimized with NEON, OpenCL and Hexagon, and the Winograd algorithm is used to speed up convolution operations. Initialization is also optimized to be faster. Chip-dependent power options such as big.LITTLE scheduling and Adreno GPU hints are included as advanced APIs. UI responsiveness guarantee is sometimes...
    Downloads: 0 This Week
  • 20
    Texar

    Toolkit for Machine Learning, Natural Language Processing

    Texar is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing arbitrary models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar was originally developed by, and is actively contributed to by, Petuum and CMU in collaboration with other institutes. A mirror of this...
    Downloads: 0 This Week
  • 21
    X-DeepLearning

    An industrial deep learning framework for high-dimension sparse data

    X-DeepLearning (XDL for short) is a complete set of deep optimization solutions for high-dimensional sparse data scenarios (such as advertising/recommendation/search, etc.). XDL version 1.2 has been released recently. It brings performance optimization for large-batch/low-concurrency scenarios, with a 50-100% performance improvement in such scenarios. In storage and communication optimization, parameters are automatically allocated globally without manual intervention, and requests are merged to completely...
    Downloads: 0 This Week
  • 22
    PyTorch Book

    PyTorch tutorials and fun projects including neural talk

    This is the corresponding code for the book "The Deep Learning Framework PyTorch: Getting Started and Practical", but it can also be used as a standalone PyTorch Getting Started Guide and Tutorial. The current version of the code is based on pytorch 1.0.1; if you want to use an older version, please git checkout v0.4 or git checkout v0.3. Legacy code has better python2/python3 compatibility, CPU/GPU compatibility test. The new version of the code has not been fully tested, it has been tested...
    Downloads: 0 This Week
  • 23
    NNVM

    Open deep learning compiler stack for cpu, gpu

    The vision of the Apache NNVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging machine learning models for any hardware platform. Compilation of deep learning models into minimum deployable modules. Infrastructure to automatically generate and optimize models on more backends with better performance....
    Downloads: 0 This Week
  • 24
    Bolt ML

    10x faster matrix and vector operations

    ...The core idea behind Bolt is to compress large collections of dense numeric vectors and perform mathematical operations directly on the compressed representations instead of decompressing them first. This approach significantly reduces both memory usage and computational overhead when working with high-dimensional data commonly used in machine learning systems. Bolt is particularly useful in applications such as similarity search, approximate nearest neighbor queries, and large-scale matrix computations where millions of vectors must be processed efficiently. The project includes algorithms designed to accelerate operations such as dot products and distance calculations, which are central to many machine learning tasks.
    Downloads: 0 This Week
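
    Computing on compressed vectors without decompressing them can be illustrated with a minimal product-quantization-style sketch. It assumes tiny hand-picked codebooks rather than learned ones and is not Bolt's actual encoding or API; the names `encode`, `dot_tables`, `approx_dot`, and `CODEBOOKS` are hypothetical.

```python
def nearest(centroids, sub):
    """Index of the centroid closest (squared Euclidean) to sub-vector `sub`."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, sub))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))


def encode(vector, codebooks, sub_dim):
    """Compress: replace each sub-vector with the index of its nearest centroid."""
    return [
        nearest(codebooks[j], vector[j * sub_dim:(j + 1) * sub_dim])
        for j in range(len(codebooks))
    ]


def dot_tables(query, codebooks, sub_dim):
    """Once per query, precompute its dot product with every centroid."""
    return [
        [sum(q * c for q, c in zip(query[j * sub_dim:(j + 1) * sub_dim], centroid))
         for centroid in book]
        for j, book in enumerate(codebooks)
    ]


def approx_dot(codes, tables):
    """Dot product from codes alone: one table lookup per sub-vector."""
    return sum(tables[j][code] for j, code in enumerate(codes))


# Hypothetical codebooks: 2 subspaces of dimension 2, with 2 centroids each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vector = [0.9, 1.1, 0.1, 0.8]
query = [1.0, 2.0, 3.0, 4.0]

codes = encode(vector, CODEBOOKS, sub_dim=2)      # the stored, compressed form
tables = dot_tables(query, CODEBOOKS, sub_dim=2)
estimate = approx_dot(codes, tables)              # never touches `vector`
exact = sum(q * v for q, v in zip(query, vector))
print(codes, estimate, exact)
```

    The lookup tables are built once per query and reused for every stored vector, so the per-vector cost is a handful of table reads instead of a full floating-point dot product.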
  • 25
    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ...CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. Currently, when training on a corpus, CRF# can make full use of multi-core CPUs and uses very little memory compared with CRF++, and memory grows smoothly and slowly as the size of the training corpus and the number of tags increase. With multi-threaded processing, CRF# is now more suitable than CRF++ for training on large data and tag sets. ...
    Downloads: 0 This Week