Showing 7 open source projects for "cpu usage"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 1
    CTranslate2

    CTranslate2

    Fast inference engine for Transformer models

    ...The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU. The execution is significantly faster and requires less resources than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc. The model serialization and computation support weights with reduced precision: 16-bit floating points (FP16), 16-bit integers (INT16), and 8-bit integers (INT8). ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 2
    llama.cpp

    llama.cpp

    LLM inference in C/C++

    llama.cpp is a high-performance C and C++ project for running large language models locally and in the cloud with minimal setup. It is built around efficient inference, broad hardware support, and the GGUF model format. The project supports many model families and has become a major foundation for local AI tools, model serving, and embedded inference workflows. It provides command-line tools, a server mode with an OpenAI-compatible API style, model conversion utilities, and extensive backend...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples....
    Downloads: 455 This Week
    Last Update:
    See Project
  • 4
    PowerInfer

    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    MNN

    MNN

    MNN is a blazing fast, lightweight deep learning framework

    ...Android platform, core so size is about 400KB, OpenCL so is about 400KB, Vulkan so is about 400KB. Supports hybrid computing on multiple devices. Currently supports CPU and GPU.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 6
    MegEngine

    MegEngine

    Easy-to-use deep learning framework with 3 key features

    ...After training, just put everything into your model and inference it on any platform at ease. Speed and precision problems won't bother you anymore due to the same core inside. In training, GPU memory usage could go down to one-third at the cost of only one additional line, which enables the DTR algorithm. Gain the lowest memory usage when inferencing a model by leveraging our unique pushdown memory planner. NOTE: MegEngine now supports Python installation on Linux-64bit/Windows-64bit/MacOS(CPU-Only)-10.14+/Android 7+(CPU-Only) platforms with Python from 3.5 to 3.8. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    TurboTransformers

    TurboTransformers

    Fast and user-friendly runtime for transformer inference

    TurboTransformers is a high-performance inference framework optimized for running Transformer models efficiently on CPUs and GPUs. It improves latency and throughput for NLP applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo