Showing 6 open source projects for "throughput"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 1
    TensorRT

    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2
    OpenMLDB

    OpenMLDB

    OpenMLDB is an open-source machine learning database

    ...However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    OpenVINO Model Server

    OpenVINO Model Server

    A scalable inference server for models optimized with OpenVINO

    ...It’s implemented in C++ for scalability and efficiency, making it suitable for both edge and cloud deployments where inference workloads must be reliable and high throughput. The server exposes model inference via standard network protocols like REST and gRPC, allowing any client that speaks those protocols to request predictions remotely, abstracting away the complexity of where and how the model runs. It supports model deployment in diverse environments including Docker, bare-metal machines, and Kubernetes clusters, and is especially useful in microservices architectures where AI services need to scale independently. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    DALI

    DALI

    A GPU-accelerated library containing highly optimized building blocks

    ...DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    OnnxStream

    OnnxStream

    Lightweight inference library for ONNX files, written in C++

    ...The recommended minimum RAM/VRAM for Stable Diffusion 1.5 is typically 8GB. Generally, major machine learning frameworks and libraries are focused on minimizing inference latency and/or maximizing throughput, all of which at the cost of RAM usage. So I decided to write a super small and hackable inference library specifically focused on minimizing memory consumption: OnnxStream. OnnxStream is based on the idea of decoupling the inference engine from the component responsible for providing the model weights, which is a class derived from WeightsProvider. ...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 6
    TurboTransformers

    TurboTransformers

    Fast and user-friendly runtime for transformer inference

    TurboTransformers is a high-performance inference framework optimized for running Transformer models efficiently on CPUs and GPUs. It improves latency and throughput for NLP applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB