Showing 27 open source projects for "tensorrt"

  • 1
    TensorRT LLM

    TensorRT LLM provides users with an easy-to-use Python API

    TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters.
    Downloads: 1 This Week
  • 2
    Torch-TensorRT

    PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

    Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA’s TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch’s Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine.
    Downloads: 0 This Week
  • 3
    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ComfyUI_TensorRT is an extension that lets ComfyUI run AI inference through NVIDIA’s TensorRT, aiming to get faster, more efficient execution on supported GPUs. It bridges the gap between ComfyUI’s flexible, node-based workflows and TensorRT’s highly optimized engine format. The result is that complex diffusion or image-processing graphs can be accelerated without the user having to rewrite the pipeline. The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. ...
    Downloads: 0 This Week
  • 4
    TensorRT Backend For ONNX

    ONNX-TensorRT: TensorRT backend for ONNX

    Parses ONNX models for execution with TensorRT. Development on the main branch targets the latest version of TensorRT, 8.4.1.5, with full-dimensions and dynamic shape support. For previous versions of TensorRT, refer to their respective branches. Building INetwork objects in full-dimensions mode with dynamic shape support requires creating the network with the explicit-batch flag, which is available in both the C++ and Python APIs. The currently supported ONNX operators are listed in the operator support matrix.
    Downloads: 0 This Week
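    Dynamic shape support means a built engine accepts a range of input shapes rather than a single fixed shape. In TensorRT that range is declared per input as an optimization profile with min, opt, and max dimensions (in the Python API, via `builder.create_optimization_profile()` and `profile.set_shape(...)`). The sketch below is plain Python, not TensorRT code: it only models the min/opt/max semantics, and the `ShapeProfile` class and the example shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class ShapeProfile:
    """Hypothetical pure-Python model of a min/opt/max optimization profile.

    The engine would be tuned for `opt_shape`; any runtime shape that lies
    between `min_shape` and `max_shape` (inclusive, per dimension) is valid.
    """
    min_shape: Shape
    opt_shape: Shape
    max_shape: Shape

    def __post_init__(self) -> None:
        # All three shapes must share a rank and satisfy min <= opt <= max.
        if not (len(self.min_shape) == len(self.opt_shape) == len(self.max_shape)):
            raise ValueError("min/opt/max shapes must have the same rank")
        for lo, opt, hi in zip(self.min_shape, self.opt_shape, self.max_shape):
            if not (0 <= lo <= opt <= hi):
                raise ValueError(f"expected 0 <= min <= opt <= max, got {lo}, {opt}, {hi}")

    def accepts(self, shape: Shape) -> bool:
        """Return True if a runtime input shape fits inside this profile."""
        return len(shape) == len(self.min_shape) and all(
            lo <= dim <= hi for lo, dim, hi in zip(self.min_shape, shape, self.max_shape)
        )


# Hypothetical NCHW image input: batch 1-8, 3 channels, 224x224 up to 512x512.
profile = ShapeProfile(
    min_shape=(1, 3, 224, 224),
    opt_shape=(4, 3, 384, 384),
    max_shape=(8, 3, 512, 512),
)
print(profile.accepts((2, 3, 300, 300)))   # True
print(profile.accepts((16, 3, 300, 300)))  # False: batch size exceeds the max
```

    Checking a shape against [min, max] before inference mirrors the constraint TensorRT places on runtime input shapes for a given profile; the `opt` shape only affects which kernels the builder tunes for.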
  • 5
    AimAhead

    The fastest AI powered Aimbot

    ...It captures the screen, processes the image through a selected AI model to detect enemies, and then aims towards them. Optimized for NVIDIA graphics cards, AimAhead converts ONNX models to TensorRT engine files for enhanced performance, achieving between 100 and 200 cycles per second depending on the model used.
    Downloads: 182 This Week
  • 6
    TokenSpeed

    TokenSpeed is a speed-of-light LLM inference engine

    TokenSpeed is an LLM inference engine designed for high-performance production agent workloads. It aims to combine TensorRT-LLM-level speed with vLLM-level usability, making it relevant for teams that need fast generation without sacrificing developer ergonomics. The project is focused on the specific needs of agentic systems, where latency, throughput, and efficient scheduling matter across many short or tool-heavy requests. It builds on ideas and components from the broader open-source inference ecosystem while presenting its own execution stack. ...
    Downloads: 9 This Week
  • 7
    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    ...It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. ...
    Downloads: 9 This Week
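    A streaming client like the one described above sends audio in small pieces rather than as one file. The sketch below assumes mono float samples at 16 kHz (the sample rate Whisper models expect) and a hypothetical 0.25 s chunk size; it shows only the chunking step, not WhisperLive's actual client or wire protocol.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def chunk_audio(samples: Sequence[float], chunk_seconds: float = 0.25,
                sample_rate: int = SAMPLE_RATE) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks; the last chunk may be shorter."""
    chunk_size = int(chunk_seconds * sample_rate)
    if chunk_size <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    for start in range(0, len(samples), chunk_size):
        yield list(samples[start:start + chunk_size])


# One second of silence split into 0.25 s chunks: 4 chunks of 4000 samples.
one_second = [0.0] * SAMPLE_RATE
chunks = list(chunk_audio(one_second))
print(len(chunks), len(chunks[0]))  # 4 4000
```

    Smaller chunks reduce the delay before the server sees new audio but increase the number of messages sent; the 0.25 s value is illustrative only.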
  • 8
    NVIDIA Model Optimizer

    A unified library of SOTA model optimization techniques

    ...It supports a wide range of model types, including large language models, diffusion models, and vision-language models, and integrates with deployment frameworks such as TensorRT and vLLM. By providing standardized workflows and APIs, it enables developers to experiment with different optimization strategies and select the best approach for their use case.
    Downloads: 0 This Week
  • 9
    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    ...The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. By using containerized environments, developers can ensure that their applications run consistently across different Jetson platforms and JetPack versions. ...
    Downloads: 0 This Week
  • 10
    ChatGLM3

    ChatGLM3 series: Open Bilingual Chat LLMs | Open Source Bilingual Chat

    ...The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.
    Downloads: 0 This Week
  • 11
    Ultralytics

    Ultralytics YOLO

    Ultralytics is a comprehensive computer vision framework that provides state-of-the-art implementations of the YOLO (You Only Look Once) family of models, enabling developers to perform tasks such as object detection, segmentation, classification, tracking, and pose estimation within a unified system. It is designed to be fast, accurate, and easy to use, offering both command-line and Python-based interfaces for training, validation, and deployment of machine learning models. The framework...
    Downloads: 4 This Week
  • 12
    Infinity

    Low-latency REST API for serving text-embeddings

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under the MIT License and powers inference behind Gradient.ai and other embedding API providers.
    Downloads: 0 This Week
  • 13
    Mooncake

    Mooncake is the serving platform for Kimi

    Mooncake is an open-source infrastructure platform designed to optimize large language model serving by focusing on efficient management and transfer of model data and KV cache. The platform was originally developed as part of the serving infrastructure for the Kimi large language model system. Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of...
    Downloads: 0 This Week
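    Part of why KV cache management and transfer matter is size: for every cached token, each layer stores one key and one value vector per KV head. A common back-of-the-envelope estimate is bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_element. The sketch below applies that formula to a hypothetical model configuration; the numbers are illustrative and do not describe Kimi or Mooncake.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_element: int = 2) -> int:
    """Estimate KV cache size in bytes.

    The factor 2 counts keys and values; bytes_per_element=2 assumes FP16/BF16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_element


# Hypothetical config: 32 layers, 8 KV heads, head_dim 128,
# a single 4096-token sequence stored in FP16.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=4096)
print(size / 2**20, "MiB")  # 512.0 MiB
```

    The estimate grows linearly with cached tokens and concurrent sequences, so at serving scale, moving or reusing caches between machines quickly becomes a bandwidth and storage problem.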
  • 14
    OneFlow

    OneFlow is a deep learning framework designed to be user-friendly

    OneFlow is a deep learning framework designed to be user-friendly, scalable, and efficient. An extension lets OneFlow target third-party compilers such as XLA, TensorRT, and OpenVINO. The CUDA runtime is statically linked into OneFlow, so OneFlow works with the minimum supported driver and any newer driver. Distributed performance (efficiency) is the core technical difficulty of a deep learning framework; OneFlow focuses on performance improvement and heterogeneous distributed scaling. ...
    Downloads: 1 This Week
  • 15
    AI-Aimbot

    CS2, Valorant, Fortnite, APEX, every game

    AI-Aimbot is a computer vision project that demonstrates how artificial intelligence can be used to automatically identify and target opponents in video games. The system uses an object detection model based on the YOLOv5 architecture to detect human-shaped characters in gameplay screenshots or video frames. Once a target is identified, the program automatically adjusts the player’s aim toward the detected target, effectively automating the aiming process in first-person shooter games. The...
    Downloads: 2,473 This Week
  • 16
    CLIP-as-service

    Embed images and sentences into fixed-length vectors

    CLIP-as-service is a low-latency, highly scalable service for embedding images and text. It can be easily integrated as a microservice into neural search solutions. It serves CLIP models with TensorRT, ONNX Runtime, and PyTorch without JIT at 800 QPS[*]. It offers non-blocking duplex streaming of requests and responses, designed for large data and long-running tasks, and can horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing. It is easy to use, with no learning curve and a minimalist design on both client and server, and provides an intuitive, consistent API for image and sentence embedding. ...
    Downloads: 0 This Week
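    Fixed-length embeddings turn search into a vector comparison: the query and every indexed item are embedded once, then ranked by similarity. The sketch below uses cosine similarity over made-up 4-dimensional vectors; it does not call CLIP-as-service, and real deployments would compare much longer CLIP vectors in a vector index rather than a Python loop.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float], index: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every indexed item against the query, best match first."""
    scores = [(key, cosine(query, vec)) for key, vec in index.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up 4-dimensional "embeddings"; real CLIP vectors have hundreds of dimensions.
index = {
    "photo of a cat": [0.9, 0.1, 0.0, 0.1],
    "photo of a dog": [0.7, 0.6, 0.1, 0.0],
    "city skyline": [0.0, 0.1, 0.9, 0.4],
}
query = [0.8, 0.2, 0.0, 0.1]  # stands in for the embedding of a text query
print(rank(query, index)[0][0])  # photo of a cat
```

    Because image and text embeddings share one vector space in CLIP, the same ranking works whether the indexed items are images or sentences.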
  • 17
    pipeless

    A computer vision framework to create and deploy apps in minutes

    Pipeless is an open-source computer vision framework to create and deploy applications without the complexity of building and maintaining multimedia pipelines. It ships everything you need to create and deploy efficient computer vision applications that work in real-time in just minutes. Pipeless is inspired by modern serverless technologies. It provides the development experience of serverless frameworks applied to computer vision. You provide some functions that are executed for new...
    Downloads: 2 This Week
  • 18
    enhancr

    Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT

    ...The GUI was designed to provide a stunning experience powered by state-of-the-art technologies, without feeling clunky and outdated like other alternatives. It features blazing-fast TensorRT inference by NVIDIA, which can significantly speed up AI processing, and comes pre-packaged, without the need to install Docker or WSL (Windows Subsystem for Linux). It also offers NCNN inference by Tencent, which is lightweight and runs on NVIDIA, AMD, and even Apple Silicon, in contrast to the much heavier PyTorch inference, which only runs on NVIDIA GPUs.
    Downloads: 19 This Week
  • 19
    FasterTransformer

    Transformer related optimization, including BERT, GPT

    ...FasterTransformer is particularly focused on inference workloads, where it significantly improves performance compared to standard framework implementations. Although development has transitioned toward TensorRT-LLM, the project remains an important reference for understanding optimized transformer execution.
    Downloads: 0 This Week
  • 20
    AI Models

    A repository of trained models

    All models are (at least currently) supported by chaiNNer, an upscaling GUI in which you "chain" nodes together to complete both very simple and very complex tasks. It is highly recommended for images. If you want to upscale videos with the models, use enhancr, simply because it supports TensorRT, which lets you upscale videos at incredible speeds! Its GUI is one of the best-looking applications out there and is my personal go-to option. Although builds are paid, it is well worth the money and currently beats any other GUI out there.
    Downloads: 0 This Week
  • 21
    Hello AI World

    Guide to deploying deep-learning inference networks

    ...In just a couple of hours, you can have a set of deep learning inference demos up and running for realtime image classification and object detection on your Jetson Developer Kit with JetPack SDK and NVIDIA TensorRT. The tutorial focuses on networks related to computer vision, and includes the use of live cameras. You’ll also get to code your own easy-to-follow recognition program in Python or C++, and train your own DNN models onboard Jetson with PyTorch. Ready to dive into deep learning? It only takes two days. We’ll provide you with all the tools you need, including easy to follow guides, software samples such as TensorRT code, and even pre-trained network models including ImageNet and DetectNet examples. ...
    Downloads: 1 This Week
  • 22
    YOLOX

    YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3 through YOLOv5

    YOLOX is a high-performance anchor-free YOLO that outperforms YOLOv3 through YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. YOLOX is an anchor-free version of YOLO, with a simpler design but better performance. It aims to bridge the gap between the research and industrial communities. To train on your own data, first prepare a dataset with images and labels; for labeling images, you can use tools like Labelme or CVAT. Note that you should also implement the pull_item and load_anno methods for the Mosaic and MixUp augmentations. ...
    Downloads: 13 This Week
  • 23
    TensorRT Pro

    C++ library based on TensorRT integration

    Provides a high-level interface for C++/Python. It simplifies the implementation of custom plugins and encapsulates serialization and deserialization for easier use. It also simplifies compiling models in FP32, FP16, and INT8, easing deployment with C++/Python on servers or embedded devices. Ready-to-use models with examples include RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet, and DeepSORT (C++).
    Downloads: 0 This Week
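    INT8 deployment maps floating-point values onto 8-bit integers through a scale factor. The sketch below shows one common scheme, symmetric per-tensor quantization with scale = amax / 127 and naive max calibration. It is not TensorRT Pro's implementation, and TensorRT's own calibrators choose `amax` more carefully (for example, entropy calibration); the weights are made up.

```python
from typing import List, Sequence


def int8_scale(values: Sequence[float]) -> float:
    """Symmetric per-tensor scale from naive max calibration: amax maps to 127."""
    amax = max(abs(v) for v in values)
    return amax / 127.0 if amax > 0 else 1.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round to the nearest step and clamp to the symmetric range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.01, 1.0]
s = int8_scale(weights)  # about 0.01 (1.27 / 127)
q = quantize(weights, s)
print(q)  # [50, -127, 1, 100]
print(dequantize(q, s))
```

    Rounding bounds the error for in-range values to half a quantization step (scale / 2); values whose magnitude exceeds `amax` would be clipped to 127 or -127, which is why the choice of `amax` during calibration matters.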
  • 24
    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    ...You will usually get 2X to 4X faster inference compared to vanilla PyTorch. It's cool! However, if you want best-in-class performance on GPU, there is only one possible combination: NVIDIA TensorRT and Triton. You will usually get 5X faster inference compared to vanilla PyTorch.
    Downloads: 0 This Week