tensorrt free download

Showing 27 open source projects for "tensorrt"

View related business solutions

Mac Clear Filters & Widen Search

$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
Streamline Azure Security with Palo Alto Networks VM-Series
Centrally manage physical and virtualized firewalls with Panorama

Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.

Learn more
1

TensorRT LLM

TensorRT LLM provides users with an easy-to-use Python API

TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters.

Downloads: 1 This Week

Last Update: 2026-04-16
See Project
2

Torch-TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA’s TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch’s Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine.

Downloads: 0 This Week

Last Update: 2026-05-20
See Project
3

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

ComfyUI_TensorRT is an extension that lets ComfyUI run AI inference through NVIDIA’s TensorRT, aiming to get faster, more efficient execution on supported GPUs. It bridges the gap between ComfyUI’s flexible, node-based workflows and TensorRT’s highly optimized engine format. The result is that complex diffusion or image-processing graphs can be accelerated without the user having to rewrite the pipeline. The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. ...

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
4

TensorRT Backend For ONNX

ONNX-TensorRT: TensorRT backend for ONNX

Parses ONNX models for execution with TensorRT. Development on the main branch is for the latest version of TensorRT 8.4.1.5 with full dimensions and dynamic shape support. For previous versions of TensorRT, refer to their respective branches. Building INetwork objects in full dimensions mode with dynamic shape support requires calling the C++ and Python API. Current supported ONNX operators are found in the operator support matrix.

Downloads: 0 This Week

Last Update: 2026-03-25
See Project
Earn up to 16% annual interest with Nexo.
Access competitive interest rates on your digital assets.

Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.

Get started with Nexo.
5

AimAhead

The fastest AI powered Aimbot

...It captures the screen, processes the image through a selected AI model to detect enemies, and then aims towards them. Optimized for NVIDIA graphics cards, AimAhead converts ONNX models to TensorRT engine files for enhanced performance, achieving between 100 to 200 cycles per second depending on the model used.

1 Review

Downloads: 182 This Week

Last Update: 2025-03-30
See Project
6

TokenSpeed

TokenSpeed is a speed-of-light LLM inference engine

TokenSpeed is an LLM inference engine designed for high-performance production agent workloads. It aims to combine TensorRT-LLM-level speed with vLLM-level usability, making it relevant for teams that need fast generation without sacrificing developer ergonomics. The project is focused on the specific needs of agentic systems, where latency, throughput, and efficient scheduling matter across many short or tool-heavy requests. It builds on ideas and components from the broader open-source inference ecosystem while presenting its own execution stack. ...

Downloads: 9 This Week

Last Update: 2 days ago
See Project
7

WhisperLive

A nearly-live implementation of OpenAI's Whisper

...It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. ...

Downloads: 9 This Week

Last Update: 3 days ago
See Project
8

NVIDIA Model Optimizer

A unified library of SOTA model optimization techniques

...It supports a wide range of model types, including large language models, diffusion models, and vision-language models, and integrates with deployment frameworks such as TensorRT and vLLM. By providing standardized workflows and APIs, it enables developers to experiment with different optimization strategies and select the best approach for their use case.

Downloads: 0 This Week

Last Update: 2026-05-13
See Project
9

CUDA Containers for Edge AI & Robotics

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

...The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. By using containerized environments, developers can ensure that their applications run consistently across different Jetson platforms and JetPack versions. ...

Downloads: 0 This Week

Last Update: 2026-05-06
See Project
Go from Code to Production URL in Seconds
Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.

Try it free
10

ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | Open Source Bilingual Chat

...The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.

Downloads: 0 This Week

Last Update: 5 days ago
See Project
11

Ultralytics

Ultralytics YOLO

Ultralytics is a comprehensive computer vision framework that provides state-of-the-art implementations of the YOLO (You Only Look Once) family of models, enabling developers to perform tasks such as object detection, segmentation, classification, tracking, and pose estimation within a unified system. It is designed to be fast, accurate, and easy to use, offering both command-line and Python-based interfaces for training, validation, and deployment of machine learning models. The framework...

Downloads: 4 This Week

Last Update: 4 days ago
See Project
12

Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.

Downloads: 0 This Week

Last Update: 2025-08-22
See Project
13

Mooncake

Mooncake is the serving platform for Kimi

Mooncake is an open-source infrastructure platform designed to optimize large language model serving by focusing on efficient management and transfer of model data and KV cache. The platform was originally developed as part of the serving infrastructure for the Kimi large language model system. Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of...

Downloads: 0 This Week

Last Update: 2026-05-24
See Project
14

OneFlow

OneFlow is a deep learning framework designed to be user-friendly

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. An extension for OneFlow to target third-party compiler, such as XLA, TensorRT and OpenVINO etc.CUDA runtime is statically linked into OneFlow. OneFlow will work on a minimum supported driver, and any driver beyond. For more information. Distributed performance (efficiency) is the core technical difficulty of the deep learning framework. OneFlow focuses on performance improvement and heterogeneous distributed expansion. ...

Downloads: 1 This Week

Last Update: 2024-03-11
See Project
15

AI-Aimbot

CS2, Valorant, Fortnite, APEX, every game

AI-Aimbot is a computer vision project that demonstrates how artificial intelligence can be used to automatically identify and target opponents in video games. The system uses an object detection model based on the YOLOv5 architecture to detect human-shaped characters in gameplay screenshots or video frames. Once a target is identified, the program automatically adjusts the player’s aim toward the detected target, effectively automating the aiming process in first-person shooter games. The...

Downloads: 2,473 This Week

Last Update: 2026-03-15
See Project
16

CLIP-as-service

Embed images and sentences into fixed-length vectors

CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions. Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT with 800QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks. Horizontally scale up and down multiple CLIP models on single GPU, with automatic load balancing. Easy-to-use. No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding. ...

Downloads: 0 This Week

Last Update: 2023-12-20
See Project
17

pipeless

A computer vision framework to create and deploy apps in minutes

Pipeless is an open-source computer vision framework to create and deploy applications without the complexity of building and maintaining multimedia pipelines. It ships everything you need to create and deploy efficient computer vision applications that work in real-time in just minutes. Pipeless is inspired by modern serverless technologies. It provides the development experience of serverless frameworks applied to computer vision. You provide some functions that are executed for new...

Downloads: 2 This Week

Last Update: 2024-02-23
See Project
18

enhancr

Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT

...The GUI was designed to provide a stunning experience powered by state-of-the-art technologies without feeling clunky and outdated like other alternatives. It features blazing-fast TensorRT inference by NVIDIA, which can speed up AI processes significantly. Pre-packaged, without the need to install Docker or WSL (Windows Subsystem for Linux) - and NCNN inference by Tencent which is lightweight and runs on NVIDIA, AMD and even Apple Silicon - in contrast to the mammoth of an inference PyTorch is, which only runs on NVIDIA GPUs.

1 Review

Downloads: 19 This Week

Last Update: 2023-06-07
See Project
19

FasterTransformer

Transformer related optimization, including BERT, GPT

...FasterTransformer is particularly focused on inference workloads, where it significantly improves performance compared to standard framework implementations. Although development has transitioned toward TensorRT-LLM, the project remains an important reference for understanding optimized transformer execution.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
20

AI Models

A repository of trained models

All models (at least currently) are supported by chaiNNer, an upscaling GUI that allows for both very simple and very complex tasks to be completed in a nice manner where you "chain" nodes together. Highly recommended for images. If you're looking to upscale videos using the models then use enhancr simply due to the fact that it supports TensorRT, which will allow you to upscale videos at incredible speeds! The GUI is one of the best looking applications out there and is personally my go to option. While yes builds are paid, it is well worth your money and beats any other GUI out there currently.

Downloads: 0 This Week

Last Update: 2023-03-29
See Project
21

Hello AI World

Guide to deploying deep-learning inference networks

...In just a couple of hours, you can have a set of deep learning inference demos up and running for realtime image classification and object detection on your Jetson Developer Kit with JetPack SDK and NVIDIA TensorRT. The tutorial focuses on networks related to computer vision, and includes the use of live cameras. You’ll also get to code your own easy-to-follow recognition program in Python or C++, and train your own DNN models onboard Jetson with PyTorch. Ready to dive into deep learning? It only takes two days. We’ll provide you with all the tools you need, including easy to follow guides, software samples such as TensorRT code, and even pre-trained network models including ImageNet and DetectNet examples. ...

Downloads: 1 This Week

Last Update: 2022-08-03
See Project
22

YOLOX

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. Prepare your own dataset with images and labels first. For labeling images, you can use tools like Labelme or CVAT. One more thing worth noting is that you should also implement pull_item and load_anno method for the Mosiac and MixUp augmentations. ...

Downloads: 13 This Week

Last Update: 2022-08-01
See Project
23

TensorRT Pro

C++ library based on tensorrt integration

High-level interface for C++/Python. Simplify the implementation of the custom plugin. And serialization and deserialization have been encapsulated for easier usage. Simplify the compilation of fp32, fp16 and int8 for facilitating the deployment with C++/Python in server or embedded device. Models ready for use also with examples are RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet and DeepSORT(C++).

Downloads: 0 This Week

Last Update: 2022-08-19
See Project
24

Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

...You will usually get from 2X to 4X faster inference compared to vanilla Pytorch. It's cool! However, if you want the best in class performances on GPU, there is only a single possible combination: Nvidia TensorRT and Triton. You will usually get 5X faster inference compared to vanilla Pytorch.

Downloads: 0 This Week

Last Update: 2022-08-22
See Project
25

TensorRT_Rel

Downloads: 2 This Week

Last Update: 2018-09-18
See Project