Torch-TensorRT 2.10.0 Linux x86-64 and Windows targets
PyTorch 2.10, CUDA 12.9/13.0, TensorRT 10.14, Python 3.10-3.13
Torch-TensorRT wheels are available:
- x86-64 Linux and Windows: CUDA 13.0 + Python 3.10-3.13, available via PyPI
- x86-64 Linux and Windows: CUDA 12.9/13.0 + Python 3.10-3.13, also available via the PyTorch index
- aarch64 SBSA Linux and Jetson Thor: CUDA 13.0 + Python 3.10-3.13 + Torch 2.10 + TensorRT 10.14
- Available via PyPI: https://pypi.org/project/torch-tensorrt/
- Available via PyTorch index: https://download.pytorch.org/whl/torch-tensorrt
Jetson Orin
- No torch_tensorrt 2.9/2.10 release is available for Jetson Orin.
- Please continue using the torch_tensorrt 2.8 release.
Important Changes
Retracing is now the default behavior when saving a compiled graph module with torch_tensorrt.save: Torch-TensorRT re-exports the graph using torch.export.export(strict=False) before saving. This preserves the completeness of the output FX graph and fills in metadata.
New Features
LLM improvements
The run_llm script now supports compiling models that have previously been quantized using the TensorRT Model Optimizer Toolkit and uploaded to HuggingFace.
The following inference scenarios are now supported:
Standard high-precision model: compile directly and run inference in FP16/BF16 via Torch-TensorRT Autocast
```bash
python run_llm.py --model Qwen/Qwen2.5-0.5B-Instruct \
    --prompt "What is parallel programming?" \
    --model_precision FP16 --num_tokens 128 \
    --cache static_v2 --enable_pytorch_run
```
Standard high-precision model: quantize on device with TensorRT Model Optimizer, compile, and run inference in FP8/NVFP4 precision
```bash
python run_llm.py --model Qwen/Qwen2.5-0.5B-Instruct \
    --prompt "What is parallel programming?" \
    --model_precision FP16 \
    --quant_format fp8 --num_tokens 128 \
    --cache static_v2 --enable_pytorch_run
```
Previously quantized model uploaded to Hugging Face: compile directly and run inference in FP8/NVFP4
```bash
python run_llm.py --model nvidia/Qwen3-8B-FP8 \
    --prompt "What is parallel programming?" \
    --model_precision FP16 \
    --quant_format fp8 \
    --num_tokens 128 \
    --cache static_v2 --enable_pytorch_run
```
Notes:
- --model_precision is mandatory; it tells the LLM tool the model's precision.
- --quant_format is optional; it is only used when running inference on a pre-quantized ModelOpt checkpoint, where it tells the tool the checkpoint's quantization format.
Improvements to Engine Caching
Before this release, weight-stripped engines could be refitted only once due to a limitation of TensorRT (<10.14), so Engine Caching cached fully weighted engines to keep the feature working correctly, occupying unnecessary disk space. Starting with this release, if TensorRT >= 10.14 is installed, engine caching saves only weight-stripped engines on disk, regardless of compilation_settings.strip_engine_weights. When a cached engine is pulled, it is automatically refitted and kept refittable, so compiled TRT modules can be refitted multiple times with refit_module_weights(), e.g.:
```python
for _ in range(3):
    trt_gm = refit_module_weights(trt_gm, exp_program)
```
Autocast
Before TensorRT 10.12, TensorRT could implicitly pick the kernels that yield the best performance for each layer (i.e., weak typing). Weak typing is deprecated in newer TensorRT versions, but it was a good way to maximize performance. In this release, we therefore provide a solution that enables weak-typing-like mixed precision behavior, called Autocast.
Unlike PyTorch Autocast, Torch-TensorRT Autocast is rule-based: it intelligently selects nodes to keep in FP32 precision to maintain model accuracy, while the remaining nodes benefit from reduced precision. Torch-TensorRT Autocast also lets users specify nodes to exclude from Autocast, since some nodes can be particularly accuracy-sensitive. In addition, Torch-TensorRT Autocast can cooperate with PyTorch Autocast: both can be used in the same model, and Torch-TensorRT Autocast respects the precision of the nodes within a PyTorch Autocast context. Please refer to the Torch-TRT mixed precision doc for more details.
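As a minimal sketch of the PyTorch Autocast side of this cooperation (a toy module; CPU/bfloat16 is chosen only so the example runs anywhere, and no Torch-TensorRT compile settings are shown):

```python
import torch

class MixedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sensitive = torch.nn.Linear(8, 8)  # keep in full precision
        self.bulk = torch.nn.Linear(8, 8)       # fine at reduced precision

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Nodes inside this context run at the autocast dtype; when such a
        # model is compiled, Torch-TensorRT Autocast respects these precisions.
        with torch.autocast("cpu", dtype=torch.bfloat16):
            x = self.bulk(x)
        return self.sensitive(x.float())

out = MixedModel()(torch.randn(2, 8))
print(out.dtype)  # torch.float32
```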
Compilation Resource Management
Compiling large models on limited-resource hardware is challenging. Before this release, successfully compiling the FLUX model (24GB) required at least 128GB of host memory, more than 5x the model size. This huge consumption limited Torch-TensorRT's ability to compile large models under constrained resources.
Host Memory Optimization
This release introduces trimming of malloc'd memory during engine building, reducing peak host memory consumption.
```bash
export TORCHTRT_ENABLE_BUILDER_MALLOC_TRIM=1
python example.py
```
With this environment variable set, peak host memory usage can be reduced to about 3x the model size.
If GPU memory is sufficient, you can additionally set offload_module_to_cpu=False to further reduce host memory to about 2x. A more detailed explanation can be found here: https://github.com/pytorch/TensorRT/blob/main/docsrc/contributors/resource_management.rst
Resource Aware Partitioner
A new feature, the Resource Aware Partitioner, addresses situations where the available host memory is smaller than 3x the model size. In the compilation settings, set enable_resource_partitioning=True and (optionally) a cpu_memory_budget; the partitioner will then automatically shard the graph so that compilation fits into very constrained resources (<2x the model size) without sacrificing performance or accuracy.
Example usage can be found here:
https://github.com/pytorch/TensorRT/blob/b7ae84fc020b1f0428b019d39c6284c7d52626e7/examples/dynamo/low_cpu_memory_compilation.py
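A hypothetical sketch of the settings involved (the two setting names come from this release; the commented-out compile call and the budget value/unit are illustrative assumptions, so check the linked example for the exact call shape):

```python
# Settings named in this release; the budget value and its unit are
# assumptions for illustration only.
compile_settings = dict(
    enable_resource_partitioning=True,  # shard the graph to fit host memory
    cpu_memory_budget=8 * 2**30,        # optional budget (value assumed here)
)
# trt_gm = torch_tensorrt.compile(model, inputs=inputs, **compile_settings)
```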
Debugger
TensorRT API Capture
In this release, we have added the TensorRT API Capture and Replay feature, which streamlines reproducing and debugging issues in your model: it lets you record the engine-building phase of your model and later replay the engine-build steps.
Capture:
The capture feature is disabled by default.
You can enable it via the environment variable TORCHTRT_ENABLE_TENSORRT_API_CAPTURE:

```bash
TORCHTRT_ENABLE_TENSORRT_API_CAPTURE=1 python your_model_test.py
```

You should see shim.json and shim.bin generated after enabling capture.
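Since the switch is an ordinary environment variable, you can quickly confirm the Python process sees it before a long capture run (a sanity-check one-liner, not part of the tool):

```bash
TORCHTRT_ENABLE_TENSORRT_API_CAPTURE=1 python -c "import os; print(os.environ['TORCHTRT_ENABLE_TENSORRT_API_CAPTURE'])"
```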
Replay:
Use the tensorrt_player tool to replay the captured TRT engine build without the original framework:

```bash
tensorrt_player -j /absolute/path/to/shim.json -o /absolute/path/to/output_engine
```
Limitations:
- This feature is currently restricted to Linux (x86-64 and aarch64) only.
- Only one TRT engine is captured: if a graph break causes multiple engines to be built, only the first engine is recorded. The next release will record all engines in the same bin file.
You can see more details at https://docs.pytorch.org/TensorRT/getting_started/capture_and_replay.html?highlight=capture+replay#
What's Changed
- upgrade pytorch from 2.9.0.dev to 2.10.0.dev by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3826
- fix a few ci issues by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3833
- fix: Fix a bug with dynamic shape validation in MTMM by @peri044 in https://github.com/pytorch/TensorRT/pull/3837
- fix test case error by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3835
- remove nox by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3832
- Added warnings if the model is in training mode by @cehongwang in https://github.com/pytorch/TensorRT/pull/3676
- fix: fix signature mismatch issue for non-tensor input in plugin converter by @bowang007 in https://github.com/pytorch/TensorRT/pull/3788
- Lluo/cherrypick dlfw changes to main by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3844
- TRT-LLM installation tool by @apbose in https://github.com/pytorch/TensorRT/pull/3829
- move is_thor() is_tegra_platform() from dynamo.utils to utils to avoid circular import by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3851
- Moe support by @cehongwang in https://github.com/pytorch/TensorRT/pull/3811
- fix test error in main by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3856
- add capture and replay feature in tensorrt by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3849
- cherry pick: fix pkg_zip nested zip issue from 2.9 to main by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3862
- move pwd under Linux branch by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3864
- make torchvision as optional dependency for torchscript tests by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3872
- add nccl 129 support by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3882
- add openblas lib support for Jetpack by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3885
- Fix the broken CI due to nvidia-cuda-runtime-cu13 issue by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3884
- Fix the coverage report issues by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3874
- addresses the case when shape of upsample tensor contains ITensor by @apbose in https://github.com/pytorch/TensorRT/pull/3841
- DLFW 25.11 changes by @apbose in https://github.com/pytorch/TensorRT/pull/3889
- fix aoti graph break by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3892
- fix L0 RTX test issues by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3894
- upgrade tensorrt and tensorrt_rtx by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3895
- Fix a typo by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3900
- cpu memory optimization rebased to main by @cehongwang in https://github.com/pytorch/TensorRT/pull/3868
- fix ci broken issue: psutil not found error by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3903
- fix: Small fix to the evaluator set + updating some test infra by @narendasan in https://github.com/pytorch/TensorRT/pull/3904
- Fix Bugs by @leimao in https://github.com/pytorch/TensorRT/pull/3902
- feat: Change default export behavior to re-export by @peri044 in https://github.com/pytorch/TensorRT/pull/3875
- feat: Add support for SymFloat inputs and truediv converter by @peri044 in https://github.com/pytorch/TensorRT/pull/3911
- chore(deps): bump monai from 1.5.0 to 1.5.1 in /tools/perf by @dependabot[bot] in https://github.com/pytorch/TensorRT/pull/3840
- fix ci broken issue by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3931
- force run all tests no matter previous is success or fail by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3930
- fix: fix automatic plugin test issue by @bowang007 in https://github.com/pytorch/TensorRT/pull/3877
- reenable back thor test by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3929
- fix nspect vulnerability scan issue by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3936
- Add normalization of cat operator argument into valid input by @apbose in https://github.com/pytorch/TensorRT/pull/3890
- fix: Small fix to the evaluator set + updating some test infra by @narendasan in https://github.com/pytorch/TensorRT/pull/3938
- chore: Deleting legacy CI infrastructure by @narendasan in https://github.com/pytorch/TensorRT/pull/3949
- feat: Autocast by @zewenli98 in https://github.com/pytorch/TensorRT/pull/3878
- Added the dynamic check in the validator by @cehongwang in https://github.com/pytorch/TensorRT/pull/3790
- Updating codeowners and some components by @narendasan in https://github.com/pytorch/TensorRT/pull/3950
- Cpu memory graph break by @cehongwang in https://github.com/pytorch/TensorRT/pull/3886
- docgen: Fix building docs by switching to legacy pip resolver by @narendasan in https://github.com/pytorch/TensorRT/pull/3971
- example: using nvrtc kernel for aot plugin by @bowang007 in https://github.com/pytorch/TensorRT/pull/3881
- Update _MutableTorchTensorRTModule.py by @cehongwang in https://github.com/pytorch/TensorRT/pull/3970
- feat: improve engine caching and fix bugs by @zewenli98 in https://github.com/pytorch/TensorRT/pull/3932
- fix: add TRT version check for engine caching feature by @zewenli98 in https://github.com/pytorch/TensorRT/pull/3983
- chore: Update lock file, was getting stuck and causing build issues f… by @narendasan in https://github.com/pytorch/TensorRT/pull/3948
- Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/pytorch/TensorRT/pull/3969
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in https://github.com/pytorch/TensorRT/pull/3968
- Add uv update workflow by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3986
- Fix a few CI issues by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3987
- cherry_pick to 2.10: filter out unsupported cuda versions by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3991
- release cut for 2.10 by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/3988
- cherry pick 3946 by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4024
- cherry pick 4003: support pre-quantized modelopt model in llm by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4025
- release 2.10 fix by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4027
- fix aarch64 wheel build issue by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4029
- fix resource partitioner issue by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4028
- skip llm test if modelopt is not installed by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4033
- fix the layer info test failure and deal with potential segfault by @narendasan in https://github.com/pytorch/TensorRT/pull/4043
- disable plugin in windows by @lanluo-nvidia in https://github.com/pytorch/TensorRT/pull/4046
- Cherry Pick fused_rums_norm_lowering to release 2.10 by @cehongwang in https://github.com/pytorch/TensorRT/pull/4057
- Cherry pick resource partitioner CI fix by @cehongwang in https://github.com/pytorch/TensorRT/pull/4058
- guard generalized scatter import by @apbose in https://github.com/pytorch/TensorRT/pull/4066
New Contributors
- @leimao made their first contribution in https://github.com/pytorch/TensorRT/pull/3902
- @salmanmkc made their first contribution in https://github.com/pytorch/TensorRT/pull/3969
Full Changelog: https://github.com/pytorch/TensorRT/compare/v2.9.0...v2.10.0