C++ Implementation of PyTorch Tutorials for Everyone
Enabling PyTorch on Google TPU
On-device AI across mobile, embedded and edge for PyTorch
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
ONNX Runtime: cross-platform, high performance ML inferencing
Fast Multimodal LLM on Mobile Devices
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
OpenVINO™ Toolkit repository
MLX: An array framework for Apple silicon
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
LiteRT, successor to TensorFlow Lite
VMZ: Model Zoo for Video Modeling
Lightning fast C++/CUDA neural network framework
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
FAIR Sequence Modeling Toolkit 2
FlashMLA: Efficient Multi-head Latent Attention Kernels
OneFlow is a deep learning framework designed to be user-friendly
Locally run an Instruction-Tuned Chat-Style LLM
Transformer-related optimization, including BERT, GPT
Guide to deploying deep-learning inference networks
C++ library based on TensorRT integration
Fast and user-friendly runtime for transformer inference
A domain-specific language to express machine learning workloads
Open deep learning compiler stack for CPU, GPU
A low-code unified framework for computer vision and deep learning