Performance meets Productivity
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
The CUDA target for Numba
How to optimize some algorithms in CUDA
A NumPy-compatible array library accelerated by CUDA
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
The best AI Aimbot for Fortnite, Valorant, CS2, R6, COD, Apex, & more
A Python framework for accelerated simulation, data generation
Solve puzzles. Learn CUDA
Our first fully AI-generated deep learning system
Package and deploy machine learning models using Docker containers
Rembg is a tool to remove image backgrounds
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code
Development repository for the Triton language and compiler
Self-host the powerful Chatterbox TTS model
A high-throughput and memory-efficient inference and serving engine
Fast Python collaborative filtering for implicit feedback datasets
Geometric deep learning extension library for PyTorch
Apple Silicon (MLX) port of Karpathy's autoresearch
Stable Diffusion built into Blender
High-Resolution Image Synthesis with Latent Diffusion Models
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Jittor is a high-performance deep learning framework
Fast and memory-efficient exact attention
A lightweight vLLM implementation built from scratch