Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
How to optimize some algorithms in CUDA
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Solve puzzles. Learn CUDA
Our first fully AI generated deep learning system
Package and deploy machine learning models using Docker containers
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code
Self-host the powerful Chatterbox TTS model
A high-throughput and memory-efficient inference and serving engine
Fast Python collaborative filtering for implicit feedback datasets
Stable Diffusion built-in to Blender
Geometric deep learning extension library for PyTorch
Apple Silicon (MLX) port of Karpathy's autoresearch
High-Resolution Image Synthesis with Latent Diffusion Models
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Jittor is a high-performance deep learning framework
Fast and memory-efficient exact attention
A lightweight vLLM implementation built from scratch
A Python library for learning and evaluating knowledge graph embeddings
Universal LLM Deployment Engine with ML Compilation
Interface for OuteTTS models
Stable Diffusion WebUI optimized for AMD GPUs with editing tools
Generate audiobooks from e-books
A simple native web interface that uses ChatTTS to synthesize text
An experimental version of DeepSeek model