Solve puzzles. Learn CUDA
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
How to optimize some algorithms in CUDA
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code
GPU accelerated decision optimization
Our first fully AI-generated deep learning system
Fast and memory-efficient exact attention
Package and deploy machine learning models using Docker containers
High-Resolution Image Synthesis with Latent Diffusion Models
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Simplest working implementation of StyleGAN2
Geometric deep learning extension library for PyTorch
Generate audiobooks from e-books
A set of Docker images for training and serving models in TensorFlow
Low-latency REST API for serving text-embeddings
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Data manipulation and transformation for audio signal processing
Synchronized Translation for Videos
Stable Diffusion WebUI optimized for AMD GPUs with editing tools
Hackable and optimized Transformers building blocks
Unified Model Serving Framework
A simple native web interface that uses ChatTTS to synthesize text
Trainable models and NN optimization tools
InvokeAI is a leading creative engine for Stable Diffusion models