Real-time NVIDIA GPU dashboard
157 models, 30 providers, one command to find what runs on your hardware
How to optimize algorithms in CUDA
LightLLM is a Python-based LLM (Large Language Model) inference
A high-quality rapid TTS voice cloning model
Running a big model on a small laptop
Easily compute CLIP embeddings and build a CLIP retrieval system
Sharp Monocular Metric Depth in Less Than a Second
FlashMLA: Efficient Multi-head Latent Attention Kernels
Multi-lingual large voice generation model, providing inference
Fast ML inference & training for ONNX models in Rust
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
The best ChatGPT that $100 can buy
Deep learning optimization library: makes distributed training easy
Mooncake is the serving platform for Kimi
Calculate token/s & GPU memory requirement for any LLM
Libraries for optimizing AI models, inference speed, and GPU usage
Point cloud diffusion for 3D model synthesis
Scaled-YOLOv4: Scaling Cross Stage Partial Network
Generative Adversarial Networks for Efficient and High Fidelity Speech
Fast, modular reference implementation of Instance Segmentation
Toolkit for efficient experimentation with Speech Recognition
OpenAI’s open-weight 120B model optimized for reasoning and tooling
NVFP4 DiffusionGemma model for fast multimodal text generation
Tiny pre-trained IBM model for multivariate time series forecasting