Integrate cutting-edge LLM technology quickly and easily into your app
Efficient Triton Kernels for LLM Training
Research project. A memory solution for users, teams, and applications
FlashInfer: Kernel Library for LLM Serving
Secure, kernel-enforced sandbox CLI and SDKs for AI agents
TT-NN operator library, and TT-Metalium low-level kernel programming
Burn is a new comprehensive dynamic Deep Learning Framework
An RWKV management and startup tool, fully automated, only 8 MB
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Open source solution that can meet the requirements of workloads
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Training neural networks on Apple Neural Engine via APIs
An experimental version of DeepSeek model
Tool that provides interactive visualizations for large embeddings
The Compute Library is a set of computer vision and machine learning
FlashMLA: Efficient Multi-head Latent Attention Kernels
Deep and Machine Learning for Microscopy
C++ library for high-performance inference on NVIDIA GPUs
Low-latency AI inference engine optimized for mobile devices
Toolkit for making machine learning and data analysis applications
Library for efficiently connecting and optimizing teams of AI agents
A Powerful Native Multimodal Model for Image Generation
oneAPI Deep Neural Network Library (oneDNN)
How to optimize some algorithms in CUDA
Deepnote is a drop-in replacement for Jupyter