Integrate cutting-edge LLM technology quickly and easily into your app
Efficient Triton Kernels for LLM Training
Research project: a memory solution for users, teams, and applications
FlashInfer: Kernel Library for LLM Serving
TT-NN operator library and TT-Metalium low-level kernel programming
Burn is a new, comprehensive, dynamic deep learning framework
Secure, kernel-enforced sandbox CLI and SDKs for AI agents
A fully automated RWKV management and startup tool, only 8 MB
Open-source solution that can meet the requirements of workloads
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Training neural networks on the Apple Neural Engine via APIs
An experimental version of the DeepSeek model
FlashMLA: Efficient Multi-head Latent Attention Kernels
The Compute Library is a set of computer vision and machine learning
C++ library for high-performance inference on NVIDIA GPUs
Tool that provides interactive visualizations for large embeddings
A Powerful Native Multimodal Model for Image Generation
Deep and Machine Learning for Microscopy
Library for efficiently connecting and optimizing teams of AI agents
How to optimize some algorithms in CUDA
oneAPI Deep Neural Network Library (oneDNN)
Toolkit for making machine learning and data analysis applications
The easiest way to use Ollama in .NET
Automate native Android apps with AI using accessibility APIs