Port of OpenAI's Whisper model in C/C++
Run Local LLMs on Any Device. Open-source
FlashInfer: Kernel Library for LLM Serving
User-friendly AI Interface
The free, Open Source alternative to OpenAI, Claude and others
Serve, optimize and scale PyTorch models in production
A library for accelerating Transformer models on NVIDIA GPUs
A high-throughput and memory-efficient inference and serving engine
ONNX Runtime: cross-platform, high performance ML inferencing
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Ready-to-use OCR with 80+ supported languages
20+ high-performance LLMs with recipes to pretrain, finetune at scale
Connect home devices into a powerful cluster to accelerate LLM inference
AI interface for tinkerers (Ollama, Haystack RAG, Python)
Protect and discover secrets using Gitleaks
Framework that allows you to transform your Vector Database
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Build your chatbot within minutes on your favorite device
Fast inference engine for Transformer models
PyTorch domain library for recommendation systems
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
PArallel Distributed Deep LEarning: Machine Learning Framework
MNN is a blazing fast, lightweight deep learning framework
Deep Learning API and Server in C++14 with support for Caffe and PyTorch