Port of Facebook's LLaMA model in C/C++
Run serverless GPU workloads with fast cold starts on bare metal
Easiest and laziest way to build multi-agent LLM applications
An RWKV management and startup tool, fully automated, only 8 MB
A high-throughput and memory-efficient inference and serving engine
Fast inference engine for Transformer models
On-device Speech Recognition for Apple Silicon
Protect and discover secrets using Gitleaks
Gaussian processes in TensorFlow
PyTorch extensions for fast R&D prototyping and Kaggle farming
A general-purpose probabilistic programming system
MNN is a blazing fast, lightweight deep learning framework
Lightweight, standalone C++ inference engine for Google's Gemma models
Unified Model Serving Framework
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
Build production-ready agentic workflows with natural language
Low-latency REST API for serving text-embeddings
LLM training code for MosaicML foundation models
Tensor search for humans
A comprehensive set of computer vision and machine intelligence libraries
Easy-to-use deep learning framework with 3 key features
A real-time inference engine for temporal logical specifications
Images to inference with no labeling