Port of Facebook's LLaMA model in C/C++
Run serverless GPU workloads with fast cold starts on bare-metal
An RWKV management and startup tool, fully automated, only 8MB
Easiest and laziest way to build multi-agent LLM applications
A high-throughput and memory-efficient inference and serving engine
Fast inference engine for Transformer models
Gaussian processes in TensorFlow
Protect and discover secrets using Gitleaks
PyTorch extensions for fast R&D prototyping and Kaggle farming
A general-purpose probabilistic programming system
Lightweight, standalone C++ inference engine for Google's Gemma models
Unified Model Serving Framework
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Run local LLMs like llama, deepseek, kokoro etc. inside your browser
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
Build Production-ready Agentic Workflows with Natural Language
Low-latency REST API for serving text-embeddings
LLM training code for MosaicML foundation models
Tensor search for humans
Set of comprehensive computer vision & machine intelligence libraries
Easy-to-use deep learning framework with 3 key features
A real-time inference engine for temporal logical specifications
Images to inference with no labeling
High quality, fast, modular reference implementation of SSD in PyTorch
Framework that is dedicated to making neural data processing