A scalable inference server for models optimized with OpenVINO
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Port of OpenAI's Whisper model in C/C++
Deep Learning API and server in C++14 with support for Caffe and PyTorch
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
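The multi-LoRA serving idea above rests on one property: a LoRA fine-tune is just a small low-rank update to a shared base weight, so one base model can serve thousands of adapters by swapping tiny `A`/`B` pairs per request. A minimal sketch of that math, with toy hypothetical dimensions (not any particular server's API):

```python
# Toy illustration of a LoRA adapter: y = W x + (alpha / r) * B (A x).
# W is the shared base weight; A (r x d_in) and B (d_out x r) are the
# per-adapter low-rank factors that a multi-LoRA server swaps in per request.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    base = matvec(W, x)                     # shared base-model path
    update = matvec(B, matvec(A, x))        # rank-r adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# 2x2 identity base weight plus a rank-1 adapter (hypothetical numbers).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # r x d_in
B = [[0.5], [0.5]]           # d_out x r
y = lora_forward(W, A, B, [1.0, 2.0], alpha=1, r=1)  # -> [2.5, 3.5]
```

Because `A` and `B` together are orders of magnitude smaller than `W`, keeping thousands of them resident is cheap relative to duplicating the base model.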
Run local LLMs such as Llama, DeepSeek, and Kokoro inside your browser
A Pythonic framework to simplify AI service building
Library for serving Transformers models on Amazon SageMaker
Ready-to-use OCR with 80+ supported languages
Bring the notion of Model-as-a-Service to life
Open-source AI camera that empowers any camera/CCTV
State-of-the-art diffusion models for image and audio generation
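Diffusion models like those above are trained against a fixed forward noising process; the standard DDPM form is x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε with ᾱ_t the running product of (1−β_s). A stdlib-only sketch of that forward step, with a made-up beta schedule:

```python
import math
import random

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) for a DDPM-style forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bar = 1.0
    for b in betas[:t]:
        alpha_bar *= 1.0 - b
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * rng.gauss(0, 1)
            for x in x0]

# Hypothetical 3-step linear beta schedule on a 2-element "image".
betas = [0.1, 0.2, 0.3]
clean = forward_diffuse([1.0, 2.0], 0, betas, random.Random(0))  # t=0: unchanged
noisy = forward_diffuse([1.0, 2.0], 3, betas, random.Random(0))  # mostly noise
```

The reverse (generation) direction learns to undo exactly this corruption, which is why the forward process must be this simple and fixed.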
The easiest and laziest way to build multi-agent LLM applications
LLM training code for MosaicML foundation models
A general-purpose probabilistic programming system
A high-performance ML model serving framework offering dynamic batching
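Dynamic batching, as mentioned above, amortizes per-inference overhead by holding requests briefly and running them through the model together. A minimal stdlib sketch of the pattern (batch size and wait limit are illustrative, not any framework's defaults):

```python
import queue
import threading
import time

def dynamic_batcher(req_q, handle_batch, max_batch=4, max_wait=0.05):
    """Collect requests until max_batch is reached or max_wait elapses,
    then hand them to the model as one batch. None is a shutdown sentinel."""
    while True:
        first = req_q.get()
        if first is None:
            return
        batch = [first]
        deadline = time.monotonic() + max_wait
        while len(batch) < max_batch:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                item = req_q.get(timeout=timeout)
            except queue.Empty:
                break
            if item is None:              # flush pending work, then stop
                handle_batch(batch)
                return
            batch.append(item)
        handle_batch(batch)

# Usage: a fake "model" that doubles each input, fed a burst of 6 requests.
results = []
q = queue.Queue()
t = threading.Thread(target=dynamic_batcher,
                     args=(q, lambda b: results.append([x * 2 for x in b])))
t.start()
for i in range(6):
    q.put(i)
q.put(None)
t.join()
```

The trade-off is latency vs. throughput: a larger `max_wait` builds fuller batches but delays the first request in each batch by up to that amount.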
GPU environment management and cluster orchestration
MNN is a blazing fast, lightweight deep learning framework
Protect and discover secrets using Gitleaks
Easy-to-use speech toolkit, including self-supervised learning models
A unified framework for scalable computing
Large Language Model Text Generation Inference
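At the core of any text-generation server is an autoregressive decoding loop: score the next token given everything generated so far, append the pick, and repeat until an end-of-sequence token or a budget. A toy sketch of the greedy variant (the `logits_fn` model and token ids are invented for illustration; real servers add batching, KV caching, sampling, and streaming):

```python
def greedy_generate(logits_fn, prompt, eos, max_new=8):
    """Greedy autoregressive decoding: repeatedly pick the highest-scoring
    next token until EOS or the new-token budget runs out."""
    tokens = list(prompt)
    for _ in range(max_new):
        scores = logits_fn(tokens)               # one score per vocab id
        nxt = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

# Toy "model" over a 4-token vocabulary: always prefers (last token + 1) % 4,
# with id 3 acting as EOS.
toy = lambda toks: [1.0 if i == (toks[-1] + 1) % 4 else 0.0 for i in range(4)]
out = greedy_generate(toy, [0], eos=3)           # -> [0, 1, 2, 3]
```

Because each step depends on the previous one, this loop is inherently sequential per request, which is exactly why serving systems batch across requests instead.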
Fast inference engine for Transformer models
OpenAI-style API for open large language models
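An "OpenAI-style API" works by returning responses in the same JSON shape as OpenAI's chat-completions endpoint, so existing client SDKs can point at an open model unchanged. A sketch of building that response body with the stdlib, based on the publicly documented schema (the model name and reply are placeholders):

```python
import json
import time
import uuid

def chat_completion_response(model, reply_text):
    """Build a response body shaped like the OpenAI chat-completions API's,
    so unmodified OpenAI clients can parse replies from an open model."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply_text},
            "finish_reason": "stop",
        }],
    }

body = chat_completion_response("my-open-llm", "Hello!")
payload = json.dumps(body)   # what the server would send over HTTP
```

The key compatibility points are the `choices[].message` nesting, the `object` discriminator, and `finish_reason`; clients typically read `choices[0].message.content` and ignore the rest.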
Library for OCR-related tasks powered by Deep Learning