MII makes low-latency and high-throughput inference possible
A robust, efficient, low-latency speech-to-text library
Towards Human-Sounding Speech
FlashInfer: Kernel Library for LLM Serving
Optimizing inference proxy for LLMs
Machine learning on FPGAs using HLS
Personal AI, On Personal Devices
Build Vision Agents quickly with any model or video provider
AI memory OS for LLM and Agent systems
Advancing Open-source World Models
Deep learning optimization library: makes distributed training easy
RF-DETR is a real-time object detection and segmentation model
Fast multimodal LLM for real-time voice interaction and AI apps
Large Audio Language Model built for natural interactions
Fast backend for long-term AI user memory via structured profiles
The official Python SDK for the ElevenLabs API
Converts text to speech in realtime
NVR with realtime local object detection for IP cameras
Parallax is a distributed model serving framework
Long-form streaming TTS system for multi-speaker dialogue generation
Implementation of "MobileCLIP" (CVPR 2024)
Low-latency AI inference engine optimized for mobile devices
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
An LLM Compiler for Parallel Function Calling
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework