AI memory OS for LLM and agent systems
Implementation of "MobileCLIP" (CVPR 2024)
CoreNet: A library for training deep neural networks
Python ETL framework for stream processing, real-time analytics, LLM
Qwen3-omni is a natively end-to-end, omni-modal LLM
Low-latency AI inference engine optimized for mobile devices
An LLM Compiler for Parallel Function Calling
LightLLM is a Python-based LLM (Large Language Model) inference
Large Audio Language Model built for natural interactions
StreamSpeech is a seamless model for offline speech recognition
Blazing-fast vector DB with similarity search and metadata filtering
DeepEP: an efficient expert-parallel communication library
SoTA open-source TTS
Low-latency REST API for serving text embeddings
Official Python implementation of UTCP. UTCP is an open standard
Implementation of TurboQuant (ICLR 2026)
Tokenizer-Free TTS for Multilingual Speech Generation
The official Meta Llama 3 GitHub site
Bailing is a voice dialogue robot similar to GPT-4o
Converts text to speech in real time
Plug-and-play library to enable agents to call MCP and UTCP tools
The behavior guidance framework for customer-facing LLM agents
Block Diffusion for Ultra-Fast Speculative Decoding
The easiest and laziest way to build multi-agent LLM applications
An open-source, ultra-low-latency remote desktop for Linux hosts