Multilingual Automatic Speech Recognition with word-level timestamps
Neural network architecture based on the ideas of the original LSTM
A lightweight, powerful framework for multi-agent workflows
Unified KV Cache Compression Methods for Auto-Regressive Models
State-of-the-art TTS model under 25MB
Fast, small, and fully autonomous AI assistant infrastructure
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Automatic AI-powered timeline of your daily work activity logs
Demo of a customer service use case implemented with the OpenAI Agents
Open-source large language model family from Tencent Hunyuan
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Redundancy-aware KV Cache Compression for Reasoning Models
Low-latency AI inference engine optimized for mobile devices
AI Agent Source Code Deep Research Report
ChatGLM2-6B: An Open Bilingual Chat LLM
157 models, 30 providers, one command to find what runs on your hardware
A step-by-step guide to build your own AI agent
A personal AI assistant that evolves with you
Faster Whisper transcription with CTranslate2
Memory-efficient and performant finetuning of Mistral's models
High-performance neural network inference framework for mobile
Supercharge Your LLM with the Fastest KV Cache Layer
Persistent context and multi-instance coordination
A Python library for audio
Developer-friendly Natural Language Processing