An opinionated CLI to transcribe audio files with Whisper on-device
MARS5 speech model (TTS) from CAMB.AI
Chinese and English multimodal conversational language model
Petastorm library enables single-machine or distributed training
Turn words into colors
A TTS model capable of generating ultra-realistic dialogue
AutoGluon: AutoML for Image, Text, and Tabular Data
A Repo For Document AI
Bidirectional token-classification model for identifiable info
Open speech-to-speech models and pipelines by Hugging Face
A Systematic Framework for Interactive World Modeling
Automatically translates the text of a video based on a subtitle file
Multilingual sentence & image embeddings with BERT
[CVPR 2026 Oral] VGGT Omega
Making RAG Simpler with Small and Open-Sourced Language Models
A Python tool to help extract information from structured PDFs
Paste Markdown and AI responses into Word and Excel instantly
Automate native Android apps with AI using accessibility APIs
LLM-based reinforcement learning audio editing model
The official repo of Qwen chat & pretrained large language model
Google Gen AI Python SDK provides an interface for developers
Ultra-Efficient LLMs on End Device
CineCLI is a cross-platform command-line movie browser
Python MUD/MUX/MUSH/MU* development system
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming