A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Library for OCR-related tasks powered by Deep Learning
Capable of understanding text, audio, vision, video
Controllable and fast Text-to-Speech for over 7000 languages
LLM powered fuzzing via OSS-Fuzz
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
kaldi-asr/kaldi is the official location of the Kaldi project
Official implementation of Watermark Anything with Localized Messages
Multilingual Automatic Speech Recognition with word-level timestamps
Build GenAI applications quickly and easily
Your open-source LLM evaluation toolkit
A Unified Framework for Image Customization
Tensor search for humans
Stanford NLP Python library for many human languages
UI-TARS-desktop version that can operate on your local personal device
LLM-based Reinforcement Learning audio edit model
Serving LangChain LLM apps automagically with FastAPI
Optimized Workforce Learning for General Multi-Agent Assistance
FaceOnLive Open KYC: Streamlining Identity Verification with AI
A central, open resource for data and tools
A Machine Learning Framework for Time Series Intelligence
A Python library built to empower developers
A text generation library with pre-trained language models
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
Automatic architecture search and hyperparameter optimization