Finding the Scaling Law of Agents. A multi-agent framework
RGBD video generation model conditioned on camera input
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
OpenLIT is an open-source LLM Observability tool
Distill your ex into an AI Skill
Recovering the Visual Space from Any Views
Deep learning optimization library making distributed training easy
Designed for text embedding and ranking tasks
SOTA Open Source TTS
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
The Open Source Cowork Desktop to Unlock Your Exceptional Productivity
An open phone agent model & framework
An experimental version of DeepSeek model
Fast multimodal LLM for real-time voice interaction and AI apps
Qwen2.5-VL is a multimodal large language model series
A simple but complete full-attention transformer
A high-quality PDF-to-Markdown tool based on a large language model
Spark-TTS Inference Code
A trainable PyTorch reproduction of AlphaFold 3
TFX is an end-to-end platform for deploying production ML pipelines
State-of-the-art (SoTA) text-to-video pre-trained model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Build and run agents you can see, understand and trust