Management of Yandex Station and other smart home devices
A fast TTS architecture with conditional flow matching
Implementation of Video Diffusion Models
Capable of understanding text, audio, vision, video
A New Axis of Sparsity for Large Language Models
Towards Human-Sounding Speech
Multimodal-Driven Architecture for Customized Video Generation
Chat & pretrained large vision language model
NLP Cloud serves high-performance pre-trained or custom models for NER
Qwen2.5-VL is the multimodal large language model series
StreamSpeech is a seamless model for offline speech recognition
Foundational model for human-like, expressive TTS
Qwen3 is the large language model series developed by the Qwen team
Large-language-model & vision-language-model based on Linear Attention
Implementation of Make-A-Video, a new SOTA text-to-video generator
Open source machine learning framework to automate text conversations
Official inference repo for FLUX.1 models
Chinese and English multimodal conversational language model
Designed for text embedding and ranking tasks
Open source personal AI assistant for Linux, Windows and Mac
"Big Model" trains a visual multimodal VLM with 26M parameters
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
The most accurate natural language detection library for Python
A very simple framework for state-of-the-art NLP
End-to-end speech processing toolkit