Contexts Optical Compression
Visual Causal Flow
Official repository for LTX-Video
General-purpose image editing model that delivers high-fidelity
Qwen3-ASR is an open-source series of ASR models
Open-source multi-speaker long-form text-to-speech model
Large language model and vision-language model based on linear attention
Diffusion Transformer with Fine-Grained Chinese Understanding
Official repository for the Qwen chat and pretrained large language models
Large Multimodal Models for Video Understanding and Editing
Multimodal model achieving SOTA performance
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
Multimodal large language model designed for audio understanding
Official implementation of DreamCraft3D
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Dataset of GPT-2 outputs for research in detection, biases, and more
A Conversational Speech Generation Model
Encoder of greater-than-word length text trained on a variety of data
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
ClinicalBERT model trained on MIMIC notes for clinical NLP tasks
CTC-based forced aligner for audio-text in 158 languages
Compact 8B multimodal instruct model optimized for edge deployment
Small 3B-base multimodal model ideal for custom AI on edge hardware