Contexts Optical Compression
Visual Causal Flow
Official repository for LTX-Video
Qwen3-ASR is an open-source series of ASR models
General-purpose image editing model that delivers high-fidelity
Open-source multi-speaker long-form text-to-speech model
Large language model & vision-language model based on Linear Attention
Diffusion Transformer with Fine-Grained Chinese Understanding
Large Multimodal Models for Video Understanding and Editing
Multimodal model achieving SOTA performance
Audio foundation model excelling in audio understanding
Official implementation of DreamCraft3D
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Dataset of GPT-2 outputs for research in detection, biases, and more
A Conversational Speech Generation Model
Encoder of greater-than-word length text trained on a variety of data
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
ClinicalBERT model trained on MIMIC notes for clinical NLP tasks
CTC-based forced aligner for audio-text in 158 languages
Compact 8B multimodal instruct model optimized for edge deployment
Small 3B-base multimodal model ideal for custom AI on edge hardware
Efficient 14B multimodal instruct model with edge deployment and FP8