Recovering the Visual Space from Any Views
Qwen-Image is a powerful image generation foundation model
My personal Claude Code configuration
FlashMLA: Efficient Multi-head Latent Attention Kernels
Chat & pretrained large audio language model proposed by Alibaba Cloud
Encoder of greater-than-word length text trained on a variety of data
Let us control diffusion models
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Learning embeddings for classification, retrieval and ranking
Lightweight on-device model for private AI text redaction
Vision-language-action model for robot control via images and text