CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Memory-efficient and performant finetuning of Mistral's models
Official implementation of DreamCraft3D
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Code for Mesh R-CNN, ICCV 2019
Uncommon Objects in 3D dataset
Language modeling in a sentence representation space
Math-specific large language models of our Qwen2 series
A SOTA open-source image editing model
Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large vision language model
Multi-modal large language model designed for audio understanding
Open-source framework for intelligent speech interaction
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-weight, large-scale hybrid-attention reasoning model
Large-language-model & vision-language-model based on Linear Attention
Capable of understanding text, audio, vision, video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Chat & pretrained large audio language model proposed by Alibaba Cloud
Release for Improved Denoising Diffusion Probabilistic Models