Multimodal-Driven Architecture for Customized Video Generation
Advanced language and coding AI model
Qwen3.6 is the large language model series developed by the Qwen team
Moonshot's most powerful AI model
A Customizable Image-to-Video Model based on HunyuanVideo
Reference PyTorch implementation and models for DINOv3
Phi-3.5 for Mac: Locally-run Vision and Language Models
GLM-5: From Vibe Coding to Agentic Engineering
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Industrial-level controllable zero-shot text-to-speech system
General-purpose image editing model that delivers high-fidelity
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen-Image is a powerful image generation foundation model
Controllable & emotion-expressive zero-shot TTS
Pokee Deep Research Model Open Source Repo
Official implementation of Watermark Anything with Localized Messages
The official repo of Qwen chat & pretrained large language model
Robust Speech Recognition Across Languages, Dialects
PyTorch implementation of JiT
Video Object and Interaction Deletion
Qwen3-omni is a natively end-to-end, omni-modal LLM
A state-of-the-art open visual language model
PyTorch code and models for the DINOv2 self-supervised learning
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
GPT4V-level open-source multi-modal model based on Llama3-8B