Official implementation of DreamCraft3D
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
High-resolution models for human tasks
Multimodal Diffusion with Representation Alignment
Repo of Qwen2-Audio chat & pretrained large audio language model
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Pokee Deep Research Model Open Source Repo
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Renderer for the harmony response format to be used with gpt-oss
Qwen3-Omni is a natively end-to-end, omni-modal LLM
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Qwen2.5-VL is the multimodal large language model series
The official PyTorch implementation of Google's Gemma models
An AI-powered security review GitHub Action using Claude
A series of math-specific large language models of our Qwen2 series
Inference framework for 1-bit LLMs
Capable of understanding text, audio, vision, video
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Implementation of "MobileCLIP" CVPR 2024
A state-of-the-art open visual language model
Towards Real-World Vision-Language Understanding
CLIP: Predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation