High-Resolution 3D Assets Generation with Large Scale Diffusion Models
The official repo of Qwen chat & pretrained large language model
Bidirectional token-classification model for identifiable info
HY-Motion model for 3D character animation generation
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Implementation of "MobileCLIP" (CVPR 2024)
A Systematic Framework for Interactive World Modeling
Unified Multimodal Understanding and Generation Models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Fast-stable-diffusion + DreamBooth
Ultra-Efficient LLMs on End Device
Large Multimodal Models for Video Understanding and Editing
Memory-efficient and performant finetuning of Mistral's models
Repo of Qwen2-Audio chat & pretrained large audio language model
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
FAIR Sequence Modeling Toolkit 2
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Renderer for the harmony response format to be used with gpt-oss
Multimodal Diffusion with Representation Alignment
Audio foundation model excelling in audio understanding
Multi-modal large language model designed for audio understanding
The official PyTorch implementation of Google's Gemma models
Open-weight, large-scale hybrid-attention reasoning model
Open-source industrial-grade ASR models
Block Diffusion for Ultra-Fast Speculative Decoding