A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Bidirectional token-classification model for identifiable info
Inference script for Oasis 500M
Official implementation of DreamCraft3D
Open-source large language model family from Tencent Hunyuan
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
A Systematic Framework for Interactive World Modeling
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Global weather forecasting model using graph neural networks and JAX
GPT4V-level open-source multi-modal model based on Llama3-8B
Math-specific large language models in the Qwen2 series
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Large Multimodal Models for Video Understanding and Editing
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-weight, large-scale hybrid-attention reasoning model
NetEase Youdao's open-source embedding and reranker models
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Repo for SeedVR2 & SeedVR
A 0.1B Omni model trained from scratch
26M function-calling model that runs on incredibly small devices