HunyuanOCR
Tencent Hunyuan is a family of large-scale multimodal AI models developed by Tencent that spans text, image, video, and 3D modalities and targets general-purpose tasks such as content generation, visual reasoning, and business automation. The lineup includes variants optimized for natural language understanding, vision-language comprehension (e.g., image and video understanding), text-to-image creation, video generation, and 3D content generation. Hunyuan models use a mixture-of-experts architecture and other innovations, such as hybrid “Mamba-Transformer” designs, to deliver strong performance on reasoning, long-context understanding, and cross-modal tasks with efficient inference. For example, the vision-language model Hunyuan-Vision-1.5 supports “thinking-on-image”, enabling deep multimodal understanding and reasoning over images, video frames, diagrams, and spatial data.
HunyuanWorld
HunyuanWorld-1.0 is an open-source AI framework and generative model developed by Tencent Hunyuan. It creates immersive, explorable, and interactive 3D worlds from text prompts or image inputs by combining 2D and 3D generation techniques in a unified pipeline. At its core is a semantically layered 3D mesh representation that uses 360° panoramic world proxies to decompose and reconstruct scenes with geometric consistency and semantic awareness, producing diverse, coherent environments that can be navigated and interacted with. Whereas traditional 3D generation methods struggle with either limited diversity or inefficient data representations, HunyuanWorld-1.0 integrates panoramic proxy generation, hierarchical 3D reconstruction, and semantic layering to balance visual quality and structural integrity, and it exports meshes compatible with common graphics workflows.
Hunyuan T1
Hunyuan T1 is Tencent's deep-thinking AI model, open to all users through the Tencent Yuanbao platform. The model is designed to analyze a problem from multiple dimensions and identify underlying logical relationships, which makes it suitable for complex tasks. Yuanbao also gives users access to other AI models, including DeepSeek-R1 and Tencent Hunyuan Turbo. Tencent has said that the official version of Hunyuan T1 will launch soon, with external API access and other services. Yuanbao itself is built on Tencent's Hunyuan large language model and is strong in Chinese language understanding, logical reasoning, and task execution. It offers AI-based search, summarization, and writing capabilities, and lets users analyze documents and interact through prompts.
Hunyuan-Vision-1.5
Hunyuan-Vision is a vision-language model developed by Tencent’s Hunyuan team. It uses a Mamba-Transformer hybrid architecture to deliver strong performance and efficient inference on multimodal reasoning tasks. The Hunyuan-Vision-1.5 version is designed for “thinking on images”: beyond understanding combined vision and language content, it can reason more deeply by manipulating or reflecting on image inputs, for example by cropping, zooming, pointing, drawing boxes, or drawing on the image to acquire additional knowledge. It supports a range of vision tasks (image and video recognition, OCR, diagram understanding), visual reasoning, and 3D spatial comprehension in a unified multilingual framework. The model is built to work across languages and tasks and is intended to be open sourced, including checkpoints, a technical report, and inference support, to encourage community experimentation and adoption.