
About

HunyuanVision is a vision-language model developed by Tencent’s Hunyuan team. It uses a hybrid Mamba-Transformer architecture to deliver strong performance and efficient inference on multimodal reasoning tasks. The Hunyuan-Vision-1.5 release is designed for “thinking on images”: beyond understanding combined visual and language content, it can reason by manipulating or reflecting on image inputs, for example by cropping, zooming, pointing, drawing boxes, or drawing on the image to acquire additional information. It supports a range of vision tasks (image and video recognition, OCR, diagram understanding), visual reasoning, and 3D spatial comprehension within a unified multilingual framework. The model is built to work across languages and tasks and is intended to be open-sourced (including checkpoints, a technical report, and inference support) to encourage community experimentation and adoption.

About

HunyuanCustom is a multimodal customized video generation framework that emphasizes subject consistency while supporting image, audio, video, and text conditions. Built upon HunyuanVideo, it introduces a text-image fusion module based on LLaVA for enhanced multimodal understanding, along with an image ID enhancement module that leverages temporal concatenation to reinforce identity features across frames. To enable audio- and video-conditioned generation, it further proposes two modality-specific condition injection mechanisms: an AudioNet module that achieves hierarchical alignment via spatial cross-attention, and a video-driven injection module that integrates latent-compressed conditional video through a patchify-based feature-alignment network. Experiments on single- and multi-subject scenarios are reported to show that HunyuanCustom significantly outperforms state-of-the-art open- and closed-source methods in ID consistency, realism, and text-video alignment.

Platforms Supported

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Platforms Supported

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience

AI researchers, developers, and teams interested in a solution offering multimodal understanding and reasoning across languages

Audience

Digital content creators and filmmakers wanting a solution to generate personalized, subject-consistent videos using multi-modal inputs

Support

Phone Support
24/7 Live Support
Online

Support

Phone Support
24/7 Live Support
Online

API

Offers API

API

Offers API

Pricing

Free
Free Version
Free Trial

Pricing

No information available.
Free Version
Free Trial

Reviews/Ratings

Overall 0.0 / 5
ease 0.0 / 5
features 0.0 / 5
design 0.0 / 5
support 0.0 / 5


Reviews/Ratings

Overall 0.0 / 5
ease 0.0 / 5
features 0.0 / 5
design 0.0 / 5
support 0.0 / 5


Training

Documentation
Webinars
Live Online
In Person

Training

Documentation
Webinars
Live Online
In Person

Company Information

Tencent
Founded: 1998
China
github.com/Tencent-Hunyuan/HunyuanVision

Company Information

Tencent
Founded: 1998
China
hunyuancustom.github.io

Alternatives

Hunyuan T1 (Tencent)

Alternatives

HunyuanVideo-Avatar (Tencent-Hunyuan)
PaliGemma 2 (Google)
VideoPoet (Google)
Seaweed (ByteDance)
Gemini Robotics (Google DeepMind)
Gen-2 (Runway)

Integrations

CUDA
Hugging Face
Hunyuan T1
HunyuanVideo

Integrations

CUDA
Hugging Face
Hunyuan T1
HunyuanVideo