A GPT-4o-level MLLM for Vision, Speech and Multimodal Live Streaming
Private chat with local GPT with documents, images, video, etc.
AI-powered video clipping and highlight generation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Implementation of Phenaki Video, which uses MaskGIT
Large Multimodal Models for Video Understanding and Editing
Generate high-definition story short videos with one click using AI
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal-Driven Architecture for Customized Video Generation
AI video agents framework for next-gen video interactions
Repo for SeedVR2 & SeedVR
Focus on prompting and generating
Capable of understanding text, audio, vision, video
Official MiniMax Model Context Protocol (MCP) server
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A minimalist environment for decision-making in autonomous driving
An Open Source package that allows video game creators
3D reconstruction software
Official Repo For Sa2VA: Marrying SAM2 with LLaVA
Multimodal Diffusion with Representation Alignment
Stable Diffusion web UI
An unsupervised and free tool for image and video dataset analysis
The Clay Foundation Model - An open source AI model and interface
Lightweight framework for building Agents with memory, knowledge, etc.
Agentic, Reasoning, and Coding (ARC) foundation models