Generating Immersive, Explorable, and Interactive 3D Worlds
Unifying 3D Mesh Generation with Language Models
A Unified Framework for Text-to-3D and Image-to-3D Generation
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A text-to-speech, speech-to-text and speech-to-speech library
A Multi-Modal World Model for Reconstructing, Generating, Simulation
Generate Any 3D Scene in Seconds
HY-Motion model for 3D character animation generation
Official implementation of DreamCraft3D
Workflow and speech recognition app
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
The data structure for multimodal data
Framework for building AI-powered interactive digital humans and agents
A Systematic Framework for Interactive World Modeling
Crafting engine for artists, designers, and filmmakers
Framework for building neural networks
Implementation of Make-A-Video, a new SOTA text-to-video generator
Implementation of Video Diffusion Models
State-of-the-art diffusion models for image and audio generation
Build cross-modal and multimodal applications on the cloud
Amica is an open-source interface for interactive communication
Generate 3D objects conditioned on text or images
Framework that is dedicated to making neural data processing
CLIP + FFT/DWT/RGB = text to image/video
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion