Multimodal Diffusion with Representation Alignment
Open-source deep-learning framework
A Powerful Native Multimodal Model for Image Generation
The Clay Foundation Model - An open source AI model and interface
Robust Speech Recognition Across Languages, Dialects
Video Object and Interaction Deletion
Recovering the Visual Space from Any Views
Achieving a 3x+ generation speedup on reasoning tasks
Ultra-Efficient LLMs on End Devices
HY-Motion model for 3D character animation generation
Generate Any 3D Scene in Seconds
PyTorch code and models for the DINOv2 self-supervised learning method
A Customizable Image-to-Video Model based on HunyuanVideo
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Renderer for the harmony response format to be used with gpt-oss
High-Fidelity and Controllable Generation of Textured 3D Assets
RGBD video generation model conditioned on camera input
Large-language-model & vision-language-model based on Linear Attention
Qwen-Image is a powerful image generation foundation model
Inference code for scalable emulation of protein equilibrium ensembles
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Z80-μLM is a 2-bit quantized language model
Implementation of "MobileCLIP" CVPR 2024
Tool for exploring and debugging transformer model behaviors