High-Resolution 3D Assets Generation with Large Scale Diffusion Models
AlphaFold 3 inference pipeline
Python inference and LoRA trainer package for the LTX-2 audio–video model
RGBD video generation model conditioned on camera input
Chat & pretrained large audio language model proposed by Alibaba Cloud
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
State-of-the-art (SoTA) text-to-video pre-trained model
This repository contains the official implementation of FastVLM
An experimental version of the DeepSeek model
GPT4V-level open-source multi-modal model based on Llama3-8B
Designed for text embedding and ranking tasks
Official inference repo for FLUX.2 models
Inference script for Oasis 500M
Qwen3-TTS is an open-source series of TTS models
Qwen3-omni is a natively end-to-end, omni-modal LLM
Open-source framework for intelligent speech interaction
Tool for exploring and debugging transformer model behaviors
Recovering the Visual Space from Any Views
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Chat & pretrained large vision language model
Code for Mesh R-CNN, ICCV 2019
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Tiny vision language model
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
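
As a concrete usage note for the last entry, here is a minimal sketch of querying Qwen2.5-VL through its Hugging Face transformers integration. The checkpoint name Qwen/Qwen2.5-VL-7B-Instruct, the local image path, and the prompt are illustrative assumptions rather than details taken from this list, and the exact API may shift between transformers versions.

```python
# Minimal sketch: image + text question to Qwen2.5-VL via transformers.
# Assumptions: transformers with Qwen2.5-VL support installed, and a
# local image at "example.jpg" (placeholder path).
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # assumed checkpoint name
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn containing an image and a question.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Render the chat template, then tokenize the text together with the image.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image = Image.open("example.jpg")
inputs = processor(
    text=[text], images=[image], padding=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```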