Official implementation of DreamCraft3D
The official PyTorch implementation of Google's Gemma models
Reference PyTorch implementation and models for DINOv3
Qwen-Image is a powerful image generation foundation model
A state-of-the-art open visual language model
Python bindings for llama.cpp
Convert the Google Gemini web interface into an OpenAI-compatible API
Collection of Gemma 3 variants that are trained for performance
Multimodal Diffusion with Representation Alignment
Qwen3-TTS is an open-source series of TTS models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
LTX-Video support for ComfyUI
Accurate × Fast × Comprehensive
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
ICLR 2024 Spotlight: curation/training code, metadata, distribution
RGBD video generation model conditioned on camera input
Recovering the Visual Space from Any Views
Advancing Open-source World Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Controllable & emotion-expressive zero-shot TTS
Pokee Deep Research Model Open Source Repo
Unified Multimodal Understanding and Generation Models
An AI-powered security review GitHub Action using Claude