Official repository for LTX-Video
Python bindings for llama.cpp
LTX-Video Support for ComfyUI
Video Object and Interaction Deletion
A theoretical reconstruction of the Claude Mythos architecture
Recovering the Visual Space from Any Views
Contexts Optical Compression
Bidirectional token-classification model for personally identifiable information
Visual Causal Flow
Open-source multi-speaker long-form text-to-speech model
The official repo of the Qwen chat & pretrained large language models
Audio foundation model excelling in audio understanding
Sharp Monocular Metric Depth in Less Than a Second
Open Source Speech Language Model
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Video understanding codebase from FAIR for reproducing video models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large Multimodal Models for Video Understanding and Editing
Ultra-Efficient LLMs on End Devices
Official implementation of DreamCraft3D
Diffusion Transformer with Fine-Grained Chinese Understanding
Multi-modal large language model designed for audio understanding
OCR expert VLM powered by Hunyuan's native multimodal architecture
Large language model & vision-language model based on Linear Attention
AI-powered tool to quickly remove watermarks from images flawlessly