Official repository for LTX-Video
Python bindings for llama.cpp
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
LTX-Video Support for ComfyUI
From Vibe Coding to Agentic Engineering
Video Object and Interaction Deletion
A theoretical reconstruction of the Claude Mythos architecture
Contexts Optical Compression
Recovering the Visual Space from Any Views
Bidirectional token-classification model for identifiable information
Visual Causal Flow
Open-source multi-speaker long-form text-to-speech model
A multimodal model for brain response prediction
The official repo of the Qwen chat & pretrained large language models
Audio foundation model excelling in audio understanding
Sharp Monocular Metric Depth in Less Than a Second
Open Source Speech Language Model
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Video understanding codebase from FAIR for reproducing video models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large Multimodal Models for Video Understanding and Editing
Ultra-Efficient LLMs on End Device
Multimodal model achieving SOTA performance
Official implementation of DreamCraft3D
Diffusion Transformer with Fine-Grained Chinese Understanding