Python bindings for llama.cpp
Official repository for LTX-Video
Inference framework for 1-bit LLMs
Contexts Optical Compression
LTX-Video Support for ComfyUI
The official repo of Qwen chat & pretrained large language model
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Open-source multi-speaker long-form text-to-speech model
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Video understanding codebase from FAIR for reproducing video models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Official implementation of DreamCraft3D
Diffusion Transformer with Fine-Grained Chinese Understanding
Large Multimodal Models for Video Understanding and Editing
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Large-language-model & vision-language-model based on Linear Attention
Dataset of GPT-2 outputs for research in detection, biases, and more
Suite with Real-ESRGAN, BSRGAN, RealESRNet, IRCNN, GFPGAN & RIFE.
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
A Conversational Speech Generation Model
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)