Generate blog articles from video or audio
When LLM Meets Domain Experts
Open-sourced unified customization model
SOTA discrete acoustic codec models with 40/75 tokens per second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Uncommon Objects in 3D dataset
GPT-4V-level open-source multimodal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Evals is a framework for evaluating LLMs and LLM systems
The ChatGPT Retrieval Plugin lets you easily find personal documents
Revolutionizes the way users interact with AutoGen
A global resource download orchestration system
The Petastorm library enables single-machine or distributed training
This repo contains the code for a 1D tokenizer and generator
A Universal Customization Method for Single and Multi Conditioning
A Unified Framework for Image Customization
Plug-and-play library to enable agents to call MCP and UTCP tools
MetricFlow allows you to define, build, and maintain metrics in code
LLM-powered fuzzing via OSS-Fuzz
Blender pipeline for photorealistic training image generation
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Automatically translates the text of a video based on a subtitle file
Qwen3-Omni is a natively end-to-end, omni-modal LLM