C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Repo for SeedVR2 & SeedVR
Multimodal-Driven Architecture for Customized Video Generation
Tiny vision language model
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Qwen3-omni is a natively end-to-end, omni-modal LLM
Large language model & vision-language model based on Linear Attention
Generate Any 3D Scene in Seconds
High-resolution models for human tasks
MiniMax-M2, a model built for Max coding & agentic workflows
Uncommon Objects in 3D dataset
Towards Real-World Vision-Language Understanding
Code for Mesh R-CNN, ICCV 2019
Memory-efficient and performant finetuning of Mistral's models
A SOTA open-source image editing model
Python example app from the OpenAI API quickstart tutorial
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
The ChatGPT Retrieval Plugin lets you easily find personal documents
High-Fidelity and Controllable Generation of Textured 3D Assets
LLM-based audio editing model trained with reinforcement learning
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-style Training Pipeline
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Instructions on how to use the Realtime API on Microcontrollers
Inference script for Oasis 500M