GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
Repo for SeedVR2 & SeedVR
Large Multimodal Models for Video Understanding and Editing
Code for Mesh R-CNN, ICCV 2019
OCR expert VLM powered by Hunyuan's native multimodal architecture
Official implementation of DreamCraft3D
Collection of Gemma 3 variants that are trained for performance
The official PyTorch implementation of Google's Gemma models
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark
A SOTA open-source image editing model
Implementation of the Surya Foundation Model for Heliophysics
Inference script for Oasis 500M
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Global weather forecasting model using graph neural networks and JAX
LLM-based audio editing model trained with reinforcement learning
Qwen2.5-Coder is the code version of Qwen2.5, a large language model series
Open Multilingual Multimodal Chat LMs
Open-source, high-performance Mixture-of-Experts large language model
Official code for Style Aligned Image Generation via Shared Attention
Powerful open source image generation model
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Fine-tuning ChatGLM-6B with PEFT
Official PyTorch Implementation of "Scalable Diffusion Models"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Code release for ConvNeXt V2 model