Recovering the Visual Space from Any Views
OCR expert VLM powered by Hunyuan's native multimodal architecture
Repo for SeedVR2 & SeedVR
The official PyTorch implementation of Google's Gemma models
Long-form streaming TTS system for multi-speaker dialogue generation
Implementation of the Surya Foundation Model for Heliophysics
A SOTA open-source image editing model
Pretrained time-series foundation model developed by Google Research
Inference script for Oasis 500M
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Release for Improved Denoising Diffusion Probabilistic Models
Code for Mesh R-CNN, ICCV 2019
LLM-based reinforcement learning audio editing model
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Open-source, high-performance Mixture-of-Experts large language model
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Powerful open-source image generation model
Open Multilingual Multimodal Chat LMs
Official code for Style Aligned Image Generation via Shared Attention
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Chinese LLaMA & Alpaca large language model + local CPU/GPU training
Fine-tuning ChatGLM-6B with PEFT
Official PyTorch Implementation of "Scalable Diffusion Models"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)