LLM-based Reinforcement Learning audio editing model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
DeepMind model for tracking arbitrary points across videos & robotics
Uncommon Objects in 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
Implementation of the Surya Foundation Model for Heliophysics
A SOTA open-source image editing model
High-Fidelity and Controllable Generation of Textured 3D Assets
Multi-modal large language model designed for audio understanding
A state-of-the-art open visual language model
ChatGPT interface with better UI
Stable Diffusion with Core ML on Apple Silicon
High-Resolution Image Synthesis with Latent Diffusion Models
Towards Real-World Vision-Language Understanding
The ChatGPT Retrieval Plugin lets you easily find personal documents
AI-powered tool to quickly remove watermarks from images flawlessly
Chat & pretrained large vision language model
Pushing the Limits of Mathematical Reasoning in Open Language Models
Qwen2.5-Coder is the code version of Qwen2.5, the large language model