PyTorch code and models for the DINOv2 self-supervised learning method
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Towards Real-World Vision-Language Understanding
Towards self-verifiable mathematical reasoning
Sharp Monocular Metric Depth in Less Than a Second
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
A Unified Framework for Text-to-3D and Image-to-3D Generation
Audio foundation model excelling in audio understanding
Collection of Gemma 3 variants that are trained for performance
A trainable PyTorch reproduction of AlphaFold 3
Fast, Sharp & Reliable Agentic Intelligence
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Tool for exploring and debugging transformer model behaviors
Provides convenient access to the Anthropic REST API from any Python 3 application
An AI-powered security review GitHub Action using Claude
Research code artifacts for Code World Model (CWM)
A state-of-the-art open visual language model
GPT4V-level open-source multi-modal model based on Llama3-8B
Recovering the Visual Space from Any Views
Official implementation of DreamCraft3D
A PyTorch library for implementing flow matching algorithms
GLIDE: a diffusion-based text-conditional image synthesis model
Example Discord bot written in Python that uses the completions API