Tool for exploring and debugging transformer model behaviors
GLM-4 series: Open Multilingual Multimodal Chat LMs
Large Multimodal Models for Video Understanding and Editing
Generate Any 3D Scene in Seconds
Provides convenient access to the Anthropic REST API from any Python 3 application
Capable of understanding text, audio, vision, and video
Programmatic access to the AlphaGenome model
This repository contains the official implementation of FastVLM
Chinese and English multimodal conversational language model
Research code artifacts for Code World Model (CWM)
Qwen3-omni is a natively end-to-end, omni-modal LLM
Phi-3.5 for Mac: Locally-run Vision and Language Models
OCR expert VLM powered by Hunyuan's native multimodal architecture
A Unified Framework for Text-to-3D and Image-to-3D Generation
Diversity-driven optimization and large-model reasoning ability
Foundation Models for Time Series
A Production-ready Reinforcement Learning AI Agent Library
tiktoken is a fast BPE tokeniser for use with OpenAI's models
The Clay Foundation Model - An open source AI model and interface
GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Sharp Monocular Metric Depth in Less Than a Second
A state-of-the-art open visual language model
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Chat & pretrained large vision language model