GLM-4 series: Open Multilingual Multimodal Chat LMs
A ChatGPT interface with a better UI
Pushing the Limits of Mathematical Reasoning in Open Language Models
Unified Multimodal Understanding and Generation Models
Capable of understanding text, audio, images, and video
Tool for exploring and debugging transformer model behaviors
GPT4V-level open-source multi-modal model based on Llama3-8B
A Unified Framework for Text-to-3D and Image-to-3D Generation
Provides convenient access to the Anthropic REST API from any Python 3 application
Large Multimodal Models for Video Understanding and Editing
Generate Any 3D Scene in Seconds
This repository contains the official implementation of FastVLM
Research code artifacts for Code World Model (CWM)
Programmatic access to the AlphaGenome model
Chinese and English multimodal conversational language model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Foundation Models for Time Series
A Production-ready Reinforcement Learning AI Agent Library
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Sharp Monocular Metric Depth in Less Than a Second
Chat & pretrained large vision-language model
Diversity-driven optimization and large-model reasoning ability
Phi-3.5 for Mac: Locally-run Vision and Language Models
A state-of-the-art open visual language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning