Qwen-Image-Layered: Layered Decomposition for Inherent Editability
The Clay Foundation Model - An open source AI model and interface
Controllable & emotion-expressive zero-shot TTS
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Inference framework for 1-bit LLMs
Generate Any 3D Scene in Seconds
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Release for Improved Denoising Diffusion Probabilistic Models
Hackable and optimized Transformers building blocks
PyTorch code and models for the DINOv2 self-supervised learning
Official implementation of DreamCraft3D
Provides convenient access to the Anthropic REST API from any Python 3
Tiny vision language model
OCR expert VLM powered by Hunyuan's native multimodal architecture
This repository contains the official implementation of FastVLM
Pushing the Limits of Mathematical Reasoning in Open Language Models
Research code artifacts for Code World Model (CWM)
Diversity-driven optimization and large-model reasoning ability
Tool for exploring and debugging transformer model behaviors
Foundation Models for Time Series
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Capable of understanding text, audio, vision, video
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal