OCR expert VLM powered by Hunyuan's native multimodal architecture
A Unified Framework for Text-to-3D and Image-to-3D Generation
Official inference repo for FLUX.1 models
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Open-source multi-speaker long-form text-to-speech model
Provides convenient access to the Anthropic REST API from any Python 3 application
Capable of understanding text, audio, vision, and video
Generate Any 3D Scene in Seconds
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
GLM-4 series: Open Multilingual Multimodal Chat LMs
AlphaFold 3 inference pipeline
Release for Improved Denoising Diffusion Probabilistic Models
Tool for exploring and debugging transformer model behaviors
This repository contains the official implementation of FastVLM
Foundation Models for Time Series
A Production-ready Reinforcement Learning AI Agent Library
Pushing the Limits of Mathematical Reasoning in Open Language Models
Research code artifacts for Code World Model (CWM)
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
An AI-powered security review GitHub Action using Claude
GPT4V-level open-source multi-modal model based on Llama3-8B
Chinese and English multimodal conversational language model
Chat & pretrained large vision language model