This repository contains the official implementation of FastVLM
OCR expert VLM powered by Hunyuan's native multimodal architecture
CLIP: predicts the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
High-Fidelity and Controllable Generation of Textured 3D Assets
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Real-time behaviour synthesis with MuJoCo, using Predictive Control
A PyTorch library for implementing flow matching algorithms
Global weather forecasting model using graph neural networks and JAX
Official implementation of DreamCraft3D
Implementation of "MobileCLIP" (CVPR 2024)
Designed for text embedding and ranking tasks
tiktoken is a fast BPE tokeniser for use with OpenAI's models
The official PyTorch implementation of Google's Gemma models
GLM-4 series: Open Multilingual Multimodal Chat LMs
Repository for the Qwen2-Audio chat and pretrained large audio language models
Language modeling in a sentence representation space
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
The ChatGPT Retrieval Plugin lets you easily find personal documents
Inference script for Oasis 500M
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Large language model and vision-language model based on linear attention
Open-source framework for intelligent speech interaction