Qwen3-omni is a natively end-to-end, omni-modal LLM
FAIR Sequence Modeling Toolkit 2
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Phi-3.5 for Mac: Locally-run Vision and Language Models
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Inference framework for 1-bit LLMs
Fast and Universal 3D reconstruction model for versatile tasks
Stable Diffusion WebUI Forge is a platform built on top of Stable Diffusion WebUI
GPT4V-level open-source multi-modal model based on Llama3-8B
MedicalGPT: Training Your Own Medical GPT Model with the ChatGPT Training Pipeline
Multimodal Diffusion with Representation Alignment
An AI-powered security review GitHub Action using Claude
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Tooling for the Common Objects In 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
PyTorch code and models for the DINOv2 self-supervised learning method
Dataset of GPT-2 outputs for research in detection, biases, and more
Chat & pretrained large vision language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
A series of math-specific large language models of our Qwen2 series
The Clay Foundation Model - An open source AI model and interface for Earth
Personalize Any Characters with a Scalable Diffusion Transformer
Inference code for scalable emulation of protein equilibrium ensembles
GLM-4-Voice | End-to-End Chinese-English Voice Conversation Model
A Unified Framework for Text-to-3D and Image-to-3D Generation