Foundation Models for Time Series
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Industrial-level controllable zero-shot text-to-speech system
Towards Real-World Vision-Language Understanding
A Powerful Native Multimodal Model for Image Generation
Diversity-driven optimization and large-model reasoning ability
Large Multimodal Models for Video Understanding and Editing
CLIP: Predict the most relevant text snippet given an image
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Renderer for the harmony response format to be used with gpt-oss
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Open-source large language model family from Tencent Hunyuan
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
This repository contains the official implementation of FastVLM
PyTorch code and models for the DINOv2 self-supervised learning
Official implementation of DreamCraft3D
Phi-3.5 for Mac: Locally-run Vision and Language Models
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
An AI-powered security review GitHub Action using Claude
Dataset of GPT-2 outputs for research in detection, biases, and more
Implementation of the Surya Foundation Model for Heliophysics
Implementation of "MobileCLIP" (CVPR 2024)