Implementation of a U-net complete with efficient attention
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Open-Sora: Democratizing Efficient Video Production for All
"Big Model" trains a visual multimodal VLM with 26M parameters
Implementation of 'lightweight' GAN, proposed in ICLR 2021
Personalize Any Characters with a Scalable Diffusion Transformer
An open source implementation of CLIP
Handles all kinds of unstructured data, for tasks such as reverse image search
Unified Multimodal Understanding and Generation Models
Official implementation of DreamCraft3D
Turn your website into a GIF
Sharp Monocular Metric Depth in Less Than a Second
Open source personal AI Assistant for Linux, Windows and Mac
A lightweight vision library for performing large object detection
AutoGluon: AutoML for Image, Text, and Tabular Data
Implementation of "MobileCLIP" CVPR 2024
Implementation of Phenaki Video, which uses MaskGIT
A state-of-the-art open visual language model
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
Multilingual sentence & image embeddings with BERT
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Phi-3.5 for Mac: Locally-run Vision and Language Models
Make drawing and labeling bounding boxes easy as cake
We write your reusable computer vision tools
Virtual AI anchor that combines state-of-the-art technology