Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Revolutionizing Database Interactions with Private LLM Technology
DeepMind model for tracking arbitrary points across videos, including for robotics
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
GPT-4V-level open-source multimodal model based on Llama3-8B
The ChatGPT Retrieval Plugin lets you easily find personal documents
Repo for SeedVR2 & SeedVR
The official PyTorch implementation of Google's Gemma models
LLM-based reinforcement learning audio editing model
Diversity-driven optimization and large-model reasoning ability
Chat & pretrained large vision language model
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP" (CVPR 2024)
High-resolution models for human tasks
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Large language model & vision-language model based on linear attention
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Inference script for Oasis 500M