"VideoRAG: Chat with Your Videos
Photorealistic Synthetic Dataset for Holistic Indoor Scene
Multimodal embedding and reranking models built on Qwen3-VL
High-resolution models for human tasks
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Free Tailwind CSS UI component library for modern web interfaces
Code and models for ICML 2024 paper, NExT-GPT
Collection of CVPR 2025 papers and open source projects
A blazing fast AI Gateway with integrated guardrails
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen2.5-VL is the multimodal large language model series
GPT4V-level open-source multi-modal model based on Llama3-8B
A Pioneering Open-Source Alternative to GPT-4o
Multi-modal large language model designed for audio understanding
Virtual AI anchor that combines state-of-the-art technology
Embed images and sentences into fixed-length vectors
Transformers4Rec is a flexible and efficient library
LangChain Apps in Production with Jina & FastAPI
Task-oriented finetuning for better embeddings on neural search
Windows-GUI
Implementation of research papers on Deep Learning + NLP + CV in Python