Qwen2.5-VL is the multimodal large language model series
Lightweight framework for evaluating large language model performance
Meta Agents Research Environments is a comprehensive platform
Code for the paper Language Models are Unsupervised Multitask Learners
This repository provides an advanced RAG
Qwen3-Omni is a natively end-to-end, omni-modal LLM
The largest collection of PyTorch image encoders / backbones
Open Source Document Management System for Digital Archives
21 Lessons, Get Started Building with Generative AI
Chat & pretrained large audio language model proposed by Alibaba Cloud
Benchmarking synthetic data generation methods
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Official code for Style Aligned Image Generation via Shared Attention
A Model Context Protocol server for searching and analyzing arXiv
4M: Massively Multimodal Masked Modeling
Guiding Instruction-based Image Editing via Multimodal Large Language Models
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Utilities intended for use with Llama models
Agent toolkit providing semantic retrieval and editing capabilities
FAIR Sequence Modeling Toolkit 2
A Production-ready Reinforcement Learning AI Agent Library
PyTorch code and models for V-JEPA self-supervised learning from video
A PyTorch library for implementing flow matching algorithms