Benchmarking synthetic data generation methods
Deploy and share agents with open infrastructure
4M: Massively Multimodal Masked Modeling
Agent toolkit providing semantic retrieval and editing capabilities
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
PyTorch code and models for the DINOv2 self-supervised learning method
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
No-code multi-agent framework to build LLM Agents, workflows
Chat & pretrained large audio language model proposed by Alibaba Cloud
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Educational framework exploring multi-agent orchestration
A lightweight vision library for performing large object detection
Build cross-modal and multimodal applications on the cloud
A set of Docker images for training and serving models in TensorFlow
Powering Amazon custom machine learning chips
LLMFlows - Simple, Explicit and Transparent LLM Apps
FAIR's research platform for object detection research
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
RL implementations
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Code for the paper Hybrid Spectrogram and Waveform Source Separation
A Customizable Image-to-Video Model based on HunyuanVideo
PyTorch Lightning + Hydra. A very user-friendly template