Repo of Qwen2-Audio chat & pretrained large audio language model
High quality, fast, modular reference implementation of SSD in PyTorch
Lightning-fast C++/CUDA neural network framework
Hub of ready-to-use datasets for ML models
Database system for building simpler and faster AI-powered applications
Fast and accurate AI-powered file content type detection
A simple, secure MCP-to-OpenAPI proxy server
The most powerful Android RPA agent framework
Implementation of "MobileCLIP" CVPR 2024
A fast, powerful, and simple hierarchical vision transformer
Code release for Cut and Learn for Unsupervised Object Detection
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible
Research code artifacts for Code World Model (CWM)
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
Talk to Your AI Agents from Anywhere
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster
Open Source Generative Process Automation