Fast and accurate AI-powered file content type detection
A simple, secure MCP-to-OpenAPI proxy server
The most powerful Android RPA agent framework
Implementation of "MobileCLIP" (CVPR 2024)
A fast, powerful, and simple hierarchical vision transformer
Code release for Cut and Learn for Unsupervised Object Detection
CoreNet: A library for training deep neural networks
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible
Research code artifacts for Code World Model (CWM)
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
Talk to Your AI Agents from Anywhere
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster
Open Source Generative Process Automation
AI agent that streamlines the entire process of data analysis
SWE-agent takes a GitHub issue and tries to automatically fix it
Harness LLMs with Multi-Agent Programming
A system for quickly generating training data with weak supervision