A fast, powerful, and simple hierarchical vision transformer
Code release for Cut and Learn for Unsupervised Object Detection
High-resolution models for human tasks
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
The NVIDIA AgentIQ toolkit is an open-source library
Extensible AGI Framework
Open Source Generative Process Automation
AI agent that streamlines the entire process of data analysis
SWE-agent takes a GitHub issue and tries to automatically fix it
Finding the Scaling Law of Agents. A multi-agent framework
PraisonAI application combines AutoGen and CrewAI or similar frameworks
Multilingual Automatic Speech Recognition with word-level timestamps
Superfast AI decision making and processing of multi-modal data
A system for quickly generating training data with weak supervision
Uniform Manifold Approximation and Projection
PyTorch version of Stable Baselines
Beta Machine Learning Toolkit
OpenDAN is an open source Personal AI OS
Gorilla: An API store for LLMs