PaddlePaddle End-to-End Development Toolkit
An open phone agent model & framework
Open-source platform for building enterprise-grade agents
Gracefully face hCaptcha challenges with multimodal LLMs
Benchmarking Multimodal Agents for Open-Ended Tasks
Phi-3.5 for Mac: Locally-run Vision and Language Models
No-code LLM Platform to launch APIs and ETL Pipelines
Browse the web, directly from Cursor etc.
Gemma open-weight LLM library, from Google DeepMind
Chinese and English multimodal conversational language model
The library to build & auto-optimize LLM applications
Python package for AutoML on Tabular Data with Feature Engineering
Zero-code platform for building AI agents from natural language input
Inference script for Oasis 500M
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
InvokeAI is a leading creative engine for Stable Diffusion models
Automate native Android apps with AI using accessibility APIs
PyTorch3D is FAIR's library of reusable components for deep learning
A frontier, first-principles handbook
Foundation model for image generation
Large language model & vision-language model based on linear attention
Data manipulation and transformation for audio signal processing
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
From Paper to Presentation in One Click