Foundation model for image generation
Contexts Optical Compression
InvokeAI is a leading creative engine for Stable Diffusion models
Taming Stable Diffusion for Lip Sync
Chinese and English multimodal conversational language model
The library to build & auto-optimize LLM applications
PyTorch3D is FAIR's library of reusable components for deep learning
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Browse the web directly from Cursor and similar tools
Automate native Android apps with AI using accessibility APIs
Claude Code for everything except coding
[CVPR 2025 Best Paper Award] VGGT
Elyra extends JupyterLab with an AI-centric approach
Towards Real-World Vision-Language Understanding
Benchmarking Multimodal Agents for Open-Ended Tasks
Data manipulation and transformation for audio signal processing
GitLab automatic code review tool based on large models
Zero-code platform for building AI agents from natural language input
General-purpose image editing model that delivers high-fidelity
No-code LLM Platform to launch APIs and ETL Pipelines
Inference script for Oasis 500M
Open-source platform for building enterprise-grade agents
Gracefully face hCaptcha challenges with multimodal LLMs
From Paper to Presentation in One Click