Browse the web, directly from Cursor etc.
A Pioneering Open-Source Alternative to GPT-4o
Phi-3.5 for Mac: Locally-run Vision and Language Models
Label Studio is a multi-type data labeling and annotation tool
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
A frontier, first-principles handbook
Modular quant framework
Taming Stable Diffusion for Lip Sync
Chinese and English multimodal conversational language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Qwen3-Omni is a natively end-to-end, omni-modal LLM
GitLab automatic code review tool based on large models
Agent Skill for generating 2D sprite sheets and maps, transparent PNG
General-purpose image editing model that delivers high-fidelity
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Motion-controllable Video Generation via Latent Trajectory Guidance
Python package for AutoML on Tabular Data with Feature Engineering
An open phone agent model & framework
Open-source platform for building enterprise-grade agents
InvokeAI is a leading creative engine for Stable Diffusion models
Claude Code for everything except coding
ComfyUI wrapper nodes for HunyuanVideo
From Paper to Presentation in One Click
Zero-code platform for building AI agents from natural language input