The most powerful Android RPA agent framework
Create beautiful slides on the web using Claude's frontend skills
Detects phishing and lookalike domains using DNS fuzzing techniques
Foundation model for image generation
Benchmarking Multimodal Agents for Open-Ended Tasks
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit for large multi-modality models (LMMs)
All-in-one AI productivity platform with agents, workflows, and IM
A Pioneering Open-Source Alternative to GPT-4o
Official repo for "Sa2VA: Marrying SAM2 with LLaVA
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
GPT Image 2 prompt gallery, image prompt library, agentic skill
A frontier, first-principles handbook
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Smart video converter using YOLOv8 and FFmpeg
Phi-3.5 for Mac: Locally-run Vision and Language Models
GitLab automatic code review tool based on large models
A Python library for extracting structured information
Handwritten Text Recognition (HTR) system implemented with TensorFlow
General-purpose image editing model that delivers high-fidelity
RAG-Anything: All-in-One RAG Framework
Motion-controllable Video Generation via Latent Trajectory Guidance
Open-source platform for building enterprise-grade agents
An AI-powered data science team of agents