Showing 9 open source projects for "image text input"

View related business solutions
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 1
    OpenHuman

    OpenHuman

    Your Personal AI super intelligence. Private, simple and powerful

    ...The project connects to common productivity tools, gathers fresh information from integrations, and organizes user knowledge into a local memory system. It also includes practical agent tools such as web search, web fetching, file access, coding utilities, voice input, text-to-speech, and model routing. Its goal is to make an AI assistant feel continuously useful across meetings, messages, documents, tasks, and personal workflows. Since it is still in early beta, it is best suited for technical users and early adopters who want to experiment with a customizable personal AI environment.
    Downloads: 255 This Week
    Last Update:
    See Project
  • 2
    MetaGPT

    MetaGPT

    The Multi-Agent Framework

    The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo. Assign different roles to GPTs to form a collaborative software entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories / competitive analysis/requirements/data structures / APIs / documents, etc. Internally, MetaGPT includes product managers/architects/project managers/engineers. It provides the entire process of a software company along with carefully orchestrated SOPs.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    OpenAI Codex CLI

    OpenAI Codex CLI

    Lightweight coding agent that runs in your terminal

    OpenAI Codex CLI is a lightweight, open-source coding assistant that runs directly in your terminal, designed to bring ChatGPT-level reasoning to your code workflows. It allows developers to interactively query, edit, and generate code within their repositories, all while maintaining version control. The CLI can scaffold new files, run code in sandboxed environments, install dependencies, and commit changes automatically, streamlining chat-driven development. It supports various approval...
    Downloads: 95 This Week
    Last Update:
    See Project
  • 4
    TEN

    TEN

    Open-source framework for conversational voice AI agents

    TEN (Transformative Extensions Network) is an open source framework designed to empower developers to build real-time multimodal AI agents capable of voice, video, text, image, and data-stream interaction with ultra-low latency. It includes a full ecosystem, TEN Turn Detection, TEN Agent, and TMAN Designer, allowing developers to rapidly assemble human-like, responsive agents that can see, speak, hear, and interact. With support for languages like Python, C++, and Go, it offers flexible deployment on both edge and cloud environments. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 5
    Open-AutoGLM

    Open-AutoGLM

    An open phone agent model & framework

    Open-AutoGLM is an open-source framework and model designed to empower autonomous mobile intelligent assistants by enabling AI agents to understand and interact with phone screens in a multimodal manner, blending vision and language capability to control real devices. It aims to create an “AI phone agent” that can perceive on-screen content, reason about user goals, and execute sequences of taps, swipes, and text input via automated device control interfaces like ADB, enabling hands-off completion of multi-step tasks such as navigating apps, filling forms, and more. Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 6
    MAI-UI

    MAI-UI

    Real-World Centric Foundation GUI Agents

    ...Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. Unlike traditional UI frameworks, MAI-UI emphasizes realistic deployment by supporting agent–user interaction (clarifying ambiguous instructions), integration with external tool APIs using MCP calls, and a device–cloud collaboration mechanism that dynamically routes computation to on-device or cloud models based on task state and privacy constraints.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    NodeTool

    NodeTool

    Visual AI Workflow Builder

    NodeTool is an open‑source, visual AI workflow builder that lets you connect nodes for text, images, audio, video, data, and automation—then run them locally or on the cloud. Build multi‑step agents, RAG systems, and creative media pipelines without coding, inspect execution in real time, and deploy anywhere: home server, private VPC, RunPod, or Cloud Run. With a local‑first design, NodeTool keeps models and data under your control while still supporting providers like OpenAI, Anthropic,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Intelligent Keyword Miner

    Intelligent Keyword Miner

    Intelligent SEO keyword miner and predicing tool

    THIS IS A NETBEANS 8.02 PROJECT ENGLISH ONLY This program was made to help me with the patent research. It simply generates the search keywords, based on your upvotes or a downvotes of the input parameters. It can accept a text or URL (text takes a prescedence over the URL). If you input URL, it goes to a page, and learns its text from HTML format. This program is intelligent as it predicts what you may want to search next, based on your personal trends. After searching the suggestions, you can choose to reset or train it further. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DGiovanni
    A multi-agent architecture for building interactive dramas. It uses the Jason's BDI engine, being the Jason's agent-oriented programming language utilized for performing the drama management and for authoring behaviors for the characters.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB