Search Results for "image text input"

Showing 15 open source projects for "image text input"

1

ComfyUI-HunyuanVideoWrapper

ComfyUI wrapper nodes for HunyuanVideo

The ComfyUI-HunyuanVideoWrapper project is a ComfyUI extension that integrates Hunyuan-based multimodal video generation models into node-based workflows. It allows users to generate or manipulate video content by combining text prompts with one or more input images, enabling flexible conditioning of outputs. The system introduces specialized nodes such as text-image encoders that allow multiple image inputs to be referenced directly within prompts. This makes it possible to guide generation using both visual and textual context simultaneously. The wrapper is designed to fit seamlessly into ComfyUI pipelines, enabling chaining with other nodes for advanced workflows. ...

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
2

OpenHuman

Your Personal AI super intelligence. Private, simple and powerful

...The project connects to common productivity tools, gathers fresh information from integrations, and organizes user knowledge into a local memory system. It also includes practical agent tools such as web search, web fetching, file access, coding utilities, voice input, text-to-speech, and model routing. Its goal is to make an AI assistant feel continuously useful across meetings, messages, documents, tasks, and personal workflows. Since it is still in early beta, it is best suited for technical users and early adopters who want to experiment with a customizable personal AI environment.

Downloads: 244 This Week

Last Update: 3 days ago
See Project
3

Magine

A cloud of orchestrated, vision-enabled AI agents

Magine is an experimental AI-powered image generation and manipulation tool designed to provide users with a streamlined interface for creating and modifying visual content using modern generative models. It focuses on simplifying the workflow of prompt-based image generation while integrating additional controls that allow users to refine outputs with more precision than typical text-to-image tools.

Downloads: 0 This Week

Last Update: 5 days ago
See Project
4

MetaGPT

The Multi-Agent Framework

The Multi-Agent Framework: given a one-line requirement, return a PRD, design, tasks, and repo. Assign different roles to GPTs to form a collaborative software entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc. Internally, MetaGPT includes product managers, architects, project managers, and engineers. It provides the entire process of a software company along with carefully orchestrated SOPs.

Downloads: 3 This Week

Last Update: 2025-03-02
See Project
5

ComfyUI-WanVideoWrapper

ComfyUI wrapper nodes for WanVideo and related models

...It acts as a standalone wrapper layer that allows developers and creators to integrate experimental features and models without modifying the core ComfyUI codebase. This design makes it easier to rapidly test new capabilities such as text-to-video and image-to-video generation while avoiding compatibility issues with the main framework. The project supports complex node-based pipelines where users can control sampling, conditioning, and frame continuity across generated sequences. It also enables extended video generation by linking outputs between iterations, allowing for longer and more coherent animations. ...

Downloads: 0 This Week

Last Update: 2026-05-05
See Project
6

OpenAI Codex CLI

Lightweight coding agent that runs in your terminal

OpenAI Codex CLI is a lightweight, open-source coding assistant that runs directly in your terminal, designed to bring ChatGPT-level reasoning to your code workflows. It allows developers to interactively query, edit, and generate code within their repositories, all while maintaining version control. The CLI can scaffold new files, run code in sandboxed environments, install dependencies, and commit changes automatically, streamlining chat-driven development. It supports various approval...

2 Reviews

Downloads: 90 This Week

Last Update: 22 hours ago
See Project
7

TEN

Open-source framework for conversational voice AI agents

TEN (Transformative Extensions Network) is an open source framework designed to empower developers to build real-time multimodal AI agents capable of voice, video, text, image, and data-stream interaction with ultra-low latency. It includes a full ecosystem, TEN Turn Detection, TEN Agent, and TMAN Designer, allowing developers to rapidly assemble human-like, responsive agents that can see, speak, hear, and interact. With support for languages like Python, C++, and Go, it offers flexible deployment on both edge and cloud environments. ...

Downloads: 0 This Week

Last Update: 2026-05-10
See Project
8

CogAgent

An open-source end-to-end VLM-based GUI Agent

CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather than freeform chat, maintaining a continuous execution history across steps while requiring a fresh session for each new task. Inference supports BF16 on NVIDIA GPUs, with optional INT8 and INT4 modes available but with noted performance loss at INT4; example CLIs and a web demo illustrate bounding-box outputs and operation categories.

Downloads: 0 This Week

Last Update: 5 days ago
See Project
9

Open-AutoGLM

An open phone agent model & framework

Open-AutoGLM is an open-source framework and model designed to empower autonomous mobile intelligent assistants by enabling AI agents to understand and interact with phone screens in a multimodal manner, blending vision and language capability to control real devices. It aims to create an “AI phone agent” that can perceive on-screen content, reason about user goals, and execute sequences of taps, swipes, and text input via automated device control interfaces like ADB, enabling hands-off completion of multi-step tasks such as navigating apps, filling forms, and more. Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.

Downloads: 11 This Week

Last Update: 2026-03-06
See Project
10

MAI-UI

Real-World Centric Foundation GUI Agents

...Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. Unlike traditional UI frameworks, MAI-UI emphasizes realistic deployment by supporting agent–user interaction (clarifying ambiguous instructions), integration with external tool APIs using MCP calls, and a device–cloud collaboration mechanism that dynamically routes computation to on-device or cloud models based on task state and privacy constraints.

Downloads: 0 This Week

Last Update: 2026-04-20
See Project
11

Free AI Watermark Remover - FreeRepair

AI-powered tool to quickly remove watermarks from images flawlessly

FreeRepair is a free, open-source, AI-powered watermark remover that can also make blurry images clearer or larger. It is an IOPaint-style tool built on Python's Django framework and requires no sign-up. FreeRepair makes it easy to remove watermarks, logos, text, or clutter from images. It needs no installation and no internet connection, works out of the box, is safe and secure, and has no usage limits.

1 Review

Downloads: 38 This Week

Last Update: 2026-03-30
See Project
12

NodeTool

Visual AI Workflow Builder

NodeTool is an open‑source, visual AI workflow builder that lets you connect nodes for text, images, audio, video, data, and automation—then run them locally or on the cloud. Build multi‑step agents, RAG systems, and creative media pipelines without coding, inspect execution in real time, and deploy anywhere: home server, private VPC, RunPod, or Cloud Run. With a local‑first design, NodeTool keeps models and data under your control while still supporting providers like OpenAI, Anthropic,...

Downloads: 1 This Week

Last Update: 2026-01-20
See Project
13

Ai-Assistant

Open-source novel writing & AI coding assistant aggregating top models

This is an open‑source, powerful novel‑writing and AI programming assistant with the following core strengths: Model Aggregation: Natively supports the latest DeepSeek and seamlessly integrates with top‑tier models such as Gemini, Claude, GPT, Tongyi Qianwen, Kimi, and others—both domestic and international—delivering a one‑stop intelligent experience. Multimodal Capability: Accurately interprets images and PDF content, and supports invoking advanced models for high‑quality text‑to‑image generation. Security & Management: Conversation records are encrypted and stored locally (SQLite), with convenient history search, bookmarking, and categorization. Precise Context Control: Individual dialogue entries can be freely edited or deleted, allowing precise mastery of context to elicit optimal AI performance. ...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
14

Intelligent Keyword Miner

Intelligent SEO keyword miner and prediction tool

This is a NetBeans 8.02 project (English only). This program was made to help me with patent research. It generates search keywords based on your upvotes or downvotes of the input parameters. It can accept text or a URL (text takes precedence over the URL). If you input a URL, it goes to the page and learns its text from the HTML. The program is intelligent in that it predicts what you may want to search next, based on your personal trends. After searching the suggestions, you can choose to reset it or train it further. ...

Downloads: 0 This Week

Last Update: 2015-03-09
See Project
15

DGiovanni

A multi-agent architecture for building interactive dramas. It uses Jason's BDI engine, with Jason's agent-oriented programming language used both for drama management and for authoring character behaviors.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project

© 2026 Slashdot Media. All Rights Reserved.
