Search Results for "image text input" - Page 4


Showing 1564 open source projects for "image text input"


1

electerm

Terminal/SSH/SFTP client (Linux, Mac, Win)

...Auth with publicKey + password. Support Zmodem (rz, sz). Support Trzsz (trz/tsz), similar to rz/sz, and compatible with tmux. Transparent window (Mac, Win). Terminal background image. Global/session proxy. Quick commands. UI/terminal theme. Sync bookmarks/themes/quick commands to GitHub/Gitee secret gist. Support serial Port (version > 1.21.8). Quick input to one or all terminals.

Downloads: 218 This Week

Last Update: 3 days ago
2

OpenAI.fm

Code for openai.fm, a demo for the OpenAI Speech API

...Users can experiment with different input text and voice options directly in their browser, gaining a sense of how high-fidelity AI audio can be integrated into applications ranging from podcasts and narration to accessibility tools and interactive agents. Although the web demo is free to explore, production use of the underlying API requires an OpenAI API key and may incur costs based on usage.

Downloads: 12 This Week

Last Update: 2026-01-28
3

WhisperLive

A nearly-live implementation of OpenAI's Whisper

WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...

Downloads: 15 This Week

Last Update: 2026-03-17
4

Skiko

Kotlin Multiplatform bindings to Skia

...It serves as the low-level rendering backbone for Kotlin UI frameworks like Compose for Desktop and Compose for Web, enabling smooth, GPU-accelerated 2D graphics across Windows, macOS, Linux, and other supported targets without writing native code. Skiko abstracts away platform-specific rendering details while exposing Skia’s powerful features such as high-quality text shaping, image filters, path operations, and hardware accelerated canvases, making it ideal for building rich UI components, animations, games, or custom drawing surfaces. By leveraging Skia’s proven performance and cross-platform consistency, Skiko helps developers write a single graphics pipeline that behaves predictably across environments, simplifying maintenance and reducing platform fragmentation.

Downloads: 87 This Week

Last Update: 2026-04-27
5

Gemini Next Chat

Deploy your private Gemini application for free with one click

Gemini Next Chat is an open-source web application that allows you to deploy your own private chat interface powered by Google’s Gemini models (e.g., Gemini 1.5, Gemini 2.0, etc.). It is built with Next.js/TypeScript and targets developers and hobbyists who want a self-hosted solution for interacting with advanced multimodal models (text, image, voice). It supports features like image recognition, voice-based conversation, plugins (web search, ArXiv search, weather, etc.), and client apps (tray app) for greater convenience. The project emphasizes “one-click” deployment, aiming to make it easy to spin up a custom chat front end without deep infra-setup. It’s licensed under MIT and has an active community of contributors; documentation and release notes note support for newer features like mixed image+text generation. ...

Downloads: 1 This Week

Last Update: 2025-11-24
6

YAML

JavaScript parser and stringifier for YAML

yaml is a definitive library for YAML, the human-friendly data serialization standard. This library supports both YAML 1.1 and YAML 1.2 and all common data schemas, and it passes all of the yaml-test-suite tests. It can accept any string as input without throwing, parsing as much YAML out of it as it can, and it supports parsing, modifying, and writing YAML comments and blank lines. The library is released under the ISC open source license, and the code is available on GitHub. It has no external...

Downloads: 8 This Week

Last Update: 2 days ago
7

PlantUML

Generate diagrams from textual description

...The easiest way to test PlantUML is in an online solution that has PlantUML embedded, such as our online server. After testing, you may want to install PlantUML locally. Run (or have your software call) PlantUML, using sequenceDiagram.txt as input. The output is an image, which either appears in the other software or is written to an image file on disk. Diagrams are defined using a simple and intuitive language (see the PlantUML Language Reference Guide). Images can be generated in PNG, SVG, or LaTeX format. It is also possible to generate ASCII art diagrams (only for sequence diagrams).

Downloads: 41 This Week

Last Update: 2026-02-27
8

InternVL

A Pioneering Open-Source Alternative to GPT-4o

...The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a wide variety of tasks, including visual perception, image classification, and cross-modal retrieval between images and text. It can also be connected to language models to enable conversational interfaces that understand images, videos, and other visual content. ...

Downloads: 0 This Week

Last Update: 2026-03-04
9

ChatterBot

Machine learning, conversational dialog engine for creating chat bots

...Additionally, the machine-learning nature of ChatterBot allows an agent instance to improve its own knowledge of possible responses as it interacts with humans and other sources of informative data. An untrained instance of ChatterBot starts off with no knowledge of how to communicate. Each time a user enters a statement, the library saves the text that they entered and the text that the statement was in response to. As ChatterBot receives more input, the number of responses that it can reply with increases.

Downloads: 1 This Week

Last Update: 2026-03-24
10

Terminal.Gui

Console-based user interface toolkit for .NET applications

A toolkit for building console GUI apps for .NET, .NET Core, and Mono that works on Windows, the Mac, and Linux/Unix. In addition, a complete Xterm/Vt100 terminal emulator that you can embed is now part of XtermSharp; you just need to pull TerminalView.cs into your project. Terminal drivers for Curses, Windows Console, and the .NET Console mean Terminal.Gui works well on both color and monochrome terminals and has mouse support on terminal emulators that...

Downloads: 19 This Week

Last Update: 6 days ago
11

PhotoEditor

A Photo Editor library with simple, easy support for image editing

A Photo Editor library with simple, easy support for image editing using Paints, Text, Filters, Emoji and Sticker like stories. Drawing on the image with the option to change its Brush's Color, Size, Opacity, Erasing and basic shapes. Apply Filter Effect on the image using MediaEffect. Adding/Editing Text with the option to change its Color with Custom Fonts. Adding Emoji with Custom Emoji Fonts.

Downloads: 2 This Week

Last Update: 2026-03-12
12

Streamdown

Streaming markdown renderer for AI apps with smooth updates

Streamdown is a lightweight rendering library designed to display streaming Markdown content in real time, making it particularly useful for AI-powered applications that generate text incrementally. It focuses on providing a smooth and visually stable experience while content is being appended, avoiding layout shifts that can disrupt readability. Streamdown is built to handle partial Markdown input gracefully, progressively enhancing the output as more text becomes available. It is especially relevant for chat interfaces, coding assistants, and any environment where responses are streamed token by token. ...

Downloads: 0 This Week

Last Update: 2026-03-18
13

LocalAI

The free, Open Source alternative to OpenAI, Claude and others

...It acts as a drop-in replacement for APIs such as OpenAI, enabling developers to build AI-powered applications without relying on external cloud services. The platform supports a wide range of model types, including text generation, image creation, speech processing, and embeddings. LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. It integrates with multiple backends like llama.cpp, transformers, and diffusers to support different AI workloads. With its self-hosted architecture and OpenAI-compatible API, LocalAI enables developers to build secure, local-first AI applications.

Downloads: 29 This Week

Last Update: 2026-04-07
14

react-markdown-editor-lite

A light-weight Markdown editor based on React

A light-weight (20KB zipped) Markdown editor provided as a React component. Supports TypeScript. Supports custom markdown parser. Full markdown support. Supports pluggable function bars. Full control over UI. Supports image uploading and dragging. Supports synced scrolling between editor and preview.

Downloads: 0 This Week

Last Update: 2026-01-21
15

Final Cut

A text-based widget toolkit

Library for creating terminal applications with text-based widgets. FINAL CUT is a C++ class library and widget toolkit with full mouse support for creating a text-based user interface. The library helps the programmer develop applications for the text console. It allows the simultaneous handling of multiple text windows on the screen. The structure of the Qt framework was originally the inspiration for the C++ class design of FINAL CUT.

Downloads: 5 This Week

Last Update: 2024-07-27
16

Markdown

WeChat Markdown Editor

A highly concise WeChat Markdown editor that supports Markdown syntax, color palette selection, multi-image upload, one-click document download, custom CSS style, one-click reset, and other features. Markdown documents are automatically rendered into WeChat graphics and text in real time, so you no longer have to worry about the typesetting of WeChat articles! As long as you know the basic Markdown syntax, you can make a simple and beautiful WeChat graphic. ...

Downloads: 1 This Week

Last Update: 2025-10-17
17

nut.js

Native UI testing / controlling with node

nut.js gives you full control over your mouse. Move, click or drag your cursor where you need it! Press (and hold) single keys or type pages of text, nut.js handles both! It allows for native UI interactions via keyboard and/or mouse but additionally gives you the possibility to navigate the screen based on image matching. nut.js gives you access to your system clipboard. Copy and paste text as you go! Retrieve info about open windows to improve your tests or workflows. nut.js provides plug-ins to perform on-screen image search, the key component for visual testing or image-based automation! ...

Downloads: 0 This Week

Last Update: 2024-04-10
18

ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences

ImageReward is the first general-purpose human preference reward model (RM) designed for evaluating text-to-image generation, introduced alongside the NeurIPS 2023 paper ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Trained on 137k expert-annotated image pairs, ImageReward significantly outperforms existing scoring methods like CLIP, Aesthetic, and BLIP in capturing human visual preferences. It is provided as a Python package (image-reward) that enables quick scoring of generated images against textual prompts, with APIs for ranking, scoring, and filtering outputs. ...

Downloads: 0 This Week

Last Update: 2 days ago
19

Screenshot to Code

A neural network that transforms a design mock-up into static websites

Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. Its focus is mapping visual design to code constructs, with output as code/UI layout (HTML, CSS, or markup) and examples/demo scripts showing “image UI code”.

Downloads: 0 This Week

Last Update: 2025-09-26
20

Mozc Devices

Circuit diagrams and firmware source code for Gboard DIY keyboards

...These devices creatively reinterpret how users can interact with Japanese text input, blending humor, engineering, and physical computing. The repository serves as an archive of the schematics, firmware, and PCB designs for these inventive input mechanisms, with many projects including promotional videos and technical references.

Downloads: 1 This Week

Last Update: 1 day ago
21

CycleGAN and pix2pix in PyTorch

Image-to-Image Translation in PyTorch

The CycleGAN and pix2pix in PyTorch repository is a PyTorch implementation of two influential image-to-image translation frameworks: CycleGAN (for unpaired translation) and pix2pix (for paired translation). This repo gives developers and researchers a convenient, modern (PyTorch-based) platform to train and test these methods, supporting both paired datasets (input to output) and unpaired datasets (domain-to-domain) with minimal changes.

Downloads: 2 This Week

Last Update: 2025-12-09
22

Brotli

Brotli compression format

Version 1.0.9 contains a fix for an "integer overflow" problem. The overflow happens when the "one-shot" decoding API is used (or the input chunk for the streaming API is not limited), the input size (chunk size) is larger than 2GiB, and the input contains uncompressed blocks. After the overflow happens, memcpy is invoked with a gigantic num value, which will likely cause a crash. Brotli is a generic-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm,...

Downloads: 36 This Week

Last Update: 2025-10-27
23

FastSD CPU

Fast stable diffusion on CPU and AI PC

FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple generation, an advanced web-based UI with support for extensions like LoRA and ControlNet, and a command-line interface for scripted usage or server deployments. ...

Downloads: 49 This Week

Last Update: 2 days ago
24

Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model

Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription speed and reducing memory usage compared to standard models. It supports multiple input sources including local files, YouTube content, and microphone input, making it versatile for different workflows. ...

Downloads: 8 This Week

Last Update: 2026-03-18
25

CogView4

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. ...

Downloads: 1 This Week

Last Update: 4 hours ago