image text input free download

Showing 59 open source projects for "image text input"


Filters: Multimedia, Python

1

Real-ESRGAN GUI

Cross-platform GUI for image upscaler Real-ESRGAN

...Real-ESRGAN can only enlarge the input image by a fixed factor of 2x to 4x (depending on the selected model). This functionality is achieved by calling Real-ESRGAN multiple times and then downsampling the result with a conventional scaling algorithm. For GIFs, each frame is split out with its duration recorded, upscaled one by one, and then merged back together. Dragging an image file or directory anywhere onto the window automatically sets its path as the input.
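The description implies that an arbitrary magnification is reached by running the fixed-factor model enough times to meet or exceed the target, then shrinking the result. The sketch below shows only that planning arithmetic; `plan_upscale` is a hypothetical helper, not part of Real-ESRGAN GUI, and it assumes the model scale is a whole number of at least 2.

```python
from typing import Tuple


def plan_upscale(target_scale: float, model_scale: int = 4) -> Tuple[int, float]:
    """Return (model_passes, final_resize_factor) for a fixed-factor upscaler.

    Hypothetical helper: each model pass multiplies the size by model_scale,
    and one conventional resize afterwards lands exactly on target_scale.
    """
    if target_scale <= 0:
        raise ValueError("target_scale must be positive")
    if model_scale < 2:
        raise ValueError("model_scale must be at least 2")
    passes = 1  # the model always enlarges at least once
    # Keep calling the model until the cumulative scale reaches the target.
    while model_scale ** passes < target_scale:
        passes += 1
    achieved = model_scale ** passes
    # Factor for the final conventional downsample (1.0 means no resize needed).
    return passes, target_scale / achieved
```

For example, a 6x request with a 4x model needs two passes (16x) followed by a 0.375 downscale.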

Downloads: 60 This Week

Last Update: 2024-06-02
2

IOPaint

Image inpainting tool powered by SOTA AI Model

...Its feature set includes erasing people, watermarks, or defects, adding or replacing objects, applying text-aware edits, and extending images outward (outpainting) to fill contours or expand compositions.

Downloads: 24 This Week

Last Update: 2026-02-03
3

Dream Textures

Stable Diffusion built-in to Blender

Create textures, concept art, background assets, and more with a simple text prompt. Use the 'Seamless' option to create textures that tile perfectly with no visible seam. Texture entire scenes with 'Project Dream Texture' and depth to image. Re-style animations with the Cycles render pass. Run the models on your machine to iterate without slowdowns from a service.

Downloads: 4 This Week

Last Update: 2024-08-26
4

SpeechRecognition

Speech recognition module for Python

Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, and save audio data to an audio file. Show extended recognition results and calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listen to a microphone in the background, and use various other recognizer features. The easiest way to install this is using...
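Energy-threshold calibration can be pictured as smoothing a threshold toward the loudness of ambient noise. The sketch below is a pure-Python illustration of that idea under stated assumptions: the exponential-smoothing scheme and the defaults `threshold=300`, `damping=0.15`, `ratio=1.5` are illustrative choices, and `calibrate_threshold` is a hypothetical helper. In the library itself, calibration goes through the recognizer (for example its `adjust_for_ambient_noise` method), not through this code.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one buffer of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(
    buffers: List[Sequence[int]],
    seconds_per_buffer: float,
    threshold: float = 300.0,
    damping: float = 0.15,
    ratio: float = 1.5,
) -> float:
    """Move an energy threshold toward `ratio` times the ambient energy.

    Assumed scheme: exponential smoothing where `damping ** seconds_per_buffer`
    is the weight kept on the previous threshold after each buffer.
    """
    keep = damping ** seconds_per_buffer
    for buf in buffers:
        target = rms(buf) * ratio  # speech must be louder than this to count
        threshold = threshold * keep + target * (1 - keep)
    return threshold
```

After enough buffers of steady background noise, the threshold settles at `ratio` times the noise energy, so only louder input is treated as speech.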

Downloads: 9 This Week

Last Update: 2026-04-24
5

ChatterBot

Machine learning, conversational dialog engine for creating chat bots

...Additionally, the machine-learning nature of ChatterBot allows an agent instance to improve its own knowledge of possible responses as it interacts with humans and other sources of informative data. An untrained instance of ChatterBot starts off with no knowledge of how to communicate. Each time a user enters a statement, the library saves the text that they entered and the text that the statement was in response to. As ChatterBot receives more input, the number of responses it can reply with increases.
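The learning loop described above (store each statement alongside the text it answered, then reuse those pairs) can be illustrated with a toy in-memory store. This is a hypothetical sketch, not ChatterBot's API or storage layer; `ResponseMemory` is an invented name, and `difflib.SequenceMatcher` stands in for whatever matching logic the real library uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Optional


class ResponseMemory:
    """Toy statement store (hypothetical; not ChatterBot's API or storage layer)."""

    def __init__(self) -> None:
        # Maps each prompt text to the statements users entered in reply to it.
        self.responses: Dict[str, List[str]] = defaultdict(list)
        self.last_text: Optional[str] = None

    def learn(self, text: str) -> None:
        """Save `text` together with the statement it was a response to."""
        if self.last_text is not None:
            self.responses[self.last_text].append(text)
        self.last_text = text

    def reply(self, text: str) -> Optional[str]:
        """Return a stored response to the most similar known prompt."""
        if not self.responses:
            return None  # an untrained instance has nothing to say
        best = max(
            self.responses,
            key=lambda prompt: SequenceMatcher(None, prompt, text).ratio(),
        )
        return self.responses[best][0]
```

Feeding it a short conversation ("Hello", "Hi there", ...) lets it answer "Hello" with "Hi there"; every new statement adds to the pool of possible replies.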

Downloads: 0 This Week

Last Update: 2026-03-24
6

Mozc Devices

Circuit diagrams and firmware source code for Gboard DIY keyboards

...These devices creatively reinterpret how users can interact with Japanese text input, blending humor, engineering, and physical computing. The repository serves as an archive of the schematics, firmware, and PCB designs for these inventive input mechanisms, with many projects including promotional videos and technical references.

Downloads: 0 This Week

Last Update: 5 days ago
7

PersonaLive

Expressive Portrait Image Animation for Live Streaming

PersonaLive is an open-source diffusion-based portrait animation framework focused on generating expressive, long-duration animated sequences in real time, primarily for live streaming or interactive applications. It leverages deep generative models that condition on a static reference image and a driving input (such as motion or expression cues) to produce a seamless animated portrait sequence that can run indefinitely without segmentation artifacts. The framework prioritizes low-latency and streamable output, making it suitable for real-time creative workflows, broadcast overlays, or interactive avatars on consumer-grade GPUs. ...

Downloads: 3 This Week

Last Update: 2026-05-15
8

Speakr

Speakr is a personal, self-hosted web application

Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.

Downloads: 0 This Week

Last Update: 2026-05-09
9

PersonaPlex

PersonaPlex code

...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.

Downloads: 1 This Week

Last Update: 2026-03-02
10

Moshi

A speech-text foundation model for real time dialogue

...Moshi models two streams of audio: one corresponds to Moshi, and the other to the user. At inference, the user's stream is taken from the audio input, and Moshi's stream is sampled from the model's output. Alongside these two audio streams, Moshi predicts text tokens corresponding to its own speech (its inner monologue), which greatly improves the quality of its generation. A small Depth Transformer models inter-codebook dependencies for a given time step, while a large, 7B-parameter Temporal Transformer models the temporal dependencies.

Downloads: 2 This Week

Last Update: 2024-11-05
11

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multimodal content (text, images) into engaging, multilingual audio conversations using GenAI. Input content includes websites, PDFs, YouTube videos, and images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multimodal sources, enabling customization and scale.

Downloads: 0 This Week

Last Update: 2024-11-16
12

ML Sharp

Sharp Monocular View Synthesis in Less Than a Second

ML Sharp is a research code release that turns a single 2D photograph into a photorealistic 3D representation that can be rendered from nearby viewpoints. Instead of requiring multi-view input, it predicts the parameters of a 3D Gaussian scene representation directly from one image using a single forward pass through a neural network. The core idea is speed: the 3D representation is produced in under a second on a standard GPU, and then the resulting scene can be rendered in real time to generate new views interactively. ...

Downloads: 0 This Week

Last Update: 2026-01-29
13

Windrecorder

Windrecorder is a memory search app that records everything

Windrecorder is an open-source personal memory search engine that continuously records on-screen activity in a highly optimized and storage-efficient format. It captures screen content locally and builds a searchable database using OCR and image understanding, allowing users to rewind and rediscover anything they have previously seen. The system indexes only meaningful visual changes, extracting text, browser data, and contextual information to improve search accuracy and reduce storage overhead. It includes a web-based interface where users can browse timelines, analyze activity, and perform semantic queries on recorded content. ...
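Indexing "only meaningful visual changes" can be illustrated with simple frame differencing: keep a frame only when it differs enough from the last one kept. Windrecorder's actual change-detection method is not described here, so this is an assumed stand-in; frames are modeled as small grayscale grids of 0-255 values, and `threshold` is a hypothetical tuning knob.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-size frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def select_changed_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always index the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept
```

Comparing against the last kept frame (rather than the immediately previous one) prevents slow, gradual drift from slipping past the threshold unnoticed.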

Downloads: 1 This Week

Last Update: 2026-04-24
14

Mesh R-CNN

code for Mesh R-CNN, ICCV 2019

...Unlike voxel-based or point-based approaches, Mesh R-CNN uses a differentiable mesh representation, allowing it to efficiently refine surface geometry while maintaining high spatial detail. The system combines 2D detection from Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where it demonstrates state-of-the-art performance in reconstructing real-world object geometry.

Downloads: 0 This Week

Last Update: 5 days ago
15

Segmentation Models

Segmentation models with pretrained backbones. PyTorch

...Preparing your data the same way as during weights pre-training may give you better results (higher metric score and faster convergence). This is not necessary if you train the whole model rather than only the decoder. PyTorch Image Models (a.k.a. timm) provides many pretrained models and an interface that allows using them as encoders in smp; however, not all models are supported. The input channels parameter lets you create models that process tensors with an arbitrary number of channels.
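"Preparing your data the same way as during weights pre-training" usually means per-channel normalization with the statistics of the pre-training dataset. The sketch below uses the widely used ImageNet RGB mean and standard deviation; it is a pure-Python illustration, not smp code, and `normalize_pixel` is a hypothetical helper. It accepts any channel count, but statistics for non-RGB inputs are an assumption you would have to supply yourself.

```python
from typing import Sequence, Tuple

# Per-channel RGB statistics widely used for ImageNet-pretrained encoders.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(
    values: Sequence[float],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> Tuple[float, ...]:
    """Scale 0-255 channel values to [0, 1], then standardize each channel."""
    if not (len(values) == len(mean) == len(std)):
        raise ValueError("values, mean and std must have one entry per channel")
    return tuple((v / 255.0 - m) / s for v, m, s in zip(values, mean, std))
```

A pixel equal to the dataset mean maps to roughly zero in every channel, which is the input distribution the pretrained encoder weights expect.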

Downloads: 0 This Week

Last Update: 2025-04-17
16

AnimateDiff

Plug-n-play module turning text-to-image models into animation

AnimateDiff is an open-source project designed to enhance text-to-image diffusion models by adding animation capabilities. It allows users to turn static images generated by popular text-to-image models into animated sequences without requiring additional model training. This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. ...

1 Review

Downloads: 22 This Week

Last Update: 2025-03-06
17

stmani3

Stereo Photo Manipulation

A set of programs for alignment and rendering of still stereo photos (3D). This is an updated Python 3 version of the old StMani.

Downloads: 0 This Week

Last Update: 2024-07-13
18

xSTUDIO

xSTUDIO is a high-performance playback and review tool.

xSTUDIO is a high-performance playback and review tool designed by and for Visual Effects, Animation and Post Production professionals. The application can load and play large collections of media files. Its efficient playback engine lets you quickly load and play high-resolution images in a wide range of file formats and encodings. Intuitive tools allow you to create and organise playlists and media subsets within playlists to build interactive review sessions, image and video...

Downloads: 16 This Week

Last Update: 2026-03-21
19

MLT Multimedia Framework

A multimedia authoring and processing framework and a video playout server for television broadcasting.

17 Reviews

Downloads: 11 This Week

Last Update: 2026-04-22
20

Color to Waveform

Convert colors to synth presets

The purpose of the program is to convert a color to a waveform you can use as a synthesizer oscillator inside a DAW such as FL Studio from Image Line. Many synths are provided with an option to load your own waveform, to replace the basic saw, square and sine waveforms commonly used to create synth sounds. The waveform generated by the program will correspond to the subliminal synesthetic sensation of the selected color. You can create your own synth presets to use in a track using color as a base.

Downloads: 0 This Week

Last Update: 2024-09-16
21

CLIP-as-service

Embed images and sentences into fixed-length vectors

...Easy-to-use. No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding. Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression. Smooth integration with neural search ecosystem including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.
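Because images and sentences are embedded into fixed-length vectors in a shared space, cross-modal search reduces to comparing vectors, commonly with cosine similarity. The sketch below assumes you already have embeddings (the three-dimensional vectors in the usage note are made up; real ones would come from the service's encoder), and `cosine`/`rank` are hypothetical helpers, not part of the CLIP-as-service client.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate labels from most to least similar to the query embedding."""
    return sorted(candidates, key=lambda label: cosine(query, candidates[label]), reverse=True)
```

With a text-query vector `[1, 0, 0]` and toy image vectors for "cat photo" `[0.9, 0.1, 0]`, "dog photo" `[0.5, 0.5, 0]` and "car photo" `[0, 1, 0]`, the ranking is cat, dog, car.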

Downloads: 0 This Week

Last Update: 2023-12-20
22

DALL-E 2 - Pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch. The main novelty seems to be an extra layer of indirection with the prior network (whether an autoregressive transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP. Specifically, this repository will only build out the diffusion prior network, as it is the best-performing variant (but which incidentally involves a causal transformer as the denoising network). Training DALL-E 2 is a three-step process, with the training of CLIP being the most important. ...

Downloads: 0 This Week

Last Update: 2023-10-19
23

Stable Diffusion in Docker

Run the Stable Diffusion releases in a Docker container

...Create an image from an existing image and a text prompt. Modify an existing image with its depth map and a text prompt.

Downloads: 0 This Week

Last Update: 2023-09-22
24

PicResize

A simple pic resizer

A simple picture resizer that works with drag and drop. Drag and drop an image file onto a shortcut to the program, enter the width or height, and confirm; the resized image is saved in the same folder with the new dimensions in its file name.
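Entering only one side implies the other is derived from the aspect ratio. The sketch below shows that arithmetic and one way to put the new dimensions into the output file name; both the aspect-ratio behavior and the `name_WxH.ext` naming pattern are assumptions, and `resized_dims`/`output_path` are hypothetical helpers rather than PicResize's code.

```python
from pathlib import Path
from typing import Optional, Tuple


def resized_dims(
    width: int, height: int, new_width: Optional[int] = None, new_height: Optional[int] = None
) -> Tuple[int, int]:
    """Derive the missing side from the one the user typed, keeping the aspect ratio."""
    if (new_width is None) == (new_height is None):
        raise ValueError("give exactly one of new_width or new_height")
    if new_width is not None:
        return new_width, max(1, round(height * new_width / width))
    return max(1, round(width * new_height / height)), new_height


def output_path(src: str, width: int, height: int) -> Path:
    """Same folder as the source, with the new dimensions in the file name (assumed pattern)."""
    p = Path(src)
    return p.with_name(f"{p.stem}_{width}x{height}{p.suffix}")
```

For a 1920x1080 `photos/cat.jpg` resized to a width of 640, this yields 640x360 and `photos/cat_640x360.jpg`.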

Downloads: 1 This Week

Last Update: 2023-12-09
25

Image Downloader

Download images from Google, Bing, Baidu

Crawl and download images using Selenium, Python 3, and PyQt5. Supported search engines: Google, Bing, Baidu. Keywords can be typed in from the keyboard or read from a line-separated keyword list file for batch processing. Images are downloaded using a customizable number of threads. Conditional search (e.g. filetype:, site:) is fully supported. Switch for Google safe mode. Proxy configuration (SOCKS, HTTP). Both command-line and GUI modes are provided.
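Two of the listed features, a line-separated keyword file and a configurable download thread count, map onto small standard-library patterns. The sketch below is not Image Downloader's code: it leaves out Selenium and networking entirely, and `fetch` is a hypothetical callable you would supply (for example, one that downloads a single URL).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def read_keywords(text: str) -> List[str]:
    """Parse a line-separated keyword list, trimming spaces and skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def download_all(urls: List[str], fetch: Callable[[str], T], num_threads: int = 4) -> List[T]:
    """Run `fetch(url)` on a pool of worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch, urls))
```

Threads suit this workload because downloads are I/O-bound, so raising `num_threads` mainly overlaps network waits rather than competing for the CPU.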

Downloads: 0 This Week

Last Update: 2023-04-03