image text input free download

Showing 10 open source projects for "image text input"

View related business solutions

Sound/Audio Python Clear Filters & Widen Search

Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
1

SpeechRecognition

Speech recognition module for Python

Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using...

Downloads: 9 This Week

Last Update: 2026-04-24
See Project
2

Speakr

Speakr is a personal, self-hosted web application

Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.

Downloads: 0 This Week

Last Update: 2026-05-09
See Project
3

PersonaPlex

PersonaPlex code

...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.

Downloads: 1 This Week

Last Update: 2026-03-02
See Project
4

Moshi

A speech-text foundation model for real time dialogue

...Moshi models two streams of audio: one corresponds to Moshi, and the other one to the user. At inference, the stream from the user is taken from the audio input, and the one for Moshi is sampled from the model's output. Along these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter codebook dependencies for a given time step, while a large, 7B parameter Temporal Transformer models the temporal dependencies.

Downloads: 2 This Week

Last Update: 2024-11-05
See Project
Compliant and Reliable File Transfers Backed by Top Security Certifications
Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.

Start Free Trial
5

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, youtube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling customization and scale.

Downloads: 0 This Week

Last Update: 2024-11-16
See Project
6

Color to Waveform

Convert colors to synth presets

The purpose of the program is to convert a color to a waveform you can use as a synthesizer oscillator inside a DAW such as FL Studio from Image Line. Many synths are provided with an option to load your own waveform, to replace the basic saw, square and sine waveforms commonly used to create synth sounds. The waveform generated by the program will correspond to the subliminal synesthetic sensation of the selected color. You can create your own synth presets to use in a track using color as a base.

Downloads: 0 This Week

Last Update: 2024-09-16
See Project
7

psgdump

Dump psg/ym chip tune files to txt and midi format

PSGDump tool is parser and converter for chip tune files. It supports PSG and YM input file formats, focusing on AY/YM chip tunes from ZX Spectrum and Atari ST. The tool produces text output of notes played and creates multi-track MIDI file.

Downloads: 0 This Week

Last Update: 2022-09-19
See Project
8

EnKoDeur-Mixeur

EnKoDeur-Mixeur (EKD) is an open source software which makes videos, pictures and audio post-production. It can be also used to convert videos in many formats. It is written in python and use the PyQt4 bindings.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-30
See Project
9

tuxshout

Text-User-Interface base SHOUTCast tuner. tuxshout can run comfortablely under low-memory and fossil environment. Supported input device is not only keyboard but also mouse! So you can use like gui.

Downloads: 0 This Week

Last Update: 2015-11-29
See Project
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.

Start Free
10

cwtext text to morse code converter

Convert text to International Morse Code. Input is ASCII text. Output can be: - . -..- - on the console, raw 8bit PCM suitable for piping to /dev/audio, .wav files or even (mp3|ogg). Good for headlines on your MP3 player or code practice.

7 Reviews

Downloads: 3 This Week

Last Update: 2013-03-22
See Project