Showing 1002 open source projects for "text to"

View related business solutions
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    Recognizers-Text

    Recognizers-Text

    Recognition and resolution of numbers, units, date/time, etc.

    Recognizers-Text is a multilingual text recognition library that extracts structured information such as dates, numbers, and currency values from unstructured text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Obsidian Text Generator Plugin

    Obsidian Text Generator Plugin

    Text generator is a handy plugin for Obsidian

    Text Generator is an open-source AI Assistant Tool that brings the power of Generative Artificial Intelligence to the power of knowledge creation and organization in Obsidian. For example, use Text Generator to generate ideas, attractive titles, summaries, outlines, and whole paragraphs based on your knowledge database.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    MCP Text Editor

    MCP Text Editor

    Provides line-oriented text file editing capabilities

    The MCP Text Editor Server provides line-oriented text file editing capabilities through a standardized API, optimized for integration with Large Language Models (LLMs). It enables efficient partial file access, minimizing token usage while ensuring safe concurrent editing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Text Generation Inference

    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    Text Generation Web UI

    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    ...Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very efficient text streaming. Parameter presets, 8-bit mode. Layers splitting across GPU(s), CPU, and disk. CPU mode, FlexGen, DeepSpeed ZeRO-3, API with streaming and without streaming. LLaMA model, including 4-bit GPTQ. RWKV model, LoRA (loading and training), Softprompts, and extensions.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6

    Tesseract OCR

    Open Source OCR Engine

    Tesseract is an open source OCR or optical character recognition engine and command line program. OCR is a technology that allows for the recognition of text characters within a digital image. With the latest version of Tesseract, there is a greater focus on line recognition, however it still supports the legacy Tesseract OCR engine which recognizes character patterns. Tesseract can recognize over 100 languages out-of-the-box, and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. ...
    Downloads: 2,515 This Week
    Last Update:
    See Project
  • 7
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 54 This Week
    Last Update:
    See Project
  • 8
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 116 This Week
    Last Update:
    See Project
  • 9
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. ...
    Downloads: 15 This Week
    Last Update:
    See Project
  • Payments you can rely on to run smarter. Icon
    Payments you can rely on to run smarter.

    Never miss a sale. Square payment processing serves customers better with tools and integrations that make work more efficient.

    Accept payments at your counter or on the go. It’s easy to get started. Try the Square POS app on your phone or pick from a range of hardworking hardware.
    Learn More
  • 10
    Wan2.2

    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    ...The model is trained on significantly larger datasets than its predecessor, greatly enhancing motion complexity, semantic understanding, and aesthetic diversity. Wan2.2 also open-sources a 5-billion parameter high-compression VAE-based hybrid text-image-to-video (TI2V) model that supports 720P video generation at 24fps on consumer-grade GPUs like the RTX 4090. It supports multiple video generation tasks including text-to-video.
    Downloads: 217 This Week
    Last Update:
    See Project
  • 11
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. ...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 12
    Pot Desktop

    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports external plugin extensions, which means its functionality can be expanded far beyond the built-in options: you can add translation engines, OCR backends, TTS engines, vocabulary export (e.g. for language learning), and more. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 13
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. ...
    Downloads: 101 This Week
    Last Update:
    See Project
  • 14
    Readest

    Readest

    Readest is a modern, feature-rich ebook reader

    ...Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and then interact with it (highlighting, translation, lookup, maybe TTS or summarization) more comfortably. Because of that, it's oriented towards learners, researchers, or people dealing with multilingual documents — especially when they need to rapidly digest or reference large amounts of text.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 15

    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that can help users train better models and apply them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra lightweight OCR system, with multilingual recognition, digit recognition, vertical text recognition, as well as long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra lightweight ppocr_mobile series models, general ppocr_server series models, and ultra lightweight compression ppocr_mobile_slim series models. PaddleOCR is easy to install and easy to use on Windows, Linux, MacOS and other systems.
    Downloads: 41 This Week
    Last Update:
    See Project
  • 16
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 17
    Fooocus

    Fooocus

    Focus on prompting and generating

    Fooocus is an open-source image generation software that simplifies the process of creating images from text prompts. Built on Gradio and leveraging Stable Diffusion XL, Fooocus eliminates the need for manual parameter tweaking, allowing users to focus solely on crafting prompts. It offers a user-friendly interface with minimal setup, making advanced image synthesis accessible to a broader audience.
    Downloads: 217 This Week
    Last Update:
    See Project
  • 18
    Speech Note

    Speech Note

    Speech Note Linux app. Note taking, reading and translating

    Speech Note is a Linux desktop and Sailfish OS application for taking, reading, and translating notes with integrated offline speech technology. It combines speech-to-text, text-to-speech, and machine translation in a single interface, allowing users to dictate notes, listen back to them, and translate them without ever sending data to the cloud. All processing is done locally, which means audio, text, and translations never leave the device, emphasizing strong privacy guarantees. The application supports multiple STT engines such as Coqui STT (DeepSpeech fork), Vosk, whisper.cpp, Faster Whisper, and april-asr, giving users flexibility in accuracy, speed, and hardware requirements. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 19
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 82 This Week
    Last Update:
    See Project
  • 20
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 21
    MeloTTS

    MeloTTS

    High-quality multi-lingual text-to-speech library by MyShell.ai

    MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 22
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 23
    HeartMuLa

    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...The project also includes HeartCodec, a music codec optimized for high reconstruction fidelity, enabling efficient tokenization and reconstruction workflows that are critical for training and generation pipelines. For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 24
    OpenAI.fm

    OpenAI.fm

    Code for openai.fm, a demo for the OpenAI Speech API

    OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features like emotional range, tone, and real-time playback. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 25
    HunyuanImage-3.0

    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter counts without linear inference cost explosion. ...
    Downloads: 15 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next