Search Results for "open source png text" - Page 5

Showing 1599 open source projects for "open source png text"

View related business solutions
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    Parlant

    Parlant

    The behavior guidance framework for customer-facing LLM agents

    Parlant is a lightweight speech-to-text and text-to-speech framework designed for real-time AI-driven voice applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Zerox OCR

    Zerox OCR

    PDF to Markdown with vision models

    A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    ArchiveBox

    ArchiveBox

    Open source self-hosted web archiving

    ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and they can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data....
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    LLM-Aided OCR Project

    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    vim-ai

    vim-ai

    AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim

    vim-ai is an AI-powered assistant plugin for Vim and Neovim that brings language-model features directly into the editor. It allows users to generate code or text, edit selections in place, and carry on interactive chat-style conversations without leaving the terminal editing environment. The plugin is built around OpenAI-compatible APIs, which means it can work not only with OpenAI itself but also with compatible proxies and alternative providers. Its command set covers text completion,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 7
    Planning with Files

    Planning with Files

    Claude Code skill implementing Manus-style persistent planning

    Planning With Files is a Claude Code skill — essentially a plugin for AI agent workflows — that adapts the “Manus-style” persistent markdown planning methodology into developer workflows, enabling structured project planning, progress tracking, and knowledge storage using plain text files. Inspired by high-profile agent workflows and context engineering patterns, it uses persistent markdown files (like task_plan.md, progress.md, and findings.md) as the “working memory” for AI agents,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    CLIP

    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), which is critical for production use.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    Tilf

    Tilf

    Tilf (Tiny Elf) is a free, simple yet powerful pixel art editor

    Tilf (Tiny Elf) is a lightweight, cross-platform pixel art editor developed in Python with PySide6, designed for simplicity, speed, and freedom from account systems or installation overhead. It focuses on enabling artists to create sprites, icons, and small 2D assets quickly, without requiring setup, dependencies, or internet connectivity. Tilf provides a familiar drawing environment with essential tools—such as pencil, eraser, fill, eyedropper, rectangle, and ellipse—along with zoom, grid...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Real-Time Voice Cloning

    Real-Time Voice Cloning

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    Clarity AI Upscaler

    Clarity AI Upscaler

    AI Image Upscaler & Enhancer

    Clarity AI Upscaler is an open-source AI image enhancement tool designed to increase the resolution and visual quality of images using modern generative techniques. The system uses deep learning models based on diffusion and other image generation methods to reconstruct high-resolution versions of low-resolution images while preserving important visual details. Unlike traditional interpolation-based upscaling algorithms, the system generates additional visual information that improves...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    ARIS

    ARIS

    Lightweight Markdown-only skills for autonomous ML research

    ARIS is an experimental automation framework that leverages AI coding agents to perform continuous research and development tasks autonomously, even without active user supervision. The system is designed to run iterative cycles of research, coding, testing, and refinement, effectively simulating a “sleep mode” where productive work continues in the background. It integrates with AI tools such as Claude Code to generate solutions, analyze results, and improve outputs over time. The project...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 15
    OpenBB Terminal

    OpenBB Terminal

    Investment research for everyone, anywhere

    Fully written in python which is one of the most used programming languages due to its simplified syntax and shallow learning curve. It is the first time in history that users, regardless of their background, can so easily add features to an investment research platform. The MIT Open Source license allows any user to fork the project to either add features to the broader community or create their own customized terminal version. The terminal allows for users to import their own proprietary datasets to use on our econometric menu. In addition, users are allowed to export any type of data to any type of format whether that is raw data in Excel or an image in PNG. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    ElevenLabs Python

    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    Ren'Py

    Ren'Py

    The Ren'Py Visual Novel Engine

    Ren’Py is a free and open-source visual novel engine that makes telling interactive stories easy by combining narrative scripting with multimedia support for images, sound, and music in a way that’s accessible to both beginners and experienced developers. Its scripting language is designed to be readable and intuitive, allowing writers and creators to define scenes, dialogues, branching choices, character expressions, and events without deep programming knowledge, while still offering the ability to embed Python for advanced control and custom gameplay features. ...
    Downloads: 123 This Week
    Last Update:
    See Project
  • 18
    Hazm

    Hazm

    Persian NLP Toolkit

    Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 20
    MedGemma

    MedGemma

    Collection of Gemma 3 variants that are trained for performance

    MedGemma is a collection of specialized open-source AI models created by Google as part of its Health AI Developer Foundations initiative, built on the Gemma 3 family of transformer models and trained for medical text and image comprehension tasks that help accelerate the development of healthcare-focused AI applications. It includes multiple variants such as a 4 billion-parameter multimodal model that can process both medical images and text and a 27 billion-parameter text-only (and multimodal) model that offers deeper clinical reasoning and understanding at higher capacity, making it suitable for complex tasks like medical question answering, summarization of clinical notes, or generating reports from radiology images. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    YandexStation

    YandexStation

    Management of Yandex Station and other smart home devices

    YandexStation is a Home Assistant custom component that integrates Yandex-branded smart speakers and other devices with Alice into a unified smart home automation environment. It supports both local and cloud control, depending on the device type, with Yandex speakers often supporting both modes and third-party speakers typically limited to cloud control. The integration exposes playback and volume controls, as well as text-to-speech capabilities that send spoken messages in Alice’s voice...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Matcha-TTS

    Matcha-TTS

    A fast TTS architecture with conditional flow matching

    Matcha-TTS is a non-autoregressive neural text-to-speech architecture that uses conditional flow matching to generate speech quickly while maintaining natural quality. It models speech as an ODE-based generative process, and conditional flow matching lets it reach high-quality audio in only a few synthesis steps, which greatly reduces latency compared to score-matching diffusion approaches. The model is fully probabilistic, so it can generate diverse realizations of the same text while still...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    kb

    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB