Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence Software
Search Results

Search Results for "text batch processing tools"

x

Sort By:

Relevance

Clear All Filters

OS

Mac 117
Linux 115
Windows 114
More...
BSD 63
ChromeOS 62
Desktop Operating Systems 3
Game Consoles 1
Mobile Operating Systems 1
Server Operating Systems 1

Category

Artificial Intelligence 117
Software Development 13
Scientific/Engineering 12
Multimedia 7
Business 5
Text Editors 4
Education 2
Formats and Protocols 2
Communications 1
Internet 1

License

OSI-Approved Open Source 96
Creative Commons Attribution License 4
GNU Free Documentation License 1
Other License 1
More...
Public Domain 1

Translations

English 9
French 1
Russian 1
Tamil 1
More...
Turkish 1

Programming Language

Python 69
Java 18
TypeScript 11
JavaScript 9
More...
C 3
Go 3
Perl 3
C++ 2
C# 2
Rust 2
Unix Shell 2
Groovy 1
JSP 1
Julia 1
R 1
Ruby 1
S/R 1

Status

Production/Stable 8
Beta 7
Alpha 5
Pre-Alpha 1

Showing 117 open source projects for "text batch processing tools"

View related business solutions

Artificial Intelligence Mac Clear Filters & Widen Search

Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.

Start Free
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
1

Chandra

OCR model for complex documents with layout-aware structured outputs

...Chandra can be run locally using transformer-based inference or deployed with a high-performance server setup for large-scale processing. It also includes command-line tools and optional web-based interfaces to simplify interaction and batch processing workflows.

Downloads: 3 This Week

Last Update: 2026-03-18
See Project
2

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
3

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...

Downloads: 2 This Week

Last Update: 2026-02-06
See Project
4

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

...This AI-assisted correction process helps reconstruct missing characters, fix formatting mistakes, and produce more coherent text outputs. The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
5

OpenMed

Open source healthcare AI

...OpenMed can be used in three main ways: as a simple Python API for scripts and notebooks, as a Docker-friendly FastAPI service for backend integration, and as a batch-processing system for multi-document workflows.

Downloads: 12 This Week

Last Update: 3 days ago
See Project
6

deepdoctection

A Repo For Document AI

DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...

Downloads: 0 This Week

Last Update: 2026-05-15
See Project
7

AionUi

Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex

...It enhances productivity by offering smart file management features like batch renaming, automatic organization, and intelligent file classification, thereby reducing manual overhead when working with large datasets or complex document structures. AionUi also supports a remote WebUI mode, allowing users to access their local AI tools securely over a network from other devices while keeping all processing and data on their own hardware.

Downloads: 50 This Week

Last Update: 21 hours ago
See Project
8

AUTOMATIC1111 Stable Diffusion web UI

Stable Diffusion web UI

...The interface also supports prompt editing, batch processing, custom scripts, and many community extensions, making it a highly customizable and continually evolving platform for creative AI art generation.

1 Review

Downloads: 171 This Week

Last Update: 2025-06-02
See Project
9

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...

Downloads: 49 This Week

Last Update: 2026-01-15
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
10

tidytext

Text mining using tidy tools

tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.

Downloads: 1 This Week

Last Update: 2025-07-30
See Project
11

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 48 This Week

Last Update: 2025-12-05
See Project
12

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 5 This Week

Last Update: 2026-02-03
See Project
13

Stanford CoreNLP

Stanford CoreNLP, a Java suite of core NLP tools

CoreNLP is your one stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages, Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text,...

Downloads: 4 This Week

Last Update: 2025-06-07
See Project
14

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2026-04-01
See Project
15

Hugging Face - Speech To Speech

Open speech-to-speech models and pipelines by Hugging Face toolkit AI

This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
16

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...

Downloads: 8 This Week

Last Update: 2026-01-27
See Project
17

Readest

Readest is a modern, feature-rich ebook reader

Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...

Downloads: 25 This Week

Last Update: 2026-05-11
See Project
18

SciSpaCy

A full spaCy pipeline and models for scientific/biomedical documents

ScispaCy is a spaCy extension optimized for processing biomedical and scientific text, providing domain-specific NLP models for tasks like named entity recognition (NER) and dependency parsing.

Downloads: 0 This Week

Last Update: 2025-10-01
See Project
19

edge-tts

Use Microsoft Edge's online text-to-speech service from Python

edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...

Downloads: 35 This Week

Last Update: 2026-03-22
See Project
20

Short Video Factory

AI tool for automatic batch short video creation and editing

Short Video Factory is an open source desktop application designed to simplify the creation of short-form videos using AI-driven automation. It enables users to generate product marketing clips and general content videos by combining simple prompt-based input with pre-prepared media assets. Short Video Factory integrates multiple stages of video production, including script generation, voice synthesis, video editing, and subtitle effects, into a single streamlined workflow. By leveraging AI...

Downloads: 1 This Week

Last Update: 2026-04-07
See Project
21

TTime

Screenshots, word marking, OCR, AI, translation software

...It also supports clipboard monitoring and silent OCR processing, enabling seamless workflows where extracted text can be translated automatically without interrupting the user. The interface is designed to be lightweight and responsive, with customizable shortcuts and floating tools that enhance usability during multitasking.

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
22

Pocket TTS

A TTS that fits in your CPU (and pocket)

...Because it is CPU-oriented, it fits well in server environments where GPU access is limited, in desktop apps, or in edge deployments where simplicity matters more than maximum throughput. It also emphasizes developer ergonomics, providing a straightforward API surface that can be integrated into pipelines, assistants, accessibility tools, or batch generation scripts.

Downloads: 12 This Week

Last Update: 2026-05-04
See Project
23

MindNLP

Easy-to-use and high-performance NLP and LLM framework

MindNLP is a natural language processing library built on the MindSpore framework, providing tools and models for various NLP tasks.

Downloads: 0 This Week

Last Update: 2025-11-05
See Project
24

Apache OpenNLP

Apache OpenNLP

Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.

Downloads: 1 This Week

Last Update: 2026-04-28
See Project
25

DeerFlow

Deep Research framework, combining language models with tools

DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”)...

Downloads: 114 This Week

Last Update: 6 days ago
See Project

Previous
You're on page 1
2
3
4
5
Next

Related Searches

ocr

umi-ocr

automatic1111

tts

aionui

umi

speech

umi-ocr_paddle_v2.1.5.7z.exe

readest

portable stable diffusion

Related Categories

Artificial Intelligence

Software Development

Scientific/Engineering

Multimedia

Business

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise