Search Results for "text batch processing tools"

Showing 364 open source projects for "text batch processing tools"

Active filter: Linux

1

FFmpeg Batch AV Converter

FFmpeg Batch AV Converter is a graphical front-end for FFmpeg designed to simplify advanced multimedia processing through an intuitive interface while preserving full access to FFmpeg’s capabilities. It allows users to perform complex encoding, conversion, and editing operations using drag-and-drop workflows instead of command-line input. The application supports both single and batch processing, enabling users to handle large volumes of media files efficiently. ...

Downloads: 5 This Week

Last Update: 2026-04-24
See Project
2

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
3

Chandra

OCR model for complex documents with layout-aware structured outputs

...Chandra can be run locally using transformer-based inference or deployed with a high-performance server setup for large-scale processing. It also includes command-line tools and optional web-based interfaces to simplify interaction and batch processing workflows.

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
4

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

...This AI-assisted correction process helps reconstruct missing characters, fix formatting mistakes, and produce more coherent text outputs. The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.

Downloads: 1 This Week

Last Update: 2026-03-22
See Project
5

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...

Downloads: 3 This Week

Last Update: 2026-02-06
See Project
6

OpenMed

Open source healthcare AI

...OpenMed can be used in three main ways: as a simple Python API for scripts and notebooks, as a Docker-friendly FastAPI service for backend integration, and as a batch-processing system for multi-document workflows.

Downloads: 13 This Week

Last Update: 6 days ago
See Project
7

deepdoctection

A Repo For Document AI

deepdoctection is a document AI framework and Python library that applies deep learning to analyze and extract structured data from scanned documents, PDFs, and images by orchestrating document extraction and layout analysis tasks. It does not implement models itself but lets you build pipelines from widely used libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...

Downloads: 0 This Week

Last Update: 2026-05-15
See Project
8

AionUi

Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex

...It enhances productivity by offering smart file management features like batch renaming, automatic organization, and intelligent file classification, thereby reducing manual overhead when working with large datasets or complex document structures. AionUi also supports a remote WebUI mode, allowing users to access their local AI tools securely over a network from other devices while keeping all processing and data on their own hardware.

Downloads: 65 This Week

Last Update: 6 hours ago
See Project
9

Zerox OCR

PDF to Markdown with vision models

A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation, after all: with weird layouts, tables, charts, and so on, vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.

Downloads: 0 This Week

Last Update: 2024-12-18
See Project
10

FFBox

A multimedia transcoding treasure chest / an FFmpeg case

FFBox is a graphical multimedia processing application that provides an accessible interface for working with FFmpeg operations such as encoding, conversion, and editing. It allows users to perform tasks like trimming, merging, and compressing media files without using command-line tools. The software supports a wide range of audio and video formats, making it suitable for diverse media workflows.

Downloads: 0 This Week

Last Update: 2026-04-30
See Project
11

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...

Downloads: 72 This Week

Last Update: 2026-01-15
See Project
12

AUTOMATIC1111 Stable Diffusion web UI

Stable Diffusion web UI

...The interface also supports prompt editing, batch processing, custom scripts, and many community extensions, making it a highly customizable and continually evolving platform for creative AI art generation.

1 Review

Downloads: 174 This Week

Last Update: 2025-06-02
See Project
13

Shutter Encoder

A professional video compression tool accessible to all

Shutter Encoder is a cross-platform video and audio processing application designed to provide professional-grade encoding and conversion tools through an accessible graphical interface. Built primarily on FFmpeg, it offers a wide range of media operations including transcoding, compression, format conversion, and editing. The software supports numerous codecs and formats, enabling users to prepare media for broadcasting, streaming, or archiving.

Downloads: 5 This Week

Last Update: 2026-05-04
See Project
14

Pathway

Python ETL framework for stream processing, real-time analytics, LLM

...Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.

Downloads: 0 This Week

Last Update: 2 days ago
See Project
15

Lesan

New way to create web server and NoSQL data model

Lesan is a multilingual text processing and translation library designed for natural language processing (NLP) applications. It provides tools for text normalization, tokenization, and translation across multiple languages.

Downloads: 0 This Week

Last Update: 2026-04-18
See Project
16

tidytext

Text mining using tidy tools

tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.

Downloads: 1 This Week

Last Update: 2025-07-30
See Project
17

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is a Gradio WebUI for transcription, translation, and text-to-speech. It installs with one click and creates a virtual environment using Miniconda that runs completely separately from the Windows system (fully portable). It supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 60 This Week

Last Update: 2025-12-05
See Project
18

CompressO

Convert any video/image into a tiny size. 100% free & open-source

CompressO is a cross-platform, open-source multimedia compression application designed to reduce the size of videos and images while preserving visual quality. Built using modern frameworks such as Rust and Tauri, it runs locally on the user’s machine, ensuring fast performance and complete privacy without requiring cloud processing. The application supports a variety of media formats and provides controls for adjusting compression levels, resolution, and output quality. In addition to...

Downloads: 6 This Week

Last Update: 2026-04-24
See Project
19

Faster Whisper

Faster Whisper transcription with CTranslate2

Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...

Downloads: 36 This Week

Last Update: 2026-04-06
See Project
20

OpenAI Go

The official Go library for the OpenAI API

...It enables developers to integrate OpenAI’s models and features into Go applications with a clean and idiomatic interface. The library provides support for a wide range of API endpoints including chat completions, assistants, embeddings, image generation, audio processing, and batch jobs. It includes built-in tools for handling authentication, managing API requests, and parsing structured responses. The repository also offers examples to help developers quickly set up projects and test different API calls. Designed for reliability and ease of use, it is maintained to stay aligned with the evolving OpenAI API specifications.

Downloads: 3 This Week

Last Update: 5 days ago
See Project
21

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 3 This Week

Last Update: 2026-02-03
See Project
22

SciSpaCy

A full spaCy pipeline and models for scientific/biomedical documents

ScispaCy is a spaCy extension optimized for processing biomedical and scientific text, providing domain-specific NLP models for tasks like named entity recognition (NER) and dependency parsing.

Downloads: 2 This Week

Last Update: 2025-10-01
See Project
23

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2026-04-01
See Project
24

AutoSubSync

Automatic subtitle synchronization tool

...AutoSubSync also includes batch processing capabilities, enabling users to handle entire media libraries efficiently. It supports a wide range of subtitle formats and can synchronize subtitles using either the original video or a reference subtitle file. Overall, it streamlines subtitle correction workflows while maintaining flexibility and precision.

Downloads: 30 This Week

Last Update: 2026-05-19
See Project
25

Stanford CoreNLP

Stanford CoreNLP, a Java suite of core NLP tools

CoreNLP is your one-stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages: Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text,...

Downloads: 2 This Week

Last Update: 2025-06-07
See Project


© 2026 Slashdot Media. All Rights Reserved.