Search Results for "batch text processing"

Showing 1418 open source projects for "batch text processing"

1

Spring Batch

Spring Batch is a framework for writing batch applications using Java

A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.

Downloads: 0 This Week

Last Update: 2026-01-21
See Project
2

Recognizers-Text

Recognition and resolution of numbers, units, date/time, etc.

Recognizers-Text is a multilingual text recognition library that extracts structured information such as dates, numbers, and currency values from unstructured text.

Downloads: 0 This Week

Last Update: 2025-02-12
See Project
3

Automatic text summarizer

Module for automatic summarization of text documents and HTML pages

Sumy is an automatic text summarization library that provides multiple algorithms for extracting key content from documents and articles. It is a simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods are described in the documentation. I also maintain a list of alternative implementations of the summarizers in various programming languages.

Downloads: 3 This Week

Last Update: 2026-02-14
See Project
4

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 1 This Week

Last Update: 2025-12-18
See Project
5

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

...The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. By automating text-to-speech for arbitrary documents, abogen reduces the friction of producing audiobooks and could be integrated into larger workflows (e.g., batch converting a library of texts).

Downloads: 12 This Week

Last Update: 2026-02-06
See Project
6

deepdoctection

A Repo For Document AI

...It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks, and provides an integrated framework for fine-tuning, evaluating and running models. For more specific text processing tasks use one of the many other great NLP libraries.

Downloads: 5 This Week

Last Update: 2026-02-17
See Project
7

rembg

Rembg is a tool to remove image backgrounds

Rembg is a powerful tool that utilizes AI (specifically U^2-Net) to automatically remove backgrounds from images, offering a streamlined command-line interface and Docker support. It's ideal for batch processing and integrates smoothly into workflows.

Downloads: 11 This Week

Last Update: 2026-01-03
See Project
8

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...

Downloads: 58 This Week

Last Update: 2026-01-15
See Project
9

pdfcpu

A PDF processor written in Go

...The main focus lies on strong support for batch processing and scripting via a rich command line. At the same time pdfcpu wants to make it easy to integrate PDF processing into your Go-based backend system by providing a robust command set. Always make sure your work is based on the latest commit! pdfcpu is still Alpha - bugfixes are committed on the fly and will be mentioned in the next release notes.

Downloads: 19 This Week

Last Update: 2025-10-21
See Project
10

Voice-Pro

Comprehensive Gradio WebUI for audio processing

Voice-Pro is the best Gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.

1 Review

Downloads: 19 This Week

Last Update: 2025-12-05
See Project
11

AUTOMATIC1111 Stable Diffusion web UI

Stable Diffusion web UI

...The interface also supports prompt editing, batch processing, custom scripts, and many community extensions, making it a highly customizable and continually evolving platform for creative AI art generation.

1 Review

Downloads: 282 This Week

Last Update: 2025-06-02
See Project
12

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 5 This Week

Last Update: 2026-01-27
See Project
13

Zerox OCR

PDF to Markdown with vision models

A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc., so vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.

Downloads: 2 This Week

Last Update: 2024-12-18
See Project
14

pyVideoTrans

Translate the video from one language to another and embed dubbing

...The tool supports both command-line and GUI modes, making it accessible to developers and creatives needing batch or automated processing.

Downloads: 10 This Week

Last Update: 2026-02-17
See Project
15

AutoCut

Cut videos with a text editor

...AutoCut supports multiple transcription backends, including Whisper and faster-whisper modes, allowing users to choose based on speed or accuracy needs. After editing the transcript text, the corresponding video clips are merged into the final output, and the tool also produces matching subtitle files. Its command-line interface can be integrated into scripts, making it suitable for automated workflows or batch processing.

Downloads: 3 This Week

Last Update: 2026-02-06
See Project
16

Apache Beam

Unified programming model for Batch and Streaming

Apache Beam is an open source, unified programming model to define both batch and streaming data-parallel processing pipelines, as well as certain language-specific SDKs for constructing pipelines and Runners. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for embarrassingly parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers and runner writers.

Downloads: 0 This Week

Last Update: 2026-01-22
See Project
17

Lesan

New way to create web server and NoSQL data model

Lesan is a multilingual text processing and translation library designed for natural language processing (NLP) applications. It provides tools for text normalization, tokenization, and translation across multiple languages.

Downloads: 2 This Week

Last Update: 1 day ago
See Project
18

edge-tts

Use Microsoft Edge's online text-to-speech service from Python

...From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.

Downloads: 17 This Week

Last Update: 2025-12-12
See Project
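
The edge-tts entry above mentions that an asynchronous design suits batch jobs that synthesize many utterances concurrently. The following is a minimal sketch of that general pattern using only Python's standard asyncio module. It does not call edge-tts: `synthesize`, `synthesize_batch`, `run_batch`, and `max_concurrent` are hypothetical names chosen for illustration, and the stub merely simulates a network round trip.

```python
import asyncio
from typing import List


async def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for one network TTS request (not the edge-tts API)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return text.encode("utf-8")  # placeholder for the returned audio bytes


async def synthesize_batch(texts: List[str], max_concurrent: int = 4) -> List[bytes]:
    """Run many requests concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text: str) -> bytes:
        # The semaphore caps concurrency so a large batch does not flood the service.
        async with semaphore:
            return await synthesize(text)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(worker(t) for t in texts)))


def run_batch(texts: List[str], max_concurrent: int = 4) -> List[bytes]:
    """Synchronous entry point for scripts and batch jobs."""
    return asyncio.run(synthesize_batch(texts, max_concurrent))
```

In a real job the stub would be replaced by the actual synthesis call, and the concurrency cap would be tuned to whatever the remote service tolerates.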
19

RawTherapee

A powerful cross-platform raw photo processing program

...Users can work non-destructively: adjustments are previewed but stored separately (sidecar files) and only applied during export, which allows experimentation without altering originals. The interface includes a file browser, processing queue, editing pane with full-image previews, history stack, and batch export functionality.

Downloads: 3 This Week

Last Update: 2025-11-24
See Project
20

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 9 This Week

Last Update: 2026-02-03
See Project
21

PDFCraft

PDFCraft is a free, privacy-focused PDF toolkit

PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite. But beyond manual editing, it also offers a programmable layer so developers can write scripts to batch process documents, generate templated reports, or extract structured data from PDFs for integration in workflows. ...

Downloads: 7 This Week

Last Update: 4 days ago
See Project
22

compromise

Modest natural-language processing

Language is complicated and there's a gazillion words. Compromise is a JavaScript library that interprets and pre-parses text and makes some reasonable decisions so things are way easier. Compromise tries its best to parse text. It is small, quick, and often good-enough. It is not as smart as you'd think. Conjugate and negate verbs in any tense. Play between plural, singular and possessive forms. Interpret plain-text numbers. Handle implicit terms. Use it on the client-side or as an...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
23

TeXworks

A simple interface for working with TeX documents

TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.

1 Review

Downloads: 109 This Week

Last Update: 2026-02-11
See Project
24

Ultimate Vocal Remover (UVR5)

GUI for a Vocal Remover that uses Deep Neural Networks

This application uses state-of-the-art source separation models to remove vocals from audio files. UVR's core developers trained all of the models provided in this package (except for the Demucs v3 and v4 4-stem models).

Downloads: 575 This Week

Last Update: 2025-01-20
See Project
25

Koodo Reader

A modern ebook manager and reader with sync and backup

...Customize the source folder and synchronize among multiple devices using OneDrive, iCloud, Dropbox, etc. Single-column, two-column, or continuous scrolling layouts. Text-to-speech, translation, progress slider, touch screen support, batch import. Add bookmarks, notes, highlights to your books. Adjust font size, font family, line-spacing, paragraph spacing, background color, text color, margins, and brightness. Night mode and theme color. Text highlight, underline, boldness, italics and shadow.

Downloads: 26 This Week

Last Update: 2026-02-21
See Project