Showing 140 open source projects for "batch text processing"

  • 1
    pdfcpu

    A PDF processor written in Go

    ...The main focus lies on strong support for batch processing and scripting via a rich command line. At the same time pdfcpu wants to make it easy to integrate PDF processing into your Go-based backend system by providing a robust command set. Always make sure your work is based on the latest commit! pdfcpu is still Alpha - bugfixes are committed on the fly and will be mentioned in the next release notes.
    Downloads: 19 This Week
  • 2
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite. But beyond manual editing, it also offers a programmable layer so developers can write scripts to batch process documents, generate templated reports, or extract structured data from PDFs for integration in workflows. ...
    Downloads: 7 This Week
  • 3
    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 118 This Week
  • 4
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 72 This Week
  • 5
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. ...
    Downloads: 15 This Week
  • 6
    html-loader

    HTML Loader

    ...Filter can also be used to extend the supported elements and attributes. By default, the parser in html-loader interprets content inside noscript tags as #text, so content inside this tag is not processed. A very common scenario is exporting the HTML into its own .html file, to serve it directly instead of injecting it with JavaScript.
    Downloads: 3 This Week
  • 7
    mp-html

    Mini program rich text component that supports rendering and editing HTML

    A powerful rich text component for mini programs. It supports rendering and editing HTML and can be used on the WeChat, QQ, Baidu, Alipay, Toutiao, and uni-app platforms. Displaying dynamic HTML rich text is a requirement for many applications. The mini program platform does not support DOM operations, which makes this a problem. The built-in rich-text component supports few tags and blocks all events, making it hard to use in practice. Therefore, there is such a...
    Downloads: 0 This Week
  • 8
    Deep Blue Thesaurus Conversion

    A free, open source input method thesaurus conversion program

    ...The FIT input method is the most widely used input method on Mac. It supports importing and exporting thesauri in text format. For example, to import a Sogou input method user thesaurus from Windows into the FIT input method on Mac, use the Sogou user thesaurus as the source in Deep Blue Thesaurus Conversion and select the FIT input method as the target thesaurus.
    Downloads: 2 This Week
  • 9
    OCRBase

    MD/.JSON Document OCR and structured data extraction API

    ...It includes real-time job progress updates via WebSockets, which makes it easier to integrate into UIs, dashboards, or ingestion systems where users need feedback on long-running document processing.
    Downloads: 1 This Week
  • 10
    Text Encoding Initiative

    TEI produces the TEI Guidelines and associated software

    The TEI is an international and interdisciplinary standard used by libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.
    Downloads: 10 This Week
  • 11
    WorkerVless2sub

    Automate batch replacement to generate subscription generators

    WorkerVless2sub is a dedicated subscription generator focused on providing “preferred line” subscriptions by processing VMess, VLESS, and Trojan nodes, filtering and replacing them automatically via a Cloudflare Worker script. You supply node lists (or use existing API/CSV sources), and the worker filters them by criteria like speed or reliability and then emits a subscription link for end users. It supports deployment on Cloudflare Pages or Workers, offers configuration via variables...
    Downloads: 4 This Week
  • 12
    Biome

    A toolchain for web projects, aimed to provide functionalities

    Biome formats and lints your code in a fraction of a second. Biome supports JavaScript, TypeScript, JSON, and CSS. It aims to support all main languages of modern web development. Biome has sane defaults and requires minimal configuration. Biome helps you as much as possible by displaying detailed and contextualized diagnostics. Biome unifies functionality that has previously been separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code,...
    Downloads: 3 This Week
  • 13
    BibDesk

    Bibliography manager for Mac OS X

    BibDesk is a graphical BibTeX bibliography manager for Mac OS X.
    Downloads: 2,892 This Week
  • 14
    Amazon ECS Container Agent

    Amazon Elastic Container Service Agent

    Run highly secure, reliable, and scalable containers. Launch thousands of containers across the cloud using your preferred continuous integration and delivery (CI/CD) and automation tools. Optimize your time with AWS Fargate serverless compute for containers, which eliminates the need to configure and manage control plane, nodes, and instances. Save up to 50 percent on compute costs with autonomous provisioning, auto-scaling, and pay-as-you-go pricing. Integrate seamlessly with AWS...
    Downloads: 0 This Week
  • 15
    SingleFile

    Web Extension for Firefox/Chrome/MS Edge and CLI tool

    Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex Browser, and Opera. It helps you to save a complete web page into a single HTML file. Wait until the page is fully loaded. Click on the SingleFile button in the extension toolbar to save the page. You can click again on the...
    Downloads: 5 This Week
  • 16
    XML Copy Editor
    XML Copy Editor is a fast, free, validating XML editor.
    Downloads: 1,006 This Week
  • 17
    biber
    Biber is a sophisticated bibliography processing backend for the LaTeX biblatex package. It supports an unsurpassed feature set for automated conformance to complex bibliography style requirements such as labelling, sorting and name handling. It has comprehensive Unicode support.
    Downloads: 340 This Week
  • 18
    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 1 This Week
  • 19
    biblatex
    Biblatex is a LaTeX package that provides full-featured bibliographic facilities.
    Downloads: 72 This Week
  • 20
    xmlj

    XMLJ is a Java XML Editor and validator project.
    Downloads: 6 This Week
  • 21
    Microsoft Works format import library
    libwps is a Microsoft Works file format import filter built on top of librevenge (see https://sourceforge.net/p/libwpd/wiki/librevenge/). Currently, libwps can import all word processing Works formats since about 1995 with some success. It may also be able to import some basic database and spreadsheet files.
    Downloads: 351 This Week
  • 22
    Chord5

    A version of CHORD4 updated to cooperate with ChordSmith.

    CHORD5 is a ChordPro editor and renderer, useful for formatting and printing song sheets ("lead sheets"). This version of the CHORD program (based on CHORD4) has been modified to add functionality that enables it to cooperate with the ChordSmith program (available at https://sourceforge.net/projects/chordsmith/). This modified version has been renamed to CHORD5. Although it is revised to work with ChordSmith, it also works well as a standalone program. NOTE: If you have...
    Downloads: 1 This Week
  • 23
    TextureAtlas Toolbox

    A powerful, free and open-source tool for TextureAtlases/Spritesheets

    TextureAtlas Toolbox is an all-in-one solution for working with texture atlases and sprite sheets. Extract sprites into organized frame collections and GIF/WebP/APNG animations, generate optimized atlases from individual frames, or convert between 15+ atlas formats. Perfect for game developers, modders, and anyone creating showcases of game sprites. Formerly known as TextureAtlas to GIFs and Frames. Licensed under AGPL-3.0. Third-party licenses: See...
    Downloads: 18 This Week
  • 24
    Cclite

    Cclite Alternative Currency Software

    ...Multi-registry (group), multi-currency, with inter-registry transactions using web services (SOAP, REST), with rough templates for 17 languages. Various payment interfaces: email, SMS, Jabber, batch. User manual. Note: Cclite is NOT crypto; it's mutual social credit! https://github.com/hbarnard/cclite-android-app is now also available at: https://sourceforge.net/projects/cclite-android-app
    Downloads: 0 This Week
  • 25
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 10 This Week