Showing 54 open source projects for "batch text processing"

  • 1
    Spring Batch

    Spring Batch is a framework for writing batch applications using Java

    A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
    Downloads: 0 This Week
  • 2
    Apache Beam

    Unified programming model for Batch and Streaming

    Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines, along with a set of language-specific SDKs for constructing pipelines and Runners. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for embarrassingly parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers, and runner writers.
    Downloads: 0 This Week
  • 3
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 1 This Week
  • 4
    Apache Spark

    A unified analytics engine for large-scale data processing

    Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing.
    Downloads: 3 This Week
  • 5
    gse

    Go efficient multilingual NLP and text segmentation

    Efficient multilingual NLP and text segmentation in Go, with support for English, Chinese, Japanese, and other languages. Gse implements jieba in Go and aims to add NLP support and more features. It supports multiple word segmentation modes: common, search engine, full, precise, and HMM. It also supports user and embedded dictionaries, part-of-speech (POS) tagging, segment info analysis, and stop and trim words. Supports Traditional Chinese. Support HMM cut text...
    Downloads: 4 This Week
  • 6
    Apache Flink

    Stream processing framework with powerful stream

    Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with. Developers program against high-level...
    Downloads: 0 This Week
  • 7
    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
    Downloads: 0 This Week
  • 8
    TorchQuantum

    A PyTorch-based framework for Quantum Classical Simulation

    ...Researchers on quantum algorithm design, parameterized quantum circuit training, quantum optimal control, quantum machine learning, and quantum neural networks. Features include a dynamic computation graph, automatic gradient computation, fast GPU support, and batch-mode tensorized processing.
    Downloads: 0 This Week
  • 9
    Asynq

    Simple, reliable, and efficient distributed task queue in Go

    Asynq is a Go library for queueing tasks and processing them asynchronously with workers. It's backed by Redis and is designed to be scalable yet easy to get started. Client puts tasks on a queue. Server pulls tasks off queues and starts a worker goroutine for each task. Tasks are processed concurrently by multiple workers. Task queues are used as a mechanism to distribute work across multiple machines. A system can consist of multiple worker servers and brokers, giving way to high...
    Downloads: 1 This Week
  • 10
    BentoML

    Unified Model Serving Framework

    BentoML simplifies ML model deployment and serves your models at production scale. It supports multiple ML frameworks natively: TensorFlow, PyTorch, XGBoost, Scikit-Learn, and many more. Define custom serving pipelines with pre-processing, post-processing, and ensemble models. The standard .bento format packages code, models, and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference...
    Downloads: 0 This Week
  • 11
    Botonic

    Build chatbots and conversational experiences using React

    Botonic is a full-stack JavaScript framework for creating chatbots and modern conversational apps that work on multiple platforms: web, mobile, and messaging apps (Messenger, WhatsApp, Telegram, etc.). Building modern applications on top of messaging apps like WhatsApp or Messenger is much more than creating simple text-based chatbots. Botonic is a full-stack serverless framework that combines the power of React and TensorFlow.js to create amazing experiences at the intersection of text and...
    Downloads: 0 This Week
  • 12
    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    Apache InLong is a one-stop integration framework for massive data that provides automatic, secure, and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which makes it well suited to building data analysis, modeling, and other real-time applications based on streaming data. InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it serves as a metaphor for the InLong system's reporting of data streams. InLong was originally built at Tencent, where it has served online businesses for more than 8 years, supporting massive data reporting services (a data scale of more than 80 trillion pieces of data per day) in big data scenarios. ...
    Downloads: 1 This Week
  • 13
    GoAWK

    A POSIX-compliant AWK interpreter written in Go, with CSV support

    GoAWK now uses a bytecode compiler and includes native support for CSV files. AWK is a fascinating text processing language, and The AWK Programming Language is a wonderfully concise book describing it. The A, W, and K in AWK stand for the surnames of the three original creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. Kernighan is also an author of The C Programming Language (“K&R”), and the two books have that same each-page-packs-a-punch feel.
    Downloads: 3 This Week
  • 14
    GATE
    Note that the source code and issue tracker have now moved to GitHub. Find us at https://github.com/GateNLP/
    GATE (General Architecture for Text Engineering) is an architecture, framework and development environment for developing, evaluating and embedding Human Language Technology. See http://gate.ac.uk for full details.
    Downloads: 4 This Week
  • 15
    threeddonut

    3D donut. Example of frojasg1.com libraries usage

    The application shows a 3D donut that can be rotated about both axes with two sliders. It is a simple example of what can be done with the frojasg1.com platform libraries:
      - Zoom option for components
      - Multi-language support
      - Dark mode option
      - Automatic undo/redo for text components, with popup menu included
      - Text search/replace window, ready to use
      - Base components for auto-completion windows
      - Automatic component relocation after resizing a...
    Downloads: 0 This Week
  • 16
    towhee

    Framework that is dedicated to making neural data processing

    ...Towhee includes a pythonic method-chaining API for describing custom data processing pipelines. We also support schemas, making processing unstructured data as easy as handling tabular data.
    Downloads: 0 This Week
  • 17
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 10 This Week
  • 18

    libCIGI

    C++ Library for the Common Image Generator Interface

    ...Currently, versions 3.0, 3.2, and 3.3 are supported, along with the *draft* V4.0 CIGI standard. Additional functionality beyond the base packet interfaces is provided through external classes so that the packet headers have no further dependencies. Packet processing is kept simple, and a few helper functions and classes are provided in CIGIGeneric.h and some other headers to support this. Helpers are provided for version interpretation and for converting packet parameters to text. Testing is supported through the Boost test framework, which is used to develop unit tests for each of the packets. ...
    Downloads: 6 This Week
  • 19
    Scotty

    Haskell web framework inspired by Ruby's Sinatra, using WAI and Warp

    Scotty is a lightweight Haskell web framework inspired by Ruby’s Sinatra. It allows developers to build RESTful web applications and APIs with minimal boilerplate. Scotty is built on top of the WAI (Web Application Interface) and Warp server, making it fast and scalable. It emphasizes simplicity and ease of use, making it ideal for small- to medium-sized services or for developers learning web programming in Haskell.
    Downloads: 0 This Week
  • 20
    VideoSrt

    Windows-GUI

    ...VideoSrt is written in Go and built on the lxn/walk Windows GUI toolkit. It is an open source tool that recognizes speech in video and automatically generates SRT subtitle files. It suits business scenarios that need to generate Chinese/English subtitles and text files for media (video/audio) quickly and in batches. It can recognize video/audio speech to generate subtitle files (with Chinese-English translation and bilingual subtitles), extract speech text from video/audio, and perform batch translation, filtering, and encoding of SRT subtitle files. It uses the Alibaba Cloud speech recognition interface, which gives high accuracy: the recognition rate for standard Mandarin/English is over 95%. ...
    Downloads: 48 This Week
  • 21
    OpenPrompt

    An Open-Source Framework for Prompt-Learning

    Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks; it modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. OpenPrompt is a library built upon PyTorch that provides a standard, flexible, and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from Hugging Face Transformers. In the future, we will also support PLMs implemented by other...
    Downloads: 0 This Week
  • 22
    SpringAll

    Step by step, learn Spring Boot, Spring Boot & Shiro, Spring Batch

    SpringAll is a comprehensive learning project that gathers a wide range of Spring, Spring Boot, and Spring Cloud demos in one repository. It is designed for developers who want to deepen their understanding of the Spring ecosystem by exploring concrete, runnable code samples. Each module focuses on a specific technology or integration—covering web applications, ORM frameworks, microservices, caching, messaging, security, distributed systems, and monitoring. The repository emphasizes both...
    Downloads: 3 This Week
  • 23
    Twint

    An advanced Twitter scraping & OSINT tool written in Python

    Twint is an advanced open-source Twitter scraping and OSINT tool written in Python that extracts tweets, user data, followers, likes, and more—without relying on Twitter’s API—making it highly useful for researchers, analysts, and hobbyists who want to bypass rate limits and access public Twitter data.
    Downloads: 4 This Week
  • 24
    X-DeepLearning

    An industrial deep learning framework for high-dimension sparse data

    ...Background: XDL1.0 focuses on throughput optimization and adopts a one-request-per-thread processing model, which can significantly improve peak throughput under ultra-high concurrency.
    Downloads: 0 This Week
  • 25

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify. The Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators that write text offsets as output. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week