Showing 117 open source projects for "text batch processing tools"

  • 1
    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). ...
    Downloads: 0 This Week
  • 2
    UniEM

    Unified embedding model

    UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.
    Downloads: 0 This Week
  • 3
    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The...
    Downloads: 63 This Week
  • 4
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
    Downloads: 9 This Week
  • 5
    Botpress

    Dev tools to reliably understand text and automate conversations

    We make building chatbots much easier for developers. We have put together the boilerplate code and infrastructure you need to get a chatbot up and running. We offer a complete developer-friendly platform that ships with all the tools you need to build, deploy and manage production-grade chatbots in record time. Built-in Natural Language Processing tasks such as intent recognition, spell checking, entity extraction, and slot tagging (and many others). A visual conversation studio to design...
    Downloads: 12 This Week
  • 6
    Promptify

    Use GPT or other prompt-based models to get structured output

    Promptify is an open-source Python library designed to simplify prompt engineering and the development of natural language processing pipelines using large language models. The project provides tools that help developers generate structured prompts for different NLP tasks and apply them across multiple generative AI systems. Instead of manually crafting prompts for each task, Promptify introduces a unified architecture that combines prompt templates, language model interfaces, and processing pipelines into a single framework. ...
    Downloads: 0 This Week
  • 7
    ChatGPT Advanced

    Browser extension adding web search results to ChatGPT prompts easily

    chatgpt-advanced, commonly known as WebChatGPT, is an open source browser extension designed to enhance the capabilities of ChatGPT by integrating real-time web search results into user prompts. It works by intercepting queries submitted to the ChatGPT interface and optionally augmenting them with information gathered from search engines before sending the prompt to the chatbot. This approach allows the model to generate responses that are more current and contextually relevant compared to...
    Downloads: 0 This Week
  • 8
    Riffusion

    Real-time AI music generation using Stable Diffusion techniques

    ...It implements a diffusion pipeline that supports prompt interpolation, allowing smooth transitions between different musical styles or prompts over time. Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.
    Downloads: 4 This Week
  • 9
    FrankMocap

    A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

    ...The pipeline couples a robust 2D keypoint detector with 3D mesh regression networks and priors that keep results anatomically plausible. It can run frame-by-frame or with temporal smoothing, and includes demo apps for live webcam capture as well as batch processing. Outputs include textured meshes, joint locations, and model parameters that can be exported to common DCC tools and game engines. The codebase offers pretrained models, clear inference scripts, and utilities to visualize results, making single-camera motion capture approachable on commodity hardware. Researchers and creators use it for motion studies, AR/VR prototyping, character animation, and human-in-the-loop editing.
    Downloads: 2 This Week
  • 10
    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    The world's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. It offers PDF editing with 60+ feature-rich tools and functions, such as OCR for PDFs and images with output as searchable PDF, Text, hOCR, Box, or UNLV. It also enhances images before OCR for better OCR performance, and supports PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry.
    Downloads: 5 This Week
  • 11
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    The world's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Features include creating PDFs from images, HTML, and text files; creating a processing log file; extracting, splitting, rotating, merging, duplicating, and moving pages; printing; and compressing pages. It also enhances images before OCR for better OCR performance, and supports PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. ...
    Downloads: 3 This Week
  • 12
    Pattern

    Web mining module for Python, with tools for scraping

    ...In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction. The framework also includes machine learning algorithms that support classification, clustering, and vector space modeling for text analysis tasks. Another component of the library provides tools for analyzing and visualizing networks, making it useful for studying relationships between entities in large datasets.
    Downloads: 0 This Week
  • 13
    VideoSrt

    Windows-GUI

    ...VideoSrt is written in the Go language and built on the lxn/walk Windows GUI toolkit. It is an open source tool that recognizes speech in video and automatically generates SRT subtitle files. It suits business scenarios that need to quickly batch-generate Chinese/English subtitles and text files for media (video/audio). Features: recognize video/audio speech to generate subtitle files (with Chinese-English translation and bilingual subtitles); extract speech text from video/audio; batch-translate, filter, and re-encode SRT subtitle files. It uses the Alibaba Cloud speech recognition interface, which is highly accurate, with a recognition rate above 95% for standard Mandarin/English. ...
    Downloads: 25 This Week
  • 14
    Next Generation Programming

    Compose Software Without Writing Any Programming Code

    "Next Generation Programming - Programming Without Coding Software" is a drag-and-drop wizard for creating simple or complex applications without writing any programming language code. The software is written in Java and aimed at both novice and expert programmers, who can write software with visual tools: drag-and-drop components, visual editors... Programmers can use it to compose simple or complex applications: database programs, circuit design, generate...
    Downloads: 2 This Week
  • 15

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...
    Downloads: 0 This Week
  • 16
    Self-Attentive Parser

    High-accuracy NLP parser with models for 11 languages

    LightAutoML is an automated machine learning (AutoML) framework developed by Sberbank AI Lab, designed to facilitate the development of machine learning models with minimal human intervention.
    Downloads: 0 This Week
  • 17
    fastNLP

    fastNLP: A Modularized and Extensible NLP Framework

    fastNLP is a lightweight framework for natural language processing (NLP) whose goal is to quickly implement NLP tasks and build complex models. A unified tabular data container simplifies data preprocessing. Built-in Loader and Pipe for multiple datasets eliminate the need for preprocessing code. It also offers various convenient NLP tools, such as embedding loading (including ELMo and BERT), intermediate data caching, etc.
    Downloads: 0 This Week
  • 18
    CC-Net

    Tools to download and clean up Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
  • 19
    NLP-Models-Tensorflow

    Gathers machine learning and Tensorflow deep learning models for NLP

    NLP-Models-Tensorflow is a collection of natural language processing model implementations built using the TensorFlow deep learning framework. The repository provides numerous examples of neural network architectures used in modern NLP research and applications, including text classification, language modeling, machine translation, and sentiment analysis. Each model implementation is designed to illustrate how common NLP architectures operate, such as recurrent neural networks, convolutional...
    Downloads: 0 This Week
  • 20
    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.
    Downloads: 0 This Week
  • 21
    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    PyTorch-NLP is a library for Natural Language Processing (NLP) in Python. It’s built with the very latest research in mind, and was designed from day one to support rapid prototyping. PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules and text encoders. It’s open-source software, released under the BSD3 license. With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. ...
    Downloads: 0 This Week
  • 22
    gpt2-client

    Easy-to-use TensorFlow Wrapper for GPT-2 117M, 345M, 774M, etc.

    GPT-2 is a Natural Language Processing model developed by OpenAI for text generation. It is the successor to the GPT (Generative Pre-trained Transformer) model and was trained on 40GB of text from the internet. It features a Transformer model that was brought to light by the Attention Is All You Need paper in 2017. The model has 4 versions - 124M, 345M, 774M, and 1558M - that differ in terms of the amount of training data fed to it and the number of parameters they contain. Finally, gpt2-client is a...
    Downloads: 0 This Week
  • 23
    FalaBrasil

    Resources for speech processing in Brazilian Portuguese

    The FalaBrasil Group provides free tools and resources for speech and natural language processing in Brazilian Portuguese, most of them under the BSD license. The tools are mainly scripts for all sorts of audio and text tasks, whereas the resources include ready-to-use acoustic and language models, phonetic dictionaries, etc.
    Downloads: 2 This Week
  • 24
    Welsh Natural Language Toolkit
    The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: a) Word Segmentation, for separating text into words; b) Sentence Boundary Disambiguation, for finding sentence boundaries; c) Part of Speech Tagger, for determining the part of speech of each word; d) Morphological Analyser, for identifying the root form (lemma) of words....
    Downloads: 0 This Week
  • 25

    JCLTP

    A Java Class Library for Text Processing

    JCLTP is a class library designed for processing text. JCLTP is free, open source and developed with the Java programming language. JCLTP is distributed under the GNU license. It incorporates several technologies that enable it to process information while applying AI techniques, in order to build predictive models for text classification. A flexible structure of interfaces and classes makes it possible to extend, adapt and add functionality to JCLTP. Thus, analysis of new types...
    Downloads: 0 This Week