Search Results for "text batch processing tools" - Page 4

Showing 122 open source projects for "text batch processing tools"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 1
    EmotiVoice

    EmotiVoice

    Multi-Voice and Prompt-Controlled TTS Engine

    EmotiVoice is a multi-voice, prompt-controlled text-to-speech engine designed to generate highly expressive speech across thousands of voices. It supports both English and Chinese and ships with over 2,000 preset voices, making it suitable for everything from characters and virtual anchors to narration and dialogue. The core idea is prompt-based emotional and style control: you can ask the engine to speak “happy,” “sad,” “excited,” or with other high-level style prompts that shape prosody,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    text-dedup

    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ddgr

    ddgr

    DuckDuckGo from the terminal

    ...It fetches search results via DuckDuckGo’s API or HTML output and presents links, snippets, and metadata in a clean terminal format, making it useful for programmers, sysadmins, and privacy advocates who prefer keyboard-driven workflows. The tool also supports options like opening a selected result in a web browser, piping results into other tools, and restricting searches to specific formats such as text-only or JSON for further processing. Because it avoids third-party tracking and ads built into many browser search experiences, ddgr appeals to users seeking greater control over data and a faster, distraction-free search flow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    ...Towhee includes a pythonic method-chaining API for describing custom data processing pipelines. We also support schemas, making processing unstructured data as easy as handling tabular data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 5
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    UniEM

    UniEM

    Unified embedding model

    UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Demucs

    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The...
    Downloads: 70 This Week
    Last Update:
    See Project
  • 8
    Promptify

    Promptify

    se GPT or other prompt based models to get structured output

    Promptify is an open-source Python library designed to simplify prompt engineering and the development of natural language processing pipelines using large language models. The project provides tools that help developers generate structured prompts for different NLP tasks and apply them across multiple generative AI systems. Instead of manually crafting prompts for each task, Promptify introduces a unified architecture that combines prompt templates, language model interfaces, and processing pipelines into a single framework. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    EpiDoc: Epigraphic Documents in TEI XML

    EpiDoc: Epigraphic Documents in TEI XML

    XML text markup for ancient documents

    The EpiDoc Collaborative is developing specifications and tools for standards-based, digital publication and interchange of scholarly and educational editions of documentary and literary texts like inscriptions and papyri. The link below will take you to the EpiDoc home page on this site.
    Leader badge
    Downloads: 5 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    Riffusion

    Riffusion

    Real-time music generation using stable diffusion techniques AI

    ...It implements a diffusion pipeline that supports prompt interpolation, allowing smooth transitions between different musical styles or prompts over time. Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    Automatic YouTube subtitle generation

    Automatic YouTube subtitle generation

    Using OpenAI's Whisper to automatically generate YouTube subtitles

    ...It allows users to download videos or audio from YouTube and automatically generate subtitles or transcripts. The tool processes media locally, extracting audio and applying speech recognition to produce accurate text outputs. It supports multiple languages and can handle different Whisper model sizes, balancing performance and accuracy. yt-whisperc is designed for automation, enabling batch processing of multiple videos for transcription workflows. It also provides options for exporting subtitles in common formats such as SRT. Overall, it simplifies the process of converting video content into searchable and accessible text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    FrankMocap

    FrankMocap

    A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

    ...The pipeline couples a robust 2D keypoint detector with 3D mesh regression networks and priors that keep results anatomically plausible. It can run frame-by-frame or with temporal smoothing, and includes demo apps for live webcam capture as well as batch processing. Outputs include textured meshes, joint locations, and model parameters that can be exported to common DCC tools and game engines. The codebase offers pretrained models, clear inference scripts, and utilities to visualize results, making single-camera motion capture approachable on commodity hardware. Researchers and creators use it for motion studies, AR/VR prototyping, character animation, and human-in-the-loop editing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Pattern

    Pattern

    Web mining module for Python, with tools for scraping

    ...In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction. The framework also includes machine learning algorithms that support classification, clustering, and vector space modeling for text analysis tasks. Another component of the library provides tools for analyzing and visualizing networks, making it useful for studying relationships between entities in large datasets.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Buzz is a fast graphical editor for XML files with special support for OPML. Using the OPML convergence tools it will edit about any outline and many forms of indented text, including Python. In fact, Buzz was written with Buzz! It is written in P
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    A set of tools (command line and GUI) to provide a complete digital photo workflow for Unixes. EXIF headers are used as the central information repository, so users may change their software at any time without loosing any data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Self-Attentive Parser

    Self-Attentive Parser

    High-accuracy NLP parser with models for 11 languages

    LightAutoML is an automated machine learning (AutoML) framework developed by Sberbank AI Lab, designed to facilitate the development of machine learning models with minimal human intervention.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    fastNLP

    fastNLP

    fastNLP: A Modularized and Extensible NLP Framework

    fastNLP is a lightweight framework for natural language processing (NLP), the goal is to quickly implement NLP tasks and build complex models. A unified Tabular data container simplifies the data preprocessing process. Built-in Loader and Pipe for multiple datasets, eliminating the need for preprocessing code. Various convenient NLP tools, such as Embedding loading (including ELMo and BERT), intermediate data cache, etc..
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    CC-Net

    CC-Net

    Tools to download and cleanup Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    GluonNLP

    GluonNLP

    NLP made easy

    GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you load the text data, process the text data, and train models. To facilitate both the engineers and researchers, we provide command-line-toolkits for downloading and processing the NLP datasets. Gluon NLP makes it easy to evaluate and train word embeddings. Here are examples to evaluate the pre-trained embeddings included in the Gluon NLP toolkit as well as example scripts for training embeddings on custom datasets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    NLP-Models-Tensorflow

    NLP-Models-Tensorflow

    Gathers machine learning and Tensorflow deep learning models for NLP

    NLP-Models-Tensorflow is a collection of natural language processing model implementations built using the TensorFlow deep learning framework. The repository provides numerous examples of neural network architectures used in modern NLP research and applications, including text classification, language modeling, machine translation, and sentiment analysis. Each model implementation is designed to illustrate how common NLP architectures operate, such as recurrent neural networks, convolutional...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Monkey-DL

    Monkey-DL

    Bulk download your favourite anime episodes from your favourite anime

    Monkey-DL is a command-line media downloader designed to retrieve video and audio content from online platforms with flexibility and automation. It integrates with tools like FFmpeg to handle post-processing tasks such as merging streams, converting formats, and optimizing output quality. The tool supports downloading single media files or entire playlists, enabling efficient batch operations. It includes options for selecting resolution, format, and output structure, giving users fine control over downloads. monkey-dl is built for simplicity, providing straightforward commands while still supporting advanced configurations. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 23
    MystiQ

    MystiQ

    Qt5/C++ FFmpeg Media Converter

    MystiQ is a cross-platform multimedia converter built with Qt and FFmpeg, designed to provide a modern graphical interface for video and audio processing tasks. It allows users to perform operations such as transcoding, trimming, and format conversion without needing to use command-line tools. The application supports a wide range of codecs and formats, enabling compatibility across devices and platforms. It includes batch processing capabilities, allowing multiple files to be converted simultaneously. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    NLP Best Practices

    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PyTorch Natural Language Processing

    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    PyTorch-NLP is a library for Natural Language Processing (NLP) in Python. It’s built with the very latest research in mind, and was designed from day one to support rapid prototyping. PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules and text encoders. It’s open-source software, released under the BSD3 license. With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. ...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB