Search Results for "text batch processing tools" - Page 8

Showing 304 open source projects for "text batch processing tools"

  • 1
    Pure Bash Bible

    A collection of pure bash alternatives to external processes

    pure-bash-bible is a collection of pure Bash scripting techniques that demonstrate how to accomplish common and complex tasks using only built-in Bash features. Its goal is to reduce reliance on external tools like sed, awk, or grep, which can slow down scripts and add unnecessary dependencies. The project is organized as a reference book of function-based code snippets, each showcasing practical solutions for string manipulation, text processing, file operations, and more. By relying exclusively on Bash built-ins, these methods can make scripts faster, more portable, and easier to maintain. ...
    Downloads: 4 This Week
  • 2
    Moritz

    Transfers XML into specific text formats (HTML, dot, source code, ...)

    Moritz is an add-on to the well-known tool Doxygen. It generates Nassi-Shneiderman diagrams of the functions and methods in C/C++ source code as HTML files, which can be included in software documentation or simply viewed in a web browser.
    Downloads: 6 This Week
  • 3
    CSVfix

    Command-line tool specifically designed to deal with CSV data

    ...Unfortunately, the CSV files you are given, or are required to produce, never seem to be in quite the right format for your particular business application. And because of the structure of CSV records, using standard text processing tools like sed, awk and perl is not as simple as it might be. Usage: http://csvfix.byethost5.com/csvfix15/csvfix.html?csvfix.html?Usage.html?i=1&i=2 CSVfix aims to provide a solution to these problems. It is a command-line stream editor specifically designed to deal with CSV data. With it you can, among other things:
    Downloads: 74 This Week
  • 4
    MuLanPa

    Transfers text in diverse formats into specific XML parser trees

    MuLanPa is a source analyser with a configurable parser that may be used for several programming languages. Its XML output is intended for tools like project browsers or code viewers such as Moritz (www.sourceforge.net/projects/moritz/).
    Downloads: 0 This Week
  • 5
    NLP-Models-Tensorflow

    Gathers machine learning and Tensorflow deep learning models for NLP

    NLP-Models-Tensorflow is a collection of natural language processing model implementations built using the TensorFlow deep learning framework. The repository provides numerous examples of neural network architectures used in modern NLP research and applications, including text classification, language modeling, machine translation, and sentiment analysis. Each model implementation is designed to illustrate how common NLP architectures operate, such as recurrent neural networks, convolutional...
    Downloads: 0 This Week
  • 6
    BRIC

    BRIC is a powerful tool for batch image processing.

    BRIC is a cross-platform batch image processor. You can convert, resize, rotate, and add watermarks to your images. Multiple file types are supported for input and output. The project started in 2011 and was maintained for a couple of years. As of 2020, BRIC is again in active development, so some of the features described below might be outdated. Please be patient until everything is reviewed and rewritten.
    Downloads: 0 This Week
  • 7
    MystiQ

    Qt5/C++ FFmpeg Media Converter

    MystiQ is a cross-platform multimedia converter built with Qt and FFmpeg, designed to provide a modern graphical interface for video and audio processing tasks. It allows users to perform operations such as transcoding, trimming, and format conversion without needing to use command-line tools. The application supports a wide range of codecs and formats, enabling compatibility across devices and platforms. It includes batch processing capabilities, allowing multiple files to be converted simultaneously. ...
    Downloads: 2 This Week
  • 8
    Monkey-DL

    Bulk download your favourite anime episodes from your favourite anime

    Monkey-DL is a command-line media downloader designed to retrieve video and audio content from online platforms with flexibility and automation. It integrates with tools like FFmpeg to handle post-processing tasks such as merging streams, converting formats, and optimizing output quality. The tool supports downloading single media files or entire playlists, enabling efficient batch operations. It includes options for selecting resolution, format, and output structure, giving users fine control over downloads. monkey-dl is built for simplicity, providing straightforward commands while still supporting advanced configurations. ...
    Downloads: 0 This Week
  • 9
    Data Science at the Command Line

    Data science at the command line

    Command Line by Jeroen Janssens, published by O’Reilly Media in October 2021. Obtain, scrub, explore, and model data with Unix Power Tools. This repository contains the full text, data, and scripts used in the second edition of the book Data Science at the Command Line by Jeroen Janssens. This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. ...
    Downloads: 1 This Week
  • 10
    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.
    Downloads: 0 This Week
  • 11
    AhoCorasickDoubleArrayTrie

    An extremely fast implementation of Aho Corasick algorithm

    AhoCorasickDoubleArrayTrie is a Java implementation of the Aho–Corasick multi-pattern matching algorithm that is optimized using a Double-Array Trie data structure. It is designed for fast keyword scanning across large texts, where you want to search for many patterns simultaneously and efficiently. The core idea is to build an automaton from a dictionary of patterns, then stream through input text to emit matches with minimal overhead. By using a double-array trie representation, the...
    Downloads: 0 This Week
  • 12
    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    PyTorch-NLP is a library for Natural Language Processing (NLP) in Python. It’s built with the very latest research in mind, and was designed from day one to support rapid prototyping. PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules and text encoders. It’s open-source software, released under the BSD3 license. With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. ...
    Downloads: 0 This Week
  • 13
    gpt2-client

    Easy-to-use TensorFlow Wrapper for GPT-2 117M, 345M, 774M, etc.

    GPT-2 is a Natural Language Processing model developed by OpenAI for text generation. It is the successor to the GPT (Generative Pre-trained Transformer) model trained on 40GB of text from the internet. It features a Transformer model that was brought to light by the Attention Is All You Need paper in 2017. The model has 4 versions - 124M, 345M, 774M, and 1558M - that differ in terms of the amount of training data fed to it and the number of parameters they contain. Finally, gpt2-client is a...
    Downloads: 0 This Week
  • 14
    FalaBrasil

    Resources for speech processing in Brazilian Portuguese

    The FalaBrasil Group provides free tools and resources for speech and natural language processing in Brazilian Portuguese, most of them under the BSD license. Tools mainly include scripts for all sorts of audio and text tasks, whereas resources include ready-to-use acoustic and language models, phonetic dictionaries, etc.
    Downloads: 2 This Week
  • 15
    Command-Line Text Processing

    From finding text to search and replace

    Command-line-text-processing is a curated educational repository providing many examples and tutorials on how to use various command-line tools for processing text: searching, replacing, sorting, transforming, filtering, etc. It covers tools like grep, sed, awk, perl, Ruby one-liners, file attribute commands, sorting, tail/head/less/cat, and many more.
    Downloads: 0 This Week
  • 16
    bsed

    Simple SQL-like syntax on top of Perl text processing

    bsed is a stream editor that offers a simple SQL-like syntax for text processing tasks. Designed to replace basic uses of tools like sed, grep, AWK, and Perl, it allows users to perform complex text manipulations with intuitive commands.
    Downloads: 0 This Week
  • 17
    YouTubeCrawler

    Go-based automation utility that downloads YouTube videos

    This tool is a Go-based automation utility that downloads YouTube videos and permanently embeds or “hard-codes” their subtitles (typically English) into MP4 output files. The workflow involves specifying one or more URLs (via a simple “url” text file in each folder) and the program uses youtube-dl to fetch video and subtitle, then ffmpeg to overlay the subtitles onto the video track. The architecture follows a command-pattern setup: tasks implement a common interface and are scheduled and...
    Downloads: 0 This Week
  • 18
    SIVIC
    SIVIC is an open-source, standards-based software framework and application suite for processing and visualization of DICOM MR Spectroscopy data. Through the use of DICOM, SIVIC aims to facilitate the application of MRS in medical imaging studies.
    Downloads: 5 This Week
  • 19
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 200 This Week
  • 20
    Command-line/Ant-task/embeddable text file preprocessor. Macros, flow control, expressions. Recursive directory processing. Extensible in Java to display data from any data source (such as a database). Can generate complete homepages (a tree of HTML files, images, etc.).
    Downloads: 108 This Week
  • 21
    Betty

    Holberton-style C code checker written in Perl

    Betty is a Perl-based coding style checker that enforces the Holberton School coding style (inspired by the Linux kernel style) for C code and documentation. It identifies inconsistencies, style violations, and formatting issues in C source files. Be aware that some text editors use spaces instead of tabs by default. For instance, when you press the Tab key in Emacs, leading spaces are inserted by default, which causes Betty to raise a lot of warnings. Please find some...
    Downloads: 0 This Week
  • 22
    unfluff

    Automatically extract body content (and other cool stuff) from HTML

    unfluff is a Node.js library designed to automatically extract the main content from an HTML document — stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool very much aimed at content-analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine-learning or summarization). The API is simple: you feed in raw HTML and it returns a structured object with the extracted text and other fields. It supports caching internal representations to speed up repeated extractions. While its language support is best for English, it is still widely used in web-content-processing pipelines. ...
    Downloads: 0 This Week
  • 23
    Create beautiful song books for your church or fellowship using this LaTeX package and related tools.
    Downloads: 2 This Week
  • 24
    Absentia DX

    An Intelligent Algorithm that cleans voice recordings

    The Absentia DX algorithm analyzes production dialog recordings and then removes obvious hums, wireless rings, and ticks, while maintaining the integrity of the human voice. ABDX was developed for a network television show with difficult production sound that resulted in substantial repetitive manual labor. Simply drag and drop volumes, folders, or sound files directly onto the application or the settings window and files will begin processing. An Absentia DX progress window will appear,...
    Downloads: 0 This Week
  • 25
    Clu-Linux-Live

    Various Processing and Data Rescue Tools over Wired or Wireless Networ

    This Linux Live CD provides various processing command-line utilities (Clu) and data rescue tools that can be used over a wired or wireless network. On startup, it prompts the user to change the password, mounts all locally available filesystems, starts the wireless network (if a Wi-Fi interface is present), starts network services (samba/ssh/sftp), and presents the user with a console for running various utilities (text, image, audio, video, downloading, etc.) on the mounted filesystems. ...
    Downloads: 3 This Week