Search Results for "text batch processing tools" - Page 10

Showing 401 open source projects for "text batch processing tools"

  • 1
    gpt2-client

    Easy-to-use TensorFlow Wrapper for GPT-2 117M, 345M, 774M, etc.

    GPT-2 is a Natural Language Processing model developed by OpenAI for text generation. It is the successor to the GPT (Generative Pre-trained Transformer) model and was trained on 40GB of text from the internet. It is built on the Transformer architecture introduced in the 2017 paper Attention Is All You Need. The model has 4 versions (124M, 345M, 774M, and 1558M) that differ in terms of the amount of training data fed to them and the number of parameters they contain. Finally, gpt2-client is a...
    Downloads: 0 This Week
  • 2
    FalaBrasil

    Resources for speech processing in Brazilian Portuguese

    The FalaBrasil Group provides free tools and resources for speech and natural language processing in Brazilian Portuguese, most of them under the BSD license. The tools are mainly scripts for a variety of audio and text tasks, whereas the resources include ready-to-use acoustic and language models, phonetic dictionaries, etc.
    Downloads: 2 This Week
  • 3
    YouTubeCrawler

    Go-based automation utility that downloads YouTube videos

    This tool is a Go-based automation utility that downloads YouTube videos and permanently embeds or “hard-codes” their subtitles (typically English) into MP4 output files. The workflow involves specifying one or more URLs (via a simple “url” text file in each folder); the program then uses youtube-dl to fetch the video and subtitles, and ffmpeg to overlay the subtitles onto the video track. The architecture follows a command-pattern setup: tasks implement a common interface and are scheduled and...
    Downloads: 0 This Week
  • 4
    bsed

    Simple SQL-like syntax on top of Perl text processing

    bsed is a stream editor that offers a simple SQL-like syntax for text processing tasks. Designed to replace basic uses of tools like sed, grep, AWK, and Perl, it allows users to perform complex text manipulations with intuitive commands.
    Downloads: 0 This Week
  • 5
    TIES

    A smart search engine for medical documents

    TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free-text clinical reports. It provides secure, de-identified access to this information and has built-in collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license.
    Downloads: 0 This Week
  • 6
    Win32Forth is an ANS-compatible Forth language application development system with many tools: an interactive console, an integrated extensible debugger, a GUI file editor, hypertext rendering, and hyperlinked source files. Use VIEW <word-name> to explore the many files.
    Downloads: 61 This Week
  • 7
    XSH is a powerful command-line XML editing tool and programming language in the manner of Unix shell interpreters and line-oriented text editors like ed. It can be used either interactively or for batch-mode XML processing.
    Downloads: 0 This Week
  • 8
    SIVIC
    SIVIC is an open-source, standards-based software framework and application suite for processing and visualization of DICOM MR Spectroscopy data. Through the use of DICOM, SIVIC aims to facilitate the application of MRS in medical imaging studies.
    Downloads: 5 This Week
  • 9
    SpeedEULA

    Hungarian text editor

    Hi everyone! This is meant to be a Hungarian text editor program. PRO license code: 74HVR-7ENS9-NDH73-HDM48. Official Discord server: https://discord.gg/VUw6DkZ
    Downloads: 0 This Week
  • 10
    PDF To Text Watcher

    Profile-based watcher for automated processing of PDF files to text.

    Watches folders to automate transforming PDF files into text with optional metadata extraction. Requires the XPDF tools, which you must source separately. Lets you set up multiple profiles, modify profiles 'hot' without saving and move or delete the source PDFs after processing.
    Downloads: 0 This Week
  • 11
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 200 This Week
  • 12
    Command-line, Ant-task, and embeddable text file preprocessor with macros, flow control, and expressions. Supports recursive directory processing. Extensible in Java to display data from any data source (such as a database). Can generate complete homepages (trees of HTML files, images, etc.).
    Downloads: 108 This Week
  • 13
    Betty

    Holberton-style C code checker written in Perl

    Betty is a Perl-based coding style checker that enforces the Holberton School coding style (inspired by the Linux kernel style) for C code and documentation. It identifies inconsistencies, style violations, and formatting issues in C source files. Be aware that some text editors use spaces instead of tabs by default. For instance, pressing the tab key in Emacs inserts leading spaces by default, which causes Betty to raise a lot of warnings. Please find some...
    Downloads: 0 This Week
  • 14
    unfluff

    Automatically extract body content (and other cool stuff) from HTML

    unfluff is a Node.js library designed to automatically extract the main content from an HTML document — stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool very much aimed at content-analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine-learning or summarization). The API is simple: you feed in raw HTML and it returns a structured object with the extracted text and other fields. It supports caching internal representations to speed up repeated extractions. While its language support is best for English, it is still widely used in web-content-processing pipelines. ...
    Downloads: 0 This Week
  • 15
    Create beautiful song books for your church or fellowship using this LaTeX package and related tools.
    Downloads: 2 This Week
  • 16
    Absentia DX

    An Intelligent Algorithm that cleans voice recordings

    The Absentia DX algorithm analyzes production dialog recordings and then removes obvious hums, wireless rings, and ticks, while maintaining the integrity of the human voice. ABDX was developed for a network television show with difficult production sound that resulted in substantial repetitive manual labor. Simply drag and drop volumes, folders, or sound files directly onto the application or the settings window and files will begin processing. An Absentia DX progress window will appear,...
    Downloads: 0 This Week
  • 17
    Clu-Linux-Live

    Various Processing and Data Rescue Tools over Wired or Wireless Network

    This Linux Live CD provides various processing command-line utilities (Clu) and data rescue tools that can be used on a wired or wireless network. On startup it prompts the user to change the password, mount all locally available filesystems, start the wireless network (if a Wi-Fi interface is present), and start network services (samba/ssh/sftp), then presents a console for running utilities (text, image, audio, video, downloading, etc.) on the mounted filesystems. ...
    Downloads: 3 This Week
  • 18
    XMLStarlet is a set of command-line utilities (tools) to transform, query, validate, and edit XML documents and files using a simple set of shell commands, similar to the way it is done for text files with the UNIX grep, sed, awk, diff, patch, join, etc. utilities.
    Downloads: 1,116 This Week
  • 19
    Vim provides a rich set of tools that make generating LaTeX easy, pain-free, and quite pleasurable. This website aims to bring together the rich set of tools the Vim community has produced over the years into a central repository.
    Downloads: 0 This Week
  • 20
    subs2srs

    Convert movies and TV shows to flashcards

    subs2srs allows you to create import files for Anki or other Spaced Repetition Systems (SRS) based on your favorite foreign language movies and TV shows to aid in the language learning process. See http://subs2srs.sourceforge.net/ for more information.
    Downloads: 30 This Week
  • 21

    Musaheb

    An Arabic collocation extraction tool

    “Musaheb” is an Arabic collocation extraction tool that has been designed and implemented to overcome the limitations of existing collocation extraction tools. “Musaheb” is able to extract n-gram collocations up to 5-grams, in addition to extracting the collocates of the nodes (the word types whose collocates are sought) within a window size of zero to 15 words. Moreover, it provides eight collocation statistics to calculate the strength of the collocation, and permits the input of...
    Downloads: 0 This Week
  • 22
    Mail Alert Simple Mailer

    Mail Alert Simple Mailer

    Mail Alert Simple Mailer is a simple command-line utility designed for IT administrators for sending e-mail from Microsoft Windows. It can be executed from the command line, a Windows batch file, or PowerShell scripts. My main reason for writing this software was to handle events generated by Dell OpenManage Server Administrator (OMSA), APC PowerChute, and Windows Events, and to send hardware status and alerts such as temperature alerts, UPS battery status, powerline status, and RAID controller alerts...
    Downloads: 3 This Week
  • 23
    aeneas

    Automagically synchronize audio and text (aka forced alignment)

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
    Downloads: 0 This Week
  • 24
    Welsh Natural Language Toolkit
    The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: a) Word Segmentation, for separating text into words; b) Sentence Boundary Disambiguation, for finding sentence boundaries; c) Part of Speech Tagger, for determining the part of speech of each word; d) Morphological Analyser, for identifying the root form (lemma) of words....
    Downloads: 0 This Week
  • 25
    The-New-M3U8-Downloader

    Rebuild of M3U8-Downloader

    ...The project introduced a refreshed interface and added real-time feedback features such as taskbar progress indicators and download speed calculation. It improved regex parsing accuracy and optimized internal logic to increase reliability when processing streaming playlists. The tool also displays file sizes and integrates batch download functionality for handling multiple tasks. Although the repository is now archived and no longer actively updated, it represents an important evolutionary step that later influenced more advanced tools by the same author. Overall, it serves as a legacy but functional solution for downloading HLS-based media streams.
    Downloads: 20 This Week