Showing 214 open source projects for "batch text processing"

  • 1
    Virastyar

    Virastyar is a spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages. Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering,...
    Downloads: 272 This Week
  • 2
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 0 This Week
  • 3
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
  • 4
    JInsect
    The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
    Downloads: 0 This Week
  • 5
    Stemmer Gujarati

    Offline stemmer for Gujarati, which is one of 22 Indian languages.

    ...There has been a lot of significant work on the development and evaluation of stemmers for non-Indian languages, but little or no significant work has been done for Indian languages, especially Gujarati. The code of this stemmer is based on an algorithm designed under the guidance of Prof. Nikita Desai, India. It takes a .txt input file containing Gujarati text encoded as UTF-8 and removes inessential stop words. After processing the remaining words, it outputs a corresponding file containing all stem words plus other details.
    Downloads: 0 This Week
  • 6

    iMir

    Integrated pipeline for HT miRNA-Seq data analysis

    Processing smallRNA-Seq data to gather biologically relevant information requires applying multiple statistical and bioinformatics tools from different sources, each focusing on a specific step of the analysis pipeline. The analytical workflow can be challenging because it requires continuous intervention by the operator, a critical factor when large numbers of datasets need to be analyzed at once.
    Downloads: 0 This Week
  • 7

    Text Analyzer

    Text analyzing software

    An application developed in C using list and AVL tree data structures, which analyzes a text (.txt file) and outputs the following information: 1. the total occurrences of every word in the text; 2. the exact line of every occurrence of every word; 3. the exact position in the line of every occurrence of every word; 4. the exact paragraph of every occurrence of every word; 5. the exact sentence of every occurrence of every word. The output is also written in a...
    Downloads: 0 This Week
  • 8
    ArabicDiacritizer

    Automatic restoration of Arabic diacritic marks

    This software restores Arabic diacritical marks. It is based mainly on deep neural network architectures. The algorithm generates diacritized text with case endings determined. The algorithm is described in detail in: Ilyes Rebai and Yassine BenAyed, 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We would appreciate it very much if you cite our related work. Installation...
    Downloads: 0 This Week
  • 9
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 0 This Week
  • 10

    grepp

    An ultimate text-analysing tool

    A command-line tool for text file analysis, filtering, splitting and reporting. Runs under Java (1.5+) and supports plugins written in Groovy. Distributions include *nix and Windows batch files.
    Downloads: 0 This Week
  • 11

    Khawas

    An Arabic Corpora Processing Tool

    The new version is available at https://sourceforge.net/projects/ghawwasv4/
    Downloads: 3 This Week
  • 12
    MatrixUser

    A Multi-functional GUI-based Program for Image Processing and Analysis

    The MatrixUser project is moving to GitHub; the latest version can be obtained from https://leoliuf.github.io/MatrixUser/ Most medical images (e.g. CT, MRI, PET, etc.) comprise multiple frames which represent slices, phases, timing etc. from the same imaging object. Those images can be saved as multidimensional matrices in Matlab thanks to Matlab's powerful support of multidimensional data representation. However, within Matlab, most of image manipulation functions are limited or...
    Downloads: 0 This Week
  • 13

    T.H.O.R.I.U.M.

    T.H.O.R.I.U.M. - Thermooptic radiation iterative universal module.

    The purpose of this project is to develop open source, precise, fast and easy-to-use software for radiation heat transfer analysis.
    Downloads: 1 This Week
  • 14
    SetFon is a web-based interface for Praat resources (www.praat.org) that focuses on speech sound analysis; it is a PHP/MySQL-based management program for acoustic analysis. Developed with the SIMP framework.
    Downloads: 0 This Week
  • 15
    The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text processing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
    Downloads: 0 This Week
  • 16
    FALCON - Text Search Java Project

    JSON-based text search Java project

    What is it? "Falcon Search" is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Commonly used techniques in this project are Natural Language...
    Downloads: 0 This Week
  • 17

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See the TTS demo at: http://rslp.racai.ro/index.php?page=tts This project is written entirely in Java and includes a set of tools and methods designed to enable multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian, but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools, check out http://nlptools.racai.ro.
    Downloads: 0 This Week
  • 18
    HAWK - PDF Text Search Java Project

    No more support for this project - TAKE A LOOK AT FALCONSEARCH

    No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"
    Downloads: 0 This Week
  • 19
    TF-IDF.jar is a Java Archive file that measures the TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus; (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus; (c) get the TF-IDF of each document in the corpus; (d) get each term with its frequency (number of occurrences), term frequency (TF) and TF-IDF in every document.
    Downloads: 0 This Week
  • 20
    Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.
    Downloads: 0 This Week
  • 21
    gannu

    Java API and tools for performing NLP and other AI tasks

    Java API and tools for performing a wide range of AI tasks such as: word sense disambiguation (released), optimization (5 Evolutionary Algorithms Implemented ETA February 2014), opinion mining (ETA November 2014) and text wikification (ETA July 2014). Gannu includes some graphical interfaces for scientific purposes. When using Gannu please cite: *Jiménez, F. V., Gelbukh, A. F. & Sidorov, G. (2013). Simple Window Selection Strategies for the Simplified Lesk Algorithm for Word Sense...
    Downloads: 0 This Week
  • 22
    WM Hyperintensities Segmentation Toolbox

    Open Source White Matter Hyperintensities Segmentation Toolbox

    The Wisconsin White Matter Hyperintensity Segmentation [W2MHS] and Quantification Toolbox is an open source MATLAB toolbox designed for detecting and quantifying White Matter Hyperintensities (WMH) in Alzheimer’s and aging-related neurological disorders. WMHs arise as bright regions on T2-weighted FLAIR images. They reflect comorbid neural injury or cerebral vascular disease burden. Their precise detection is of interest in Alzheimer’s disease (AD) with regard to its prognosis. Our toolbox...
    Downloads: 0 This Week
  • 23

    BioLemmatizer

    Lemmatization tool for morphological analysis of biomedical literature

    ...If you use the BioLemmatizer to support academic research, please cite the following paper: Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor. BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics 2012, 3:3.
    Downloads: 8 This Week
  • 24

    An ethernet sniffer for BrainNet36®

    An ethernet sniffer for the EEG acquisition system BrainNet36®

    ...BrainNet36® has 36 channels, A/D converters with 16 bit accuracy, conversion time of 10 µs and Ethernet communication interface. Being a device for clinical purposes, BrainNet36® does not export data online. This sniffer was developed to allow online processing by working in promiscuous mode and recording data in a plain text file.
    Downloads: 0 This Week
  • 25
    Java application for training and deploying text processing applications such as part-of-speech taggers, based on a re-implementation of Brill's algorithm in Java.
    Downloads: 0 This Week