Showing 151 open source projects for "text processing"

  • 1
    ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/
    Downloads: 0 This Week
  • 2
    Virastyar

    Virastyar is a spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages. Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering,...
    Downloads: 56 This Week
  • 3
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 0 This Week
  • 4
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
  • 5
    JInsect
    The JINSECT toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
    Downloads: 0 This Week
  • 6
    Stemmer Gujarati

    Offline stemmer for Gujarati, which is one of 22 Indian languages.

    ...There has been a lot of significant work in the development and evaluation of stemmers for non-Indian languages, but little or no significant work has been done on the Indian front, especially for the Gujarati language. The code of this stemmer is based on an algorithm designed under the guidance of Prof. Nikita Desai, India. It takes an input .txt file containing Gujarati text encoded as UTF-8 and removes stop words. After processing the remaining words, it outputs a corresponding file containing all stem words plus other details.
    Downloads: 0 This Week
  • 7

    Text Analyzer

    Text analyzing software

    An application developed in C using list and AVL tree data structures. It analyzes a text (.txt file) and outputs the following information: 1. the total occurrences of every word in the text; 2. the exact line of every occurrence of every word; 3. the exact position in the line of every occurrence of every word; 4. the exact paragraph of every occurrence of every word; 5. the exact sentence of every occurrence of every word. The output is also written in a...
    Downloads: 0 This Week
  • 8
    ArabicDiacritizer

    Automatic restoration of Arabic diacritic marks

    This software restores Arabic diacritical marks. It is based mainly on deep architectures using deep neural networks. The algorithm generates diacritized text with determined end case. The algorithm is described in detail in: Ilyes Rebai and Yassine BenAyed, 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We would appreciate it very much if you cite our related work. Installation...
    Downloads: 0 This Week
  • 9
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 0 This Week
  • 10

    Khawas

    An Arabic Corpora Processing Tool

    The new version is available at https://sourceforge.net/projects/ghawwasv4/
    Downloads: 1 This Week
  • 11

    T.H.O.R.I.U.M.

    T.H.O.R.I.U.M. - Thermooptic radiation iterative universal module.

    The purpose of this project is to develop open source, precise, fast and easy-to-use software for radiation heat transfer analysis.
    Downloads: 0 This Week
  • 12
    SetFon focus is a web-based interface for Praat resources (www.praat.org) which focuses on speech sound analysis; it is a management program for acoustic analysis based on PHP/MySQL. Developed with the SIMP framework.
    Downloads: 0 This Week
  • 13
    The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text processing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
    Downloads: 0 This Week
  • 14
    FALCON - Text Search Java Project

    JSON based text search Java Project

    "Falcon Search" is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools or document readers. It also handles jumbled words within a query and spelling mistakes. Commonly used techniques in this project are Natural Language...
    Downloads: 0 This Week
  • 15

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See TTS demo at: http://rslp.racai.ro/index.php?page=tts. This project is written entirely in Java and includes a set of tools and methods designed to enable multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian, but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools, check out http://nlptools.racai.ro.
    Downloads: 0 This Week
  • 16
    HAWK - PDF Text Search Java Project

    No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"
    Downloads: 0 This Week
  • 17
    TF-IDF.jar is a Java Archive file to measure the TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus; (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus; (c) get the TF-IDF of each document in the corpus; (d) get each term with its frequency (number of occurrences), term frequency (TF) and TF-IDF in every document.
    Downloads: 0 This Week
  • 18
    Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.
    Downloads: 0 This Week
  • 19
    gannu

    Java API and tools for performing NLP and other AI tasks

    Java API and tools for performing a wide range of AI tasks such as: word sense disambiguation (released), optimization (5 Evolutionary Algorithms Implemented ETA February 2014), opinion mining (ETA November 2014) and text wikification (ETA July 2014). Gannu includes some graphical interfaces for scientific purposes. When using Gannu please cite: *Jiménez, F. V., Gelbukh, A. F. & Sidorov, G. (2013). Simple Window Selection Strategies for the Simplified Lesk Algorithm for Word Sense...
    Downloads: 0 This Week
  • 20

    An ethernet sniffer for BrainNet36®

    An ethernet sniffer for the EEG acquisition system BrainNet36®

    ...BrainNet36® has 36 channels, A/D converters with 16 bit accuracy, conversion time of 10 µs and Ethernet communication interface. Being a device for clinical purposes, BrainNet36® does not export data online. This sniffer was developed to allow online processing by working in promiscuous mode and recording data in a plain text file.
    Downloads: 0 This Week
  • 21

    BioLemmatizer

    Lemmatization tool for morphological analysis of biomedical literature

    ...If you use the BioLemmatizer to support academic research, please cite the following paper: Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor. BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics 2012, 3:3.
    Downloads: 0 This Week
  • 22
    Java application for training and deploying text processing applications such as part-of-speech taggers, based on a re-implementation of Brill's algorithm in Java.
    Downloads: 0 This Week
  • 23
    LinqYedict

    Translate Chinese to English

    Translates Chinese to English using CEDICT (Cantonese dictionary). Demonstrates the speed of C# and LINQ. Copy the Chinese text from any browser or application to the Windows clipboard and see the translation.
    Downloads: 0 This Week
  • 24

    BioDare

    BioDare is a Biological Data Repository focused on time-series data

    BioDare (Biological Data Repository) was developed under the multi-site ROBuST project (http://hallidaylab.bio.ed.ac.uk/ROBuST.html) to support data exchange inside the project. It is a web application which allows data-sharing (including public dissemination), data-processing and analysis, with the main focus on time-series data produced in circadian experiments. The main features of BioDare are: - an online repository for experimental data accompanied by extensive metadata - generation of secondary data (normalized, detrended, averaged …) - graphical output of data, secondary data and rhythm analysis - simple text-based search throughout metadata - biology- and conditions-aware search for data - data aggregation and export - group-based privacy settings for collaborative research
    Downloads: 1 This Week
  • 25
    latexdiff is a Perl script that compares two LaTeX files and marks up significant differences between them (i.e. a diff for LaTeX files). Various options are available for visual markup using standard LaTeX packages such as "color.sty".
    Downloads: 0 This Week