Showing 144 open source projects for "text processing"

View related business solutions
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2

    KSUCCA Corpus

    A 50 million tokens corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50 million tokens annotated corpus of Classical Arabic texts from the period of pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until early eleventh century CE), which is the period of pure classical Arabic. The main aim of this corpus is to be used for studying the distributional lexical semantics of The Quran words. However, it can be used for other research purposes, such...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3

    AhoTTS Multilingual, a Multilingual TTS

    Text-to-Speech TTS for Basque, Spanish, Catalan, Galician and English

    Text-to-Speech conversor for Basque, Spanish, Catalan, Galician and English. It includes linguistic processing and built voices for all the languages aforementioned. Its acoustic engine is based on hts_engine and it uses a high quality vocoder called AhoCoder. Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/ http://aholab.ehu.es/ahocoder/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    TIES

    TIES

    A smart search engine for medical documents

    TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free text clinical reports. It provides secure de-identified access to this information and has in built collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license.
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators which write output of text offsets. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6

    SmGen

    Verilog Finite State Machine (FSM) Code Generator

    SmGen is a finite state machine (FSM) generator for Verilog. On the other hand, it is not an FSM entry tool. The input is behavioral Verilog with clock boundaries specifically set by the designer. SmGen unrolls this behavioral code and generates an FSM from it in synthesizable Verilog. Clock boundaries are explicitly provided by the designer so there is good control on the expected timing
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...
    Leader badge
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Mavscript

    Mavscript

    Calculations in a text document

    Mavscript allows the user to do calculations in a text document. Plain text, LaTeX and OpenOffice Writer files (.odt) are supported. The calculation is done by the algebra system Yacas (default), Jasymca or by the Java interpreter BeanShell.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    lottie vectors

    lottie vectors

    Create, display and process 2D vectors in a 3D window.

    Lottie Vectors is an application for Matlab that alows you to do some pretty neat things -with vectors. More exactly -displaying them in ways that hopefully will allow you to explore and better understand your vector data. The basic idea is simple, take a vector defined in one of a few different types of data formats and map it on the screen. Add another vector and you start to form a 'route'. Each route or position vector can be accompanied with a 'force' vector. This can be used to...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    High-Throughput Tabular Data Processor
    ...Citation: Madanecki P, Bałut M, Buckley PG, Ochocka JR, Bartoszewski R, Crossman DK, et al. (2018) High-Throughput Tabular Data Processor – Platform independent graphical tool for processing large data sets. PLoS ONE 13(2): e0192858. https://doi.org/10.1371/journal.pone.0192858
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Capsim(r) C Text Mode Kernel(TMK),DSP and communication blocks, topologies, libraries and tools for the development of high performance block diagram digital signal processing and communications systems,built in interpreter for scripting.SystemC support.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14

    PSIworx

    Data processing for the PSI fluorometer

    PSIworxR (R) and PSIworx (MATLAB) are a collection of functions and scripts to analyze data from the PSI SuperHead Fast Fluorometer series (www.psi.cz). These fluorometers are used in limnology and oceanography research communities. The program retrieves parameters from single turnover induction and relaxation. Results are returned as text files and PDF figures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    TEES

    Turku Event Extraction System

    Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Welsh Natural Language Toolkit
    The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    Discriminative Language Editor

    Discriminative language editor based on ontologies

    Text editor in Java that is able to detect discriminative expressions while the user is typing. When the internal ontology-based analyzer detects a potential discriminative expression the user is advised by underscoring the related words in the text. A descriptive message about the issue is also shown to the user when the cursor is placed over the potential discriminative expression.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. ...
    Leader badge
    Downloads: 16 This Week
    Last Update:
    See Project
  • 19
    Visualization of Protein-Ligand Graphs

    Visualization of Protein-Ligand Graphs

    Compute protein graphs. Moved to https://github.com/MolBIFFM/PTGLtools

    NOTE: Project moved to https://github.com/MolBIFFM/PTGLtools. The Visualization of Protein-Ligand Graphs (VPLG) software package computes and visualizes protein graphs. It works on the super-secondary structure level and uses the atom coordinates from PDB files and the SSE assignments of the DSSP algorithm. VPLG is command line software. If you do not like typing commands, try our PTGL web server: http://ptgl.uni-frankfurt.de/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Welsh Natural Language Toolkit

    Welsh Natural Language Toolkit

    WNLT is a suite of open source natural language modules for the Welsh

    The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21

    PLP

    Powerfull pre-processor

    Powerful Verilog Preprocessor. PLP stands for Perl Pre-processor. Perl is used as "control language" that is embedded in the Verilog code (or any other code) to generate code on the fly. It is used commonly as a Verilog pre-processor but can be used with any target/output language (C, C++, Java, VHDL, plain text etc)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    bnf2xml

    simple BNF parser makes xml markup of matches

    bnf2xml a simple BNF parser that takes text as input, searches according to a BNF query file, and outputs text marked up by the xml labels that show context. bnf2xml is as simple to use as any text binary ie, awk(1) grep(1). bnf2xml does not require C API because it outputs simple xml labeling. README is visible on file dl page. EXAMPLE: $ echo "hi" | bnf2xml patternfile <word><alph>h</alph><alph>i</alph></word> or <gas>hydrogen iodide</gas> patternfile says how to find...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Virastyar

    Virastyar

    Virastyar is an spell checker for low-resource languages

    Virastyar is a free and open-source (FOSS) spell checker. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for processing low-resource languages, especially Persian and RTL languages Publications: Kashefi, O., Nasri, M., & Kanani, K. (2010). Towards Automatic Persian Spell Checking. SCICT. Kashefi, O., Sharifi, M., & Minaie, B. (2013). A novel string distance metric for ranking Persian respelling suggestions. Natural Language Engineering,...
    Downloads: 56 This Week
    Last Update:
    See Project
  • 25
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB