Showing 50 open source projects for "corpus"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Atera - an All-in-one platform for IT management Icon
    Atera - an All-in-one platform for IT management

    Ideal for IT departments and MSPs (managed service providers)

    Your IT essentials, integrated & elevated. Take your IT management from automated to autonomous, download Atera's agent to start your free trial!
    Try Atera now
  • 1
    IMS Open Corpus Workbench

    IMS Open Corpus Workbench

    Indexing and query tools for very large text corpora

    The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
    Leader badge
    Downloads: 60 This Week
    Last Update:
    See Project
  • 2
    iramuteq
    IRAMUTEQ : Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. Logiciel de traitement de données pour des corpus texte ou de type individus/caractères. Permet notamment de réaliser des analyses de type "ALCESTE"
    Leader badge
    Downloads: 655 This Week
    Last Update:
    See Project
  • 3

    modnlp-plugins

    External plugins for modnlp/teccli

    This is a general project for modnlp/teccli plugins, with focus on text visualizaton.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud islamic university that can be used for corpus analysis and comparison in terms of the several linguistic characteristics, such as frequency lists generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 5
    Software, information, data sets and documentation for the Web as Corpus community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6

    DWDS/Dialing Concordance

    a collection of indexing and search tools for corpus linguists

    DWDS/Dialing Concordance (DDC) - a collection of index and search tools for corpus linguists
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    This is an application generator for conflation algorithms in perl language. This system supports generation perl source code for a stemmer from a rule file, running a stemmer which is supported by the system, parsing a corpus file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8

    korpus

    Corpus Linguistics Software

    Some software for Corpus Linguistics, which includes Corpus Text Editor, Web-based search, etc. This project created for Belarusian Corpus, but can be used for other languages with some adaption.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    KSUCCA Corpus

    A 50 million tokens corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50 million tokens annotated corpus of Classical Arabic texts from the period of pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until early eleventh century CE), which is the period of pure classical Arabic. The main aim of this corpus is to be used for studying the distributional lexical semantics of The Quran words.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10

    SimpleLemmatizer

    This program is for text lemmatization

    It lemmatizes texts based on supplied model. The base model is for slovak texts and is created from Slovak National Corpus, copyright by Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M.
    Leader badge
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Open data for a Khmer language corpus and lexicographic data that can be used for the development of free language tools for Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.
    Downloads: 163 This Week
    Last Update:
    See Project
  • 13
    Corpus Toolkit

    Corpus Toolkit

    A text management tool for linguistic purposes...

    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    PADIC

    A multilingual Parallel Arabic DIalectal Corpus

    PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC is composed of 6 dialects: two Algerian dialects (Algiers and Annaba cities), Palestinian, Syrian, Tunisian, Moroccan) and MSA.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15

    ReLiS

    Tool for conducting systematic literature reviews and studies

    ...When a researcher wants to address a research problem, he starts by looking at what already exists in the scientific literature (published papers) on the topic. ReLiS is a tool that helps him considerably reduce the effort to analyze the corpus of papers, typically varying between hunderds and thousands depending on the research topic. ReLiS allows the user to follow a systematic process and automate the classification of papers as much as possible during the literature study.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    ICE Nigeria

    ICE Nigeria

    Nigerian component of the International Corpus of English

    This is the Nigerian component of the International Corpus of English, a one million word corpus of written and spoken Nigerian English for linguistic research. It can be used as a stand-alone corpus or in conjunction with other components of the International Corpus of English (such as ICE-GB, ICE-India, etc.) to compare international varieties of English. This is the first release of the complete corpus.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 17
    AFEWC corpus is a multilingual comparable text articles in Arabic, French, and English languages. Each triple article is related to the same topic (aligned at article level). AFEWC corpus is collected from Wikipedia. The corpus is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18

    Drug Extraction

    Drug name extraction

    ...Using CONLL-Evaluation: processed 32065 tokens with 3656 phrases; found: 3251 phrases; correct: 2786. accuracy: 95.25%; precision: 85.70%; recall: 76.20%; FB1: 80.67 Using GATE Corpus Benchmark: Strict: P: 0.65 R: 0.73 F1: 0.69 Lenient: P: 0.74 R: 0.84 F1: 0.78 The details of how to reproduce evaluation, see README. To use standalone version for tagging download DrugExtractionStandalone.tar.gz from Files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    DisMo

    DisMo

    A POS, disfluency and multi-word unit annotator for spoken language

    ...It is developed and maintained by George Christodoulides (Centre Valibel, IL&C, University of Louvain, Louvain-la-Neuve, Belgium). Visit www.corpusannotation.org to find out more about DisMo and other annotation tools for language corpora. If you are using DisMo to annotate your corpus, please cite the following paper: Christodoulides, George; Avanzi, Mathieu; Goldman, Jean-Philippe. DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An Evaluation on a Corpus of French Spontaneous and Read Speech. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC) 2014, Reykjavik, Iceland, 26-31 May 2014, pp. 3902-3907.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Platform for Annotated Corpora in XML Integrated tool for corpus linguists built on Eclipse, Vex, Subversive, etc. for creating and editing transcriptions and annotations, querying, managing version controlled data, and building a shippable corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    TF-IDF.jar is a Java Archive file to measure TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus (c) get the TF-IDF of each document in the corpus (d) get each term with their frequency (no. of presence), term frequency (TF) and TF-IDF in every document
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    CorpusSearch finds syntactic structures in a corpus of annotated sentence trees. It can be used as a research tool on a corpus, or as a development tool for building the corpus.
    Leader badge
    Downloads: 39 This Week
    Last Update:
    See Project
  • 23
    ValiTerms

    ValiTerms

    Validation of terms in corpus

    ValiTerms is a tool that helps the validation of terms in corpus. It finds their occurrences and allows terminologists to choose if a term is relevant or not. ValiTerms is developed at LIPN (http://www-lipn.univ-paris13.fr), RCLN team. Please consult the wiki for instructions about installation and usage.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Australian National Corpus

    Australian National Corpus

    An ongoing project to collate and provide access to language data

    Includes • Scripts for the program/ code developed • High level architecture diagrams • Install guides for developers • Links to end user documentation on the AusNC website Note: The BSD license applies to customised plug-ins, scripts and ingest programs developed by the AusNC project team. Additional open source, 3rd party software products used by the AusNC solution are referenced on our SF wiki space.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo