Showing 171 open source projects for "corpus"

View related business solutions
  • Top-Rated Free CRM Software Icon
    Top-Rated Free CRM Software

    216,000+ customers in over 135 countries grow their businesses with HubSpot

    HubSpot is an AI-powered customer platform with all the software, integrations, and resources you need to connect your marketing, sales, and customer service. HubSpot's connected platform enables you to grow your business faster by focusing on what matters most: your customers.
  • Achieve perfect load balancing with a flexible Open Source Load Balancer Icon
    Achieve perfect load balancing with a flexible Open Source Load Balancer

    Take advantage of Open Source Load Balancer to elevate your business security and IT infrastructure with a custom ADC Solution.

    Boost application security and continuity with SKUDONET ADC, our Open Source Load Balancer, that maximizes IT infrastructure flexibility. Additionally, save up to $470 K per incident with AI and SKUDONET solutions, further enhancing your organization’s risk management and cost-efficiency strategies.
  • 1
    IMS Open Corpus Workbench

    IMS Open Corpus Workbench

    Indexing and query tools for very large text corpora

    The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
    Leader badge
    Downloads: 75 This Week
    Last Update:
    See Project
  • 2
    Minimal text diffusion

    Minimal text diffusion

    A minimal implementation of diffusion models for text generation

    A minimal implementation of diffusion models of text: learns a diffusion model of a given text corpus, allowing to generate text samples from the learned model. The main idea was to retain just enough code to allow training a simple diffusion model and generating samples, remove image-related terms, and make it easier to use. To train a model, run scripts/train.sh. By default, this will train a model on the simple corpus. However, you can change this to any text file using the --train_data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    gensim

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Chronos Forecasting

    Chronos Forecasting

    Pretrained (Language) Models for Probabilistic Time Series Forecasting

    Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Simplify Purchasing For Your Business Icon
    Simplify Purchasing For Your Business

    Manage what you buy and how you buy it with Order.co, so you have control over your time and money spent.

    Simplify every aspect of buying for your business in Order.co. From sourcing products to scaling purchasing across locations to automating your AP and approvals workstreams, Order.co is the platform of choice for growing businesses.
  • 5
    Echidna

    Echidna

    Ethereum smart contract fuzzer

    ... in specific cases. Optional corpus collection, mutation and coverage guidance to find deeper bugs. Powered by Slither to extract useful information before the fuzzing campaign. Source code integration to identify which lines are covered after the fuzzing campaign. Curses-based retro UI, text-only or JSON output.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    LF Aligner helps translators create translation memories from texts and their translations. It relies on Hunalign for automatic sentence pairing. Input: txt, doc, docx, rtf, pdf, html. Output: tab delimited txt, TMX and xls. With web features. My email address is listed in readme.txt; for support, use the forum here. My personal website: www.farkastranslations.com.
    Leader badge
    Downloads: 77 This Week
    Last Update:
    See Project
  • 8
    43 queries of various topics for the Information Retrieval Collection . The corpus is created from the OSAC corpus of journalistic texts consisting of 4763 articles recovered from the Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    TEXminer

    TEXminer

    Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

    ... is not disigned to have a Reference Corpus, Thematic Model Statistics uses Language Models (lexicons) to have Background Knowledge about certain Languages (English, German, French, Spanish, Italian, Russian), which are derived from Decaleon Project. The Thematic Models for Standard Vocabulary have been extended (spring 2015). The Thematic Models for Technical Terms have been extended (autumn 2015). The Thematic Models for additional Standard Vocabularies have been extended (2015-2019).
    Downloads: 15 This Week
    Last Update:
    See Project
  • AI-based, Comprehensive Service Management for Businesses and IT Providers Icon
    AI-based, Comprehensive Service Management for Businesses and IT Providers

    Modular solutions for change management, asset management and more

    ChangeGear provides IT staff with the functions required to manage everything from ticketing to incident, change and asset management and more. ChangeGear includes a virtual agent, self-service portals and AI-based features to support analyst and end user productivity.
  • 10

    Syllabic Verse Analysis (SylVA)

    Syllabifies and scans syllabic verse texts for metrical annotation

    The tool syllabifies and scans texts written in syllabic verse for metrical corpus annotation. It is designed for Old French and Old Occitan and exports the results in PAULA format suitable for the ANNIS platform (http://corpus-tools.org/annis/). Used first in the preparation of the metrical treebank containing the Old Occitan <i>Boeci</i> text (cf. Rainsford and Scrivner 2014), development continued for use with the Old Gallo-Romance Corpus <http://www.ogr-corpus.org>).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11

    modnlp-plugins

    External plugins for modnlp/teccli

    This is a general project for modnlp/teccli plugins, with focus on text visualizaton.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12

    modnlp

    Modular Suite of NLP Tools

    modnlp aims to provide a modular architecture and tools for natural language processing written (mainly) in Java. It provides an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of meta-data, tools for text categorisation, including, functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, sample classification tools, and evaluation modules, a suite of corpus management...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    TXM

    TXM

    Unicode-XML-TEI text/corpus analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14

    Letrista

    Generador automático de textos

    ... el generador de texto propiamente dicho, además de la base de datos preentrenada que le permite generar letras de canciones de tango. Su eficacia hasta el momento es limitada, a pesar de ser capaz de crear textos semanticamente aceptables en la mayoría de los casos, es incapaz de centrarse en una temática específica. El corpus mínimo para generar una base de datos utilazable es de al menos 100 páginas de texto y los resultados comienzan a mejorar a partir de esa cifra.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud islamic university that can be used for corpus analysis and comparison in terms of the several linguistic characteristics, such as frequency lists generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Software, information, data sets and documentation for the Web as Corpus community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    hebrew-gpt_neo

    hebrew-gpt_neo

    Hebrew text generation models based on EleutherAI's gpt-neo

    Hebrew text generation models based on EleutherAI's gpt-neo. Each was trained on a TPUv3-8 which was made available to me via the TPU Research Cloud Program. The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    AliceMind

    AliceMind

    ALIbaba's Collection of Encoder-decoders from MinD

    ... and sentence levels, respectively. Pre-trained models for natural language generation (NLG). We propose a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. It achieves new SOTA results in several downstream tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Open-Content Text Corpus
    The OCTC hosts open-content texts, encoded in TEI P5, for many languages, each in a separate subcorpus. Another part of the OCTC stores inter-language alignment info. The project is intended to be an open platform for collaboration.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    This is an application generator for conflation algorithms in perl language. This system supports generation perl source code for a stemmer from a rule file, running a stemmer which is supported by the system, parsing a corpus file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21

    DWDS/Dialing Concordance

    a collection of indexing and search tools for corpus linguists

    DWDS/Dialing Concordance (DDC) - a collection of index and search tools for corpus linguists
    Leader badge
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    node-markov-generator

    node-markov-generator

    Generates simple sentences based on given text corpus

    This simple generator emits short sentences based on the given text corpus using a Markov chain. To put it simply, it works kinda like word suggestions that you have while typing messages in your smartphone. It analyzes which word is followed by which in the given corpus and how often. And then, for any given word it tries to predict what the next one might be. Here you create an instance of TextGenerator passing an array of strings to it - it represents your text corpus which will be used...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    In this corpus: 10 essays containing 752 sentences (with a total of 4,160 words). The essays were selected from different collections of partially or totally diacritic Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of AGD checker. There are two types of texts in this corpus: 1- Texts without errors to evaluate AGD in terms of detecting and correcting errors that we do not know about before the checking process 2-Texts...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    korpus

    Corpus Linguistics Software

    Some software for Corpus Linguistics, which includes Corpus Text Editor, Web-based search, etc. This project created for Belarusian Corpus, but can be used for other languages with some adaption.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    GPT2 for Multiple Languages

    GPT2 for Multiple Languages

    GPT2 for Multiple Languages, including pretrained models

    With just 2 clicks (not including Colab auth process), the 1.5B pretrained Chinese model demo is ready to go. The contents in this repository are for academic research purpose, and we do not provide any conclusive remarks. Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC) Simplifed GPT2 train scripts(based on Grover, supporting TPUs). Ported bert tokenizer, multilingual corpus compatible. 1.5B GPT2 pretrained Chinese model (~15G corpus, 10w steps). Batteries...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next