corpora free download

Showing 26 open source projects for "corpora"

View related business solutions

Scientific/Engineering Mac Clear Filters & Widen Search

Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
Earn up to 16% annual interest with Nexo.
Let your crypto work for you

Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.

Get started with Nexo.
1

IMS Open Corpus Workbench

Indexing and query tools for very large text corpora

The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.

Downloads: 33 This Week

Last Update: 2026-05-20
See Project
2

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP...

Downloads: 5 This Week

Last Update: 2024-12-09
See Project
3

Linguistic Analyzer

The Linguistic Analyzer is a tool for corpus analysis and comparison

The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud islamic university that can be used for corpus analysis and comparison in terms of the several linguistic characteristics, such as frequency lists generation, concordances, collocation extraction, the difference between two words, and keyword identification.

Downloads: 0 This Week

Last Update: 2022-04-16
See Project
4

UnsupervisedMT

Phrase-Based & Neural Unsupervised Machine Translation

Unsupervised Machine Translation is a research repository that implements both phrase-based SMT and neural MT approaches for translation without parallel corpora. The neural component supports multiple architectures—seq2seq, biLSTM with attention, and Transformer—and allows extensive parameter sharing across languages to improve data efficiency. Training relies on denoising auto-encoding and back-translation, with on-the-fly, multithreaded generation of synthetic parallel data to continually refresh supervision signals. ...

Downloads: 0 This Week

Last Update: 7 days ago
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
5

@Note2

@Note2 - A workbench for Biomedical Text Mining

Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature.

1 Review

Downloads: 1 This Week

Last Update: 2019-05-13
See Project
6

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f.

1 Review

Downloads: 1 This Week

Last Update: 2018-12-09
See Project
7

BioNLP-Corpora

BioNLP-Corpora is a repository of biomedically and linguistically annotated corpora and biomedical data sources. There are many resources available in separate packages in this project.

Downloads: 1 This Week

Last Update: 2016-11-22
See Project
8

BioC

We describe a simple XML format to share text documents and annotation

...Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.

Downloads: 7 This Week

Last Update: 2016-08-08
See Project
9

diasim

Dialogue Similarity

Tools for calculating similarity (including lexical and syntactic) between speakers in dialogue, across standard and randomised corpora.

Downloads: 0 This Week

Last Update: 2016-03-31
See Project
$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
10

Cross-Language Computational Linguistics

cross-languages resources

...It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. Wikipedia text is available under Creative Commons Attribution-ShareAlike 3.0 License. https://en.wikipedia.org/wiki/Wikipedia:About To cite the corpora: M. Saad, D. Langlois, and K. Smaïli. Extracting Comparable Articles from Wikipedia and Measuring their Comparabilities. Procedia - Social and Behavioral Sciences, 95(0):40 – 47, 2013. ISSN 1877-0428.

Downloads: 0 This Week

Last Update: 2015-09-11
See Project
11

mwetoolkit

THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/ The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expresisons, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be...

1 Review

Downloads: 1 This Week

Last Update: 2019-05-01
See Project
12

EXMARaLDA

EXMARaLDA stands for "Extensible Markup Language for Discourse Annotation". It's a system of concepts, data formats and tools for the computer assisted transcription and annotation of spoken language, and the analysis of spoken language corpora. This project's source code has moved to https://github.com/Exmaralda-Org/exmaralda

Downloads: 0 This Week

Last Update: 2020-05-05
See Project
13

TextTools

TextTools is a freeware corpus linguistics tool developed in Python to aid in research. This program analyzes user-created corpora and displays information about word (token) frequency, n-grams, clusters, collocations, keyword in context (KWIC), and keyness. TextTools is designed to be user-friendly and intuitive and will run natively on Mac OS X.

Downloads: 0 This Week

Last Update: 2014-09-28
See Project
14

Aelius Brazilian Portuguese POS-Tagger

Python, NLTK-based package for shallow parsing of Brazilian Portuguese

...It also includes language resources such as language models, sample texts, and gold standards. Presently, Aelius already offers facilities for POS-tagging and chunking corpora and outputting annotations in different formats, such as in XML in the TEI P5 encoding scheme.

1 Review

Downloads: 0 This Week

Last Update: 2014-11-03
See Project
15

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 0 This Week

Last Update: 2014-08-02
See Project
16

Knowtator

Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.

Downloads: 0 This Week

Last Update: 2013-11-08
See Project
17

Donatus Parsing Tools for Portuguese

Donatus is an on-going project consisting of Python, NLTK-based tools and grammars for deep parsing and syntactical annotation of Brazilian Portuguese corpora. It includes a user-friendly graphical user interface for building syntactic parsers with the NLTK, providing some additional functionalities.

Downloads: 0 This Week

Last Update: 2016-08-28
See Project
18

Hermes Natural Language Processing

A repository of software, documentation and data for NLP

Hermes is a repository of software, documentation and data for NLP. I am currently adding corpora extracted from Wikipedia (mostrly in Romance languages).

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
19

Uplug corpus tools

Various tools for creating annotated parallel corpora including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools. Uplug also includes a web-based interface for interactive sentence and word alignment and scripts for indexing and querying parallel corpora using the Corpus Work Bench CWB. Download 'uplug-main' first and then add other packages.

Downloads: 0 This Week

Last Update: 2013-04-29
See Project
20

Gargantua

Fast Unsupervised Sentence Aligner described in "Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora", COLING 2010. NEWS : release 1.0b : bug fixed (release1.0a deprecated).

1 Review

Downloads: 1 This Week

Last Update: 2015-10-24
See Project
21

CorpSe

CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
22

CorpusReader

Enrich and query corpora in the TEI-XML vocabulary. CorpusReader manage very large corpora and corpora containing milestone annotation. It provides tools for enriching corpora with output of linguistic parsers, and for extracting quantitative information

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
23

NooJ - linguistic engineering developmen

NooJ is used by linguists to describe linguistic phenomena and apply the formalized morphological, syntactic or semantic rules to corpora . It is used by non linguists in fields like psychology, sociology, history, literature studies as well.

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
24

The NITE XML Toolkit

The NITE XML Toolkit supports the creation, analysis, and browsing of annotated multimodal, text, or spoken language corpora, and represents both timing and rich linguistic structure. It contains libraries for developers and some end user tools.

Downloads: 1 This Week

Last Update: 2013-04-22
See Project
25

AmiGram

AmiGram is the AMI Graphical Representation and Annotation Module. It is a general-purpose tool for multimodal corpus annotation and allows the time line based annoation of NXT corpora in a layer based environment.

Downloads: 1 This Week

Last Update: 2013-03-08
See Project

Previous
You're on page 1
2
Next

Search Results for "corpora"

Showing 26 open source projects for "corpora"

IMS Open Corpus Workbench

TXM

Linguistic Analyzer

UnsupervisedMT

@Note2

Ghawwas_V4

BioNLP-Corpora

BioC

diasim

Cross-Language Computational Linguistics

mwetoolkit

EXMARaLDA

TextTools

Aelius Brazilian Portuguese POS-Tagger

Khawas

Knowtator

Donatus Parsing Tools for Portuguese

Hermes Natural Language Processing

Uplug corpus tools

Gargantua

CorpSe

CorpusReader

NooJ - linguistic engineering developmen

The NITE XML Toolkit

AmiGram

Search Results for "corpora"

Showing 26 open source projects for "corpora"

IMS Open Corpus Workbench

TXM

Linguistic Analyzer

UnsupervisedMT

@Note2

Ghawwas_V4

BioNLP-Corpora

BioC

diasim

Cross-Language Computational Linguistics

mwetoolkit

EXMARaLDA

TextTools

Aelius Brazilian Portuguese POS-Tagger

Khawas

Knowtator

Donatus Parsing Tools for Portuguese

Hermes Natural Language Processing

Uplug corpus tools

Gargantua

CorpSe

CorpusReader

NooJ - linguistic engineering developmen

The NITE XML Toolkit

AmiGram

Related Searches

Related Categories