corpora free download

Showing 16 open source projects for "corpora"

Filter: Java

1

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client that supports Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...

Downloads: 14 This Week

Last Update: 2024-12-09
2

JoBimText

Linking Language to Knowledge with Distributional Semantics

JoBimText is a software solution for automatic text expansion using contextualized distributional similarity. It provides text analysis tools for large corpora and can create distributional semantic models (JoBimText models) and multi-word expressions.

Downloads: 0 This Week

Last Update: 2022-08-04
3

@Note2

@Note2 - A workbench for Biomedical Text Mining

Biomedical Text Mining (BioTM) provides valuable approaches to the automated curation of scientific literature.

1 Review

Downloads: 2 This Week

Last Update: 2019-05-13
4

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
a. Frequency lists for single words and N-Grams
b. Concordance
c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient)
d. Lexical patterns search
e. Frequency profile comparison of two corpora based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient
f.

1 Review

Downloads: 8 This Week

Last Update: 2018-12-09
5

BioC

A simple XML format for sharing text documents and annotations

...Allows a large number of different annotations to be represented.

Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora

BioC goals:
- simplicity
- interoperability
- broad use
- reuse

There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.

Downloads: 37 This Week

Last Update: 2016-08-08
6

diasim

Dialogue Similarity

Tools for calculating similarity (including lexical and syntactic) between speakers in dialogue, across standard and randomised corpora.

Downloads: 0 This Week

Last Update: 2016-03-31
7

WebCorpus

Hadoop framework for scalable processing of large web corpora

WebCorpus is a Hadoop-based framework that enables you to calculate statistics on large web corpora extracted from web crawls.

Downloads: 0 This Week

Last Update: 2016-11-09
8

EXMARaLDA

EXMARaLDA stands for "Extensible Markup Language for Discourse Annotation". It is a system of concepts, data formats and tools for the computer-assisted transcription and annotation of spoken language, and for the analysis of spoken language corpora. This project's source code has moved to https://github.com/Exmaralda-Org/exmaralda

Downloads: 0 This Week

Last Update: 2020-05-05
9

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 1 This Week

Last Update: 2014-08-02
10

Pacx

Platform for Annotated Corpora in XML. An integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.

Downloads: 2 This Week

Last Update: 2014-03-15
11

Knowtator

Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.

Downloads: 1 This Week

Last Update: 2013-11-08
12

Poliqarp

A universal suite of utilities for large corpora processing.

Downloads: 0 This Week

Last Update: 2013-05-22
13

CorpSe

CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.

1 Review

Downloads: 2 This Week

Last Update: 2013-04-26
14

CorpusReader

Enrich and query corpora in the TEI-XML vocabulary. CorpusReader manages very large corpora and corpora containing milestone annotation. It provides tools for enriching corpora with the output of linguistic parsers, and for extracting quantitative information.

Downloads: 0 This Week

Last Update: 2013-04-18
15

The NITE XML Toolkit

The NITE XML Toolkit supports the creation, analysis, and browsing of annotated multimodal, text, or spoken language corpora, and represents both timing and rich linguistic structure. It contains libraries for developers and some end user tools.

Downloads: 0 This Week

Last Update: 2013-04-22
16

AmiGram

AmiGram is the AMI Graphical Representation and Annotation Module. It is a general-purpose tool for multimodal corpus annotation and allows the timeline-based annotation of NXT corpora in a layer-based environment.

Downloads: 0 This Week

Last Update: 2013-03-08