Page 4 | corpora free download

Showing 105 open source projects for "corpora"

1

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 0 This Week

Last Update: 2014-08-02
See Project
2

Fine-grained Arabic Named Entity Corpora

Fine-grained Arabic Named Entity Corpora

...Those corpora have been manually annotated from the Arabic Wikipedia and Newswire sources respectively. B) Automatically-developed: 1) WikiFANE_Whole: All sentences of the Arabic Wikipedia articles were retrieved to compile the corpus. ~2M tokens. 2) WikiFANE_Selective: Sentences which have at least one NE phrase were retrieved to compile the corpus. ~2M tokens.

Downloads: 0 This Week

Last Update: 2014-06-12
See Project
3

WN-Toolkit

Creation of WordNets using the expand model

This toolkit is a set of Python programs for the creation or enlargement of WordNets using the expand model. Several methodologies are available: dictionary-based, BabelNet-based, and methodologies based on the use of parallel corpora.

Downloads: 0 This Week

Last Update: 2015-03-09
See Project
4

BioParallelCorporaExtractor

BioPCE: a tool to extract parallel corpora of biomedical texts

BioParallelCorporaExtractor (BioPCE) is a Python tool to extract parallel corpora of biomedical texts. It is joint work by Elise Prieur-Gaston, Antonio Jimeno Yepes and Aurélie Névéol. In the "Files" tab on this page, you can find the Perl script used to web-crawl publisher data and a sample input file created for 5 MEDLINE citations. Each line in the input file should contain the PubMed identifier (PMID) and its Digital Object Identifier (DOI) separated by the pipe symbol. ...

Downloads: 0 This Week

Last Update: 2014-01-22
See Project
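
The BioPCE description above specifies an input file with one PMID and one DOI per line, separated by a pipe. A minimal sketch of reading such a file follows. It is not taken from BioPCE itself: the function name `parse_pmid_doi_lines` is hypothetical, the sample identifiers are made up, and the validation rules (a numeric PMID and a non-empty DOI) are assumptions.

```python
from typing import Iterable, Iterator, Tuple


def parse_pmid_doi_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (pmid, doi) pairs from lines shaped like 'PMID|DOI'.

    Hypothetical helper, not part of BioPCE. Blank lines are skipped;
    malformed lines raise ValueError with the offending line number.
    """
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        # Split on the first pipe only, leaving the rest of the DOI untouched.
        pmid, separator, doi = line.partition("|")
        pmid, doi = pmid.strip(), doi.strip()
        # Assumption: a PMID is all digits and a DOI is any non-empty string.
        if not separator or not pmid.isdigit() or not doi:
            raise ValueError(f"line {line_number}: expected 'PMID|DOI', got {line!r}")
        yield pmid, doi


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    sample = ["12345678|10.1000/example.1", "", "23456789|10.1000/example.2"]
    for pmid, doi in parse_pmid_doi_lines(sample):
        print(pmid, doi)
```

Splitting on the first pipe only is a design choice for this sketch, so that any later pipe stays part of the DOI field.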
5

Autshumato Text Anonymiser

Text anonymiser for the Autshumato project.

A tool for the anonymisation of text corpora, which entails identifying entities that may convey confidential information and replacing those entities with randomly selected entities of the same type.

Downloads: 0 This Week

Last Update: 2018-04-24
See Project
6

RedLDA

Redundancy Aware LDA Gibbs Sampler

Redundancy-Aware Topic Modeling. Copy-paste redundancy and data duplication are prevalent in many corpora. This redundancy has a negative impact on the quality of text mining, and of topic modeling in particular. This software package implements a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of corpora when modeling content. My site: http://www.cs.bgu.ac.il/~cohenrap/ Lab site: http://www.cs.bgu.ac.il/~nlpproj/ Sister project: http://sourceforge.net/projects/corpusredundanc/

Downloads: 0 This Week

Last Update: 2014-01-05
See Project
7

TextBlob

TextBlob is a Python library for processing textual data

...Also, it comes with a WordNet integration. If you only intend to use TextBlob’s default models (no model overrides), you can pass the lite argument. This downloads only those corpora needed for basic functionality. TextBlob is also available as a conda package.

Downloads: 0 This Week

Last Update: 2021-07-23
See Project
8

Knowtator

Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.

Downloads: 0 This Week

Last Update: 2013-11-08
See Project
9

Donatus Parsing Tools for Portuguese

Donatus is an ongoing project consisting of Python, NLTK-based tools and grammars for deep parsing and syntactic annotation of Brazilian Portuguese corpora. It includes a user-friendly graphical user interface for building syntactic parsers with NLTK, and provides some additional functionality.

Downloads: 0 This Week

Last Update: 2016-08-28
See Project
10

Hermes Natural Language Processing

A repository of software, documentation and data for NLP

Hermes is a repository of software, documentation and data for NLP. I am currently adding corpora extracted from Wikipedia (mostly in Romance languages).

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
11

Uplug corpus tools

Various tools for creating annotated parallel corpora, including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools. Uplug also includes a web-based interface for interactive sentence and word alignment, and scripts for indexing and querying parallel corpora using the Corpus Workbench (CWB). Download 'uplug-main' first and then add other packages.

Downloads: 0 This Week

Last Update: 2013-04-29
See Project
12

Arabic Computational Linguistics

Arabic Computational Linguistics resources and Tools, Arabic Text Mining Tools, Arabic Language tools, Arabic Morphological Analysis (Stemming / Light Stemming), Arabic text preprocessing, Arabic Corpora, Open Source Arabic Corpora OSAC, Comparable Corpora. For more information: http://sites.google.com/site/motazsite

Downloads: 9 This Week

Last Update: 2014-05-09
See Project
13

Corpora of Misspellings

Corpora with misspellings marked

This is a project for creating corpora with misspellings marked and the correct word given. Example use could be for testing spell checkers.

1 Review

Downloads: 0 This Week

Last Update: 2012-07-26
See Project
14

Poliqarp

A universal suite of utilities for large corpora processing.

Downloads: 0 This Week

Last Update: 2013-05-22
See Project
15

Gargantua

Fast unsupervised sentence aligner described in "Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora", COLING 2010. News: release 1.0b fixes a bug (release 1.0a is deprecated).

1 Review

Downloads: 1 This Week

Last Update: 2015-10-24
See Project
16

Richextr

A tool for large richly annotated parallel corpora preprocessing and Moses phrase-table extraction.

Downloads: 0 This Week

Last Update: 2015-11-12
See Project
17

CorpSe

CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
18

MedTag - Annotated Corpora

A database of linguistic annotation of medical text (from MEDLINE), including corpora used with ABGene, BioCreative I and II, and the MedPost training corpus.

Downloads: 0 This Week

Last Update: 2014-02-05
See Project
19

CorpusReader

Enrich and query corpora in the TEI-XML vocabulary. CorpusReader manages very large corpora and corpora containing milestone annotation. It provides tools for enriching corpora with the output of linguistic parsers, and for extracting quantitative information.

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
20

NooJ - linguistic engineering developmen

NooJ is used by linguists to describe linguistic phenomena and apply the formalized morphological, syntactic or semantic rules to corpora. It is also used by non-linguists in fields such as psychology, sociology, history and literary studies.

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
21

The NITE XML Toolkit

The NITE XML Toolkit supports the creation, analysis, and browsing of annotated multimodal, text, or spoken language corpora, and represents both timing and rich linguistic structure. It contains libraries for developers and some end user tools.

Downloads: 1 This Week

Last Update: 2013-04-22
See Project
22

CorporAl: a tool for overlapping corpora

CorporAl implements a method for processing overlapping corpora. The current version supports parallel corpora. It works by aligning the corresponding language parts and then aligning the alignments between themselves.

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
23

Xaira

XAIRA (XML Aware Indexing and Retrieval Architecture) supports indexing and analysis of large XML textual resources such as natural language corpora.

1 Review

Downloads: 1 This Week

Last Update: 2013-05-13
See Project
24

xcorp

Tool for processing XML-annotated linguistic corpora

Downloads: 0 This Week

Last Update: 2015-05-22
See Project
25

NATools

Parallel Corpora tools.

Downloads: 0 This Week

Last Update: 2013-03-27
See Project
