Page 6 | corpus free download

Showing 170 open source projects for "corpus"

1

speech corpus collector

Used to collect a speech corpus for speech recognition systems such as Sphinx.

Downloads: 0 This Week

Last Update: 2013-04-10
See Project
2

VoiceScribe

VoiceScribe is a simple highlighting editor. Its purpose is to facilitate the task of creating and correcting transcripts for inclusion in the Vienna Oxford International Corpus of English (VOICE).

Downloads: 1 This Week

Last Update: 2014-04-15
See Project
3

arabicwordcorpus

An Arabic word corpus containing a large list of words, starting at 1.5 million words, useful for natural language processing.

Downloads: 0 This Week

Last Update: 2013-05-30
See Project
4

HCI_CSC8570

Supporting software for a school research paper to analyze a corpus for letter frequency and word properties.

Downloads: 0 This Week

Last Update: 2013-04-19
See Project
5

ClipSyll

ClipSyll is a collection of scripts and programs for downloading, codifying, and analysing (using NLTK) CLIPS, the largest Italian corpus of spoken language. It includes a syllabification module based on the SSP: http://sourceforge.net/projects/silly

Downloads: 0 This Week

Last Update: 2013-04-02
See Project
6

MorphoParser

Unsupervised non-language specific morphological parser based on compression and precedence relations between morphemes. Can be run on a Unicode corpus and will output a lexicon of proposed morphemes in the language.

Downloads: 0 This Week

Last Update: 2016-11-17
See Project
7

Cunei Machine Translation Platform

Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.

1 Review

Downloads: 0 This Week

Last Update: 2013-06-05
See Project
8

GHIRL

GHIRL is the Graph-based Heterogeneous Information Representation Language: a Java library for representing, querying, and navigating graph- or network-based data structures.

Downloads: 0 This Week

Last Update: 2013-04-03
See Project
9

CorpusWeb

This project was carried out for our two-year degree in Computer Science at Orleans (France). Its aim is to save web pages in order to build a corpus of web documents. Searches can be made by keyword, language, website, search engine...

Downloads: 0 This Week

Last Update: 2015-05-20
See Project
10

Sanchay

Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project
11

JavaWAC

Web-as-corpus tools in Java: a simple crawler (with Nutch and Heritrix integration), an HTML cleaner to remove boilerplate code, language recognition, and a corpus builder.

Downloads: 0 This Week

Last Update: 2013-04-19
See Project
12

processEuroparl

A set of tools, ready to process the Europarl corpus as published by statmt.org (v3).

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
13

LookIng4LO

This project presents a system that, from a corpus of documents, extracts information about a subject area and a collection of pedagogical components. This information is packaged into fine-grained learning objects (metadata included).

Downloads: 0 This Week

Last Update: 2013-04-08
See Project
14

Get 1T

Get1T is a tool for filtering through the massive quantity of data available in the Web 1T corpus and extracting only the counts you need - including for simple wildcard patterns.

Downloads: 0 This Week

Last Update: 2013-04-19
See Project
15

Top Ranked Phrases in a Corpus

This project lists the top R ranked terms that are between M and N in length. It is designed to extract these phrases from a given corpus in an input folder.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
16

Samudra-Manthan

Samudra Manthan uses C and MPI to find interesting n-grams (terms) in a large corpus of data. We use the GigaWord corpus to find the top m interesting n-grams using the TF*IDF measure.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
17

cl-cc-bnc

cl-cc-bnc provides a frontend for learners of English. You can enter a URI, whose content is analyzed by word frequency and compared with word frequencies in the British National Corpus.

Downloads: 0 This Week

Last Update: 2014-05-05
See Project
18

wikipedia2XML

A collection of Python scripts to create and handle an XML corpus (a large collection of text for linguistic purposes) from an original Wikipedia database backup dump. It includes a regular-expression-based parser for the MediaWiki markup language.

Downloads: 0 This Week

Last Update: 2013-04-05
See Project
19

Palm TaCo

TaCo is a tasty Palm application that enables you to use the Tanaka Corpus on your handheld. The Tanaka Corpus is a collection of Japanese/English sentence pairs that a student of Japanese language can use as a source of example sentences.

Downloads: 0 This Week

Last Update: 2018-01-22
See Project
20

Bi-gram based Applications

Bi-gram applications based on language models produced by SRILM from a Chinese Wikipedia corpus, including a Chinese word segmenter, a word-based (not character-based) Traditional-Simplified Chinese converter, and a Chinese syllable-to-word converter.

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
21

BabyTALK

BabyTALK aims to add another brick in the wall of natural language learning. The baby needs to structure a corpus of texts when its tutor points at and talks about a particular part of the corpus. The baby is also expected to describe any selected part of the corpus.

Downloads: 0 This Week

Last Update: 2016-08-22
See Project
22

RocketReader Readability

A fast way to rate the reading difficulty of a book or text. Unlike well-known readability metrics such as Fog, Kincaid, SMOG, ARI, Flesch, and Coleman-Liau, this metric takes into account far more factors and is standardized against a corpus.

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
23

CRFChunker: CRF English Phrase Chunker

CRFChunker: Conditional Random Fields Phrase Chunker (Phrase Chunking Tool) for English. The model was trained on sections 01..24 of the WSJ corpus, using section 00 as the development test set (F1-score of 95.77). Chunking speed: 700 sentences/s.

Downloads: 0 This Week

Last Update: 2013-03-11
See Project
24

CRFTagger: CRF English POS Tagger

CRFTagger: Conditional Random Fields Part-of-Speech (POS) Tagger for English. The model was trained on sections 01..24 of the WSJ corpus, using section 00 as the development test set (accuracy of 97.00%). Tagging speed: 500 sentences/s.

Downloads: 0 This Week

Last Update: 2013-03-25
See Project
25

Stem-Les

Stem-Les (Lexicon Extraction Suite) extracts lexical chunks that are relevant in a corpus of documents. If the corpus is bilingual, Stem-Les also finds translation equivalents for the lexical solution selected by the user.

Downloads: 0 This Week

Last Update: 2013-03-20
See Project

