corpus free download - SourceForge

Showing 14 open source projects for "corpus"

1

TXM

Unicode-XML-TEI text/corpus analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text/corpus analysis environment and graphical client that supports Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful...

Downloads: 21 This Week

Last Update: 2023-10-02
See Project
2

Web as Corpus

Software, information, data sets and documentation for the Web as Corpus community.

Downloads: 3 This Week

Last Update: 2021-04-29
See Project
3

concordia

Powerful search library, best suited for computer-aided translation

Concordia - Roman goddess of agreement. Concordance searcher - a tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses a RAM-stored index, which takes up approximately 600MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by other auxiliary data structures. The effects are stunning - Concordia is able to do simple substring...

Downloads: 0 This Week

Last Update: 2019-02-28
See Project
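
The suffix-array idea mentioned in the concordia description can be illustrated with a minimal Python sketch. This is not Concordia's code or API (Concordia is a C++ library with additional auxiliary structures); the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction below is only practical for small texts.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for a demonstration, far too slow for a real corpus.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    # Suffixes are sorted, so all suffixes starting with `pattern`
    # form one contiguous run; locate it with two binary searches.
    width = len(pattern)

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + width]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    hi = len(suffix_array)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Return match positions in text order.
    return sorted(suffix_array[first:lo])


text = "banana"
suffix_array = build_suffix_array(text)
positions = find_occurrences(text, suffix_array, "ana")
```

Once the index is built, each lookup takes a number of comparisons logarithmic in the text length, which is the property that makes suffix arrays attractive for searching large corpora.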
4

rcqp

R interface to the Corpus Query Protocol

Implements the Corpus Query Protocol as a package for the R statistical environment. It lets users query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.

Downloads: 0 This Week

Last Update: 2018-03-13
See Project
5

GloVe

GloVe model for distributed word representation

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. The links provided contain word vectors obtained from the respective corpora. If you want word vectors trained on massive web datasets, you need only download one of these text files! Pre-trained word vectors...

Downloads: 0 This Week

Last Update: 2021-09-30
See Project
6

mwetoolkit

THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied...

1 Review

Downloads: 0 This Week

Last Update: 2019-05-01
See Project
7

Universal text extractor

This is a library for extracting raw Unicode text from written documents (office documents such as PDF, Word, OpenOffice, ...). It should be useful to developers of search engines, text processing tools, corpus analysis tools, and similar software.

Downloads: 1 This Week

Last Update: 2014-06-09
See Project
8

Modular Suite of NLP Tools

This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.

Downloads: 0 This Week

Last Update: 2014-06-09
See Project
9

HCI_CSC8570

Supporting software for a school research paper to analyze a corpus for letter frequency and word properties.

Downloads: 0 This Week

Last Update: 2013-04-19
See Project
10

Get 1T

Get1T is a tool for filtering through the massive quantity of data available in the Web 1T corpus and extracting only the counts you need - including for simple wildcard patterns.

Downloads: 0 This Week

Last Update: 2013-04-19
See Project
11

Top Ranked Phrases in a Corpus

This project is intended to list the top R ranked terms whose length is between M and N. It is designed to extract these phrases from a given corpus in an input folder.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
12

Samudra-Manthan

Samudra Manthan uses C and MPI to find interesting n-grams (terms) in a large corpus of data. We use the GigaWord corpus to find the top m interesting n-grams using the TF*IDF measure.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
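
As a rough illustration of ranking n-grams by TF*IDF, here is a small single-process Python sketch. It is not the project's C/MPI code. The function names, whitespace tokenization, and the particular weighting (raw count times log(N/df), summed over documents) are all assumptions made for the example; the project description does not say which TF*IDF variant it uses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # All contiguous runs of n tokens, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(documents, n, m):
    # Per-document n-gram counts (assumed tokenization: lowercase + split).
    doc_counts = [Counter(ngrams(doc.lower().split(), n)) for doc in documents]

    # Document frequency: in how many documents each n-gram appears.
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())

    # Assumed scoring: sum over documents of tf * log(N / df).
    total_docs = len(documents)
    scores = Counter()
    for counts in doc_counts:
        for gram, tf in counts.items():
            scores[gram] += tf * math.log(total_docs / doc_freq[gram])

    return scores.most_common(m)


docs = ["the cat sat", "the cat ran", "the dog ran fast"]
ranked = top_ngrams(docs, 1, 10)
```

Under this weighting, an n-gram that occurs in every document (here the unigram "the") scores zero, while n-grams concentrated in few documents rank highest.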
13

Palm TaCo

TaCo is a tasty Palm application that enables you to use the Tanaka Corpus on your handheld. The Tanaka Corpus is a collection of Japanese/English sentence pairs that a student of Japanese language can use as a source of example sentences.

Downloads: 0 This Week

Last Update: 2018-01-22
See Project
14

reputron

reputron is a knowledge extraction engine platform that covers all aspects of text mining, relevance, indexing and querying on a corpus of text documents.

Downloads: 0 This Week

Last Update: 2015-04-08
See Project
