http free download - SourceForge

find-similar

User-friendly library to find similar objects

...Whether dealing with texts, images, audio, or more, our project aims to simplify the process of identifying similarities and enhancing decision-making.

GitHub repo: https://github.com/findsimilar/find-similar
Demo project and tutorial: http://demo.findsimilar.org/
Documentation: https://docs.findsimilar.org/

1 Review

Downloads: 5 This Week

Last Update: 2023-11-12

CC-Net

Tools to download and clean up Common Crawl data

...The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or updated with new crawls. The repository documents practical concerns like HTTP failures, snapshot differences, and stats JSONs, reflecting community use across many languages. While powerful, the repo has been archived and is read-only, so users should expect to run it as-is or fork for maintenance. Even in archived state, issues and releases pages remain useful references for implementation details and dataset lineage.

Downloads: 0 This Week

Last Update: 2025-10-11

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5,690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20,291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M.

Downloads: 3 This Week

Last Update: 2019-03-05

TEES

Turku Event Extraction System

Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python and should work in generic Unix/Linux environments. The TEES source code repository remains on GitHub at http://jbjorne.github.com/TEES/, where there is also a wiki with more information.

Downloads: 0 This Week

Last Update: 2017-05-23

Search Results for "http"

Showing 4 open source projects for "http"
