The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:
(1) For Watan-2004 corpus
----------------------
M. Abbas, K. Smaili, D. Berkani (2011) Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

(2) For Khaleej-2004 corpus
---------------------------------
M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.

More useful references to check:
-------------------------------------------
https://sites.google.com/site/mouradabbas9/corpora
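
As a quick sanity check after downloading, the document and topic counts quoted above can be compared against what is on disk. The sketch below is only an illustration: it assumes each corpus is unpacked as one folder per topic with one file per document, which is not confirmed by this page, and the function name count_documents is hypothetical.

```python
from pathlib import Path


def count_documents(corpus_root):
    """Count files in each immediate subfolder of corpus_root.

    Assumption (not confirmed by the corpus page): the corpus is stored
    as one subfolder per topic, with one file per document.
    Returns a dict mapping folder name to number of files.
    """
    counts = {}
    for category_dir in sorted(Path(corpus_root).iterdir()):
        # Skip loose files (for example a README) at the top level.
        if category_dir.is_dir():
            counts[category_dir.name] = sum(
                1 for entry in category_dir.iterdir() if entry.is_file()
            )
    return counts
```

If the layout assumption holds, calling count_documents on the unpacked Khaleej-2004 folder should return 4 entries whose values sum to 5690, and on Watan-2004 it should return 6 entries summing to 20291. A mismatch most likely means the archive is organized differently from what is assumed here.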

License

GNU General Public License version 2.0 (GPLv2)

Additional Project Details

Operating Systems

Cygwin, Linux, Windows

Languages

French, Dutch, English, Arabic

Intended Audience

Information Technology, Science/Research, Advanced End Users, Developers, Quality Engineers, Engineering

User Interface

Win32 (MS Windows), KDE

Programming Language

Python, C++, JavaScript

Database Environment

MySQL

Related Categories

Python Machine Translation Software, Python Machine Learning Software, Python Natural Language Processing (NLP) Tool, C++ Machine Translation Software, C++ Machine Learning Software, C++ Natural Language Processing (NLP) Tool, JavaScript Machine Translation Software, JavaScript Machine Learning Software, JavaScript Natural Language Processing (NLP) Tool

Registered

2010-10-30