The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:
(1) For the Watan-2004 corpus
----------------------
M. Abbas, K. Smaili, D. Berkani (2011). Evaluation of Topic Identification Methods on Arabic Corpora. Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

(2) For the Khaleej-2004 corpus
---------------------------------
M. Abbas, K. Smaili (2005). Comparison of Topic Identification Methods for Arabic Language. RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.

More useful references to check:
-------------------------------------------
https://sites.google.com/site/mouradabbas9/corpora

License

GNU General Public License version 2.0 (GPLv2)

Additional Project Details

Operating Systems

Cygwin, Linux, Windows

Languages

French, Dutch, English, Arabic

Intended Audience

Information Technology, Science/Research, Advanced End Users, Developers, Quality Engineers, Engineering

User Interface

Win32 (MS Windows), KDE

Programming Language

Python, C++, JavaScript

Database Environment

MySQL

Related Categories

Python Machine Translation Software, Python Machine Learning Software, Python Natural Language Processing (NLP) Tool, C++ Machine Translation Software, C++ Machine Learning Software, C++ Natural Language Processing (NLP) Tool, JavaScript Machine Translation Software, JavaScript Machine Learning Software, JavaScript Natural Language Processing (NLP) Tool

Registered

2010-10-30