Arabic Corpus


Text categorization, Arabic language processing, language modeling

Windows Linux


The Arabic Corpus, compiled by Dr. Mourad Abbas, comprises two corpora. Khaleej-2004 contains 5690 documents divided into 4 topics (categories). Watan-2004 contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:
(1) For the Watan-2004 corpus
M. Abbas, K. Smaili, D. Berkani (2011). Evaluation of Topic Identification Methods on Arabic Corpora. Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

(2) For the Khaleej-2004 corpus
M. Abbas, K. Smaili (2005). Comparison of Topic Identification Methods for Arabic Language. RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.
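
After extracting either archive, the document counts quoted above can be checked with a short script. The following is a minimal sketch, assuming the extracted corpus holds one subdirectory per topic with one file per document; that layout is an assumption and is not confirmed by this description, and the function name `count_documents_by_topic` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_documents_by_topic(corpus_root):
    """Count the files found under each top-level directory of corpus_root.

    Assumption: each top-level directory is one topic (category) and
    each file below it is one document.
    """
    root = Path(corpus_root)
    counts = Counter()
    for topic_dir in sorted(root.iterdir()):
        if topic_dir.is_dir():
            # Count every regular file below the topic directory.
            counts[topic_dir.name] = sum(
                1 for path in topic_dir.rglob("*") if path.is_file()
            )
    return counts


# Example call (the path is hypothetical; use the actual extraction directory):
# counts = count_documents_by_topic("watan-2004")
# print(counts, sum(counts.values()))
```

If the assumed layout holds, the totals should come to 5690 for Khaleej-2004 and 20291 for Watan-2004; a different total would suggest a different directory structure or an incomplete extraction.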

More useful references to check:

Arabic Corpus Web Site



Additional Project Details


Languages

French, Dutch, English, Arabic

Intended Audience

Information Technology, Science/Research, Advanced End Users, Developers, Quality Engineers, Engineering

User Interface

Win32 (MS Windows), KDE

Programming Language

Python, C++, JavaScript


