The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora are asked to cite the two main references:
(1) For Watan-2004 corpus
----------------------
M. Abbas, K. Smaili, D. Berkani (2011) Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

(2) For Khaleej-2004 corpus
---------------------------
M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.

More useful references to check:
-------------------------------------------
https://sites.google.com/site/mouradabbas9/corpora
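
The description above gives a document count and a topic count for each corpus. A minimal sketch for checking a downloaded copy against those figures follows. It assumes, hypothetically, that each corpus unpacks into one subdirectory per topic with one file per document; the page does not state the archive layout, and the function name is illustrative only.

```python
from collections import Counter
from pathlib import Path


def count_documents_by_topic(corpus_root):
    """Count the files in each immediate subdirectory of corpus_root.

    Assumption (not confirmed by the project page): one subdirectory per
    topic and one file per document. Adjust if the archive differs.
    """
    counts = Counter()
    for topic_dir in sorted(Path(corpus_root).iterdir()):
        if topic_dir.is_dir():
            # Count regular files only; nested directories are ignored.
            counts[topic_dir.name] = sum(
                1 for entry in topic_dir.iterdir() if entry.is_file()
            )
    return counts
```

If the assumed layout holds, calling the function on the unpacked Khaleej-2004 directory should give 4 entries summing to 5690, and on Watan-2004 6 entries summing to 20291. A mismatch would suggest a different layout or an incomplete download.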

License

GNU General Public License version 2.0 (GPLv2)

Additional Project Details

Operating Systems

Cygwin, Linux, Windows

Languages

Arabic, Dutch, English, French

Intended Audience

Advanced End Users, Developers, Engineering, Information Technology, Quality Engineers, Science/Research

User Interface

KDE, Win32 (MS Windows)

Programming Language

C++, JavaScript, Python

Database Environment

MySQL

Related Categories

Python Machine Translation Software, Python Machine Learning Software, Python Natural Language Processing (NLP) Tool, C++ Machine Translation Software, C++ Machine Learning Software, C++ Natural Language Processing (NLP) Tool, JavaScript Machine Translation Software, JavaScript Machine Learning Software, JavaScript Natural Language Processing (NLP) Tool

Registered

2010-10-30