The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora), comprises two corpora. The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:
(1) For Watan-2004 corpus
----------------------
M. Abbas, K. Smaili, D. Berkani (2011) Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

(2) For Khaleej-2004 corpus
---------------------------------
M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.

More useful references to check:
-------------------------------------------
https://sites.google.com/site/mouradabbas9/corpora

License

GNU General Public License version 2.0 (GPLv2)

Additional Project Details

Operating Systems

Cygwin, Linux, Windows

Languages

Arabic, Dutch, English, French

Intended Audience

Advanced End Users, Developers, Engineering, Information Technology, Quality Engineers, Science/Research

User Interface

KDE, Win32 (MS Windows)

Programming Language

C++, JavaScript, Python

Database Environment

MySQL

Related Categories

Python Machine Translation Software, Python Machine Learning Software, Python Natural Language Processing (NLP) Tool, C++ Machine Translation Software, C++ Machine Learning Software, C++ Natural Language Processing (NLP) Tool, JavaScript Machine Translation Software, JavaScript Machine Learning Software, JavaScript Natural Language Processing (NLP) Tool

Registered

2010-10-30