PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by the Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research.
PADIC is composed of 6 dialects (two Algerian dialects, from the cities of Algiers and Annaba, plus Palestinian, Syrian, Tunisian and Moroccan) and Modern Standard Arabic (MSA).
Mourad Abbas
Computational Linguistics Department, CRSTDLA
https://sites.google.com/site/mouradabbas9
Publications
-----------------
K. Meftouh, S. Harrat, S. Jamoussi, M. Abbas, K. Smaïli, Machine Translation Experiments on PADIC: A Parallel Arabic DIalect Corpus, The 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015, Shanghai, 2015.
TORJMAN website:
-------------------------
https://sites.google.com/site/torjmanepnr/6-corpus
Features
-----------------
- XML
- Buckwalter
- 6 Arabic dialects + Modern Standard Arabic
- More than 6000 sentences
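
The Buckwalter feature listed above refers to a Latin-character transliteration of Arabic script. The following is a minimal sketch of how such text could be converted back to Arabic characters, assuming the standard Buckwalter table. Whether PADIC uses additional symbols for dialect-specific sounds is not stated here, so any unknown character is left unchanged. The names BW_TO_ARABIC and buckwalter_to_arabic are hypothetical and are not part of the corpus distribution.

```python
# Standard Buckwalter symbols, listed in the order of the Unicode code
# points they stand for. Assumption: PADIC follows this standard table;
# dialect-specific extensions, if any, are not covered.
_BW_LETTERS = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # U+0621 .. U+063A
_BW_MARKS = "_fqklmnhwYyFNKaui~o"           # U+0640 .. U+0652

BW_TO_ARABIC = {}
for offset, symbol in enumerate(_BW_LETTERS):
    BW_TO_ARABIC[symbol] = chr(0x0621 + offset)
for offset, symbol in enumerate(_BW_MARKS):
    BW_TO_ARABIC[symbol] = chr(0x0640 + offset)
BW_TO_ARABIC["`"] = chr(0x0670)  # dagger alif
BW_TO_ARABIC["{"] = chr(0x0671)  # alif wasla


def buckwalter_to_arabic(text):
    """Map each Buckwalter symbol to its Arabic character.

    Characters outside the table (spaces, digits, punctuation,
    non-standard symbols) are passed through unchanged.
    """
    return "".join(BW_TO_ARABIC.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # "ktAb" is the Buckwalter spelling of the Arabic word for "book".
    print(buckwalter_to_arabic("ktAb"))
```

The table is built from code point ranges rather than typed out letter by letter, which keeps the sketch short and avoids mixing left-to-right and right-to-left characters in the source.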