Open Source tool for Arabic text readability
This is an open-source Java tool that calculates readability for Arabic text with and without diacritics (Tashkeel).
The tool works better when diacritics are present (we provide a method that lets you add diacritics to plain Arabic text).
The class TestOSMAN shows how to measure OSMAN readability for text with and without diacritics.
The method calculateOsman(String text) can be called on an instance of the class OsmanReadability.
Users can also add and remove diacritics using addTashkeel(String text) and removeTashkeel(String text).
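This README does not show how removeTashkeel is implemented. As a rough illustration only, the sketch below strips the common Arabic diacritic marks (Unicode U+064B to U+0652) with a regular expression. The class TashkeelSketch and its methods stripTashkeel and hasTashkeel are hypothetical names, not part of the tool, and the tool's own method may handle a different set of marks.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the tool's implementation of removeTashkeel.
public class TashkeelSketch {
    // Arabic diacritics from FATHATAN (U+064B) through SUKUN (U+0652).
    private static final Pattern TASHKEEL = Pattern.compile("[\\u064B-\\u0652]");

    /** Returns the text with the diacritics in the range above removed. */
    public static String stripTashkeel(String text) {
        return TASHKEEL.matcher(text).replaceAll("");
    }

    /** Returns true if the text contains at least one diacritic in the range above. */
    public static boolean hasTashkeel(String text) {
        return TASHKEEL.matcher(text).find();
    }

    public static void main(String[] args) {
        // Kaf, teh and beh, each followed by a fatha mark.
        String vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(hasTashkeel(vocalized));             // prints true
        System.out.println(stripTashkeel(vocalized).length());  // prints 3
    }
}
```

A check like hasTashkeel could be used to decide whether text needs diacritics added before scoring, although the tool may already handle this internally.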
The tool also lets you calculate other readability metrics such as ARI (Automated Readability Index) and LIX.
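For reference, the standard ARI and LIX formulas can be sketched as below. This is a minimal illustration that works from counts supplied by the caller. The class and method names are hypothetical, and this README does not say which method names or tokenization rules the tool itself uses for these metrics.

```java
// Hypothetical sketch of the standard formulas; not the tool's own code.
public class ReadabilityFormulasSketch {

    /**
     * ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
     * Callers should pass positive word and sentence counts.
     */
    public static double ari(int characters, int words, int sentences) {
        return 4.71 * ((double) characters / words)
             + 0.5 * ((double) words / sentences)
             - 21.43;
    }

    /**
     * LIX = (words / sentences) + 100 * (longWords / words),
     * where a long word has more than six letters.
     */
    public static double lix(int words, int longWords, int sentences) {
        return (double) words / sentences + 100.0 * longWords / words;
    }

    public static void main(String[] args) {
        // Illustrative counts only, not taken from any real text.
        System.out.println(ari(500, 100, 5));
        System.out.println(lix(100, 20, 5));
    }
}
```

How characters, words and sentences are counted for Arabic text (for example, whether diacritics count as characters) is left to the caller here and would affect the scores.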
When using Eclipse (or another IDE), make sure the console output encoding is set to UTF-8 (Run configuration -> Common Tab -> Other Encoding).
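If Arabic output still appears garbled, it may also help to make the program write UTF-8 bytes explicitly. The following is a minimal sketch of that idea using only the standard library. The class Utf8ConsoleSketch and its method utf8Stream are hypothetical names and are not part of the tool. The console itself must still be set to UTF-8 as described above.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical sketch; not part of the tool.
public class Utf8ConsoleSketch {

    /** Wraps the given stream in an auto-flushing PrintStream that encodes text as UTF-8. */
    public static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // The Arabic word "marhaban" (hello), written with Unicode escapes.
        out.println("\u0645\u0631\u062D\u0628\u0627");
    }
}
```

Calling System.setOut(utf8Stream(System.out)) once at startup would apply the same encoding to every later System.out.println call.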
To download everything at once, click the Download Zip button on the right-hand side.
Alternatively, navigate to Osman_UN_Corpus, open each folder, and download the dataset zip files one by one. To download a zip file, click on it and then click "View Raw".
Have a question? Get in touch with us at: dr.melhaj@gmail.com