Name                    Modified    Size      Downloads / Week
readme                  2010-12-04  2.9 kB    0
arabicstopwords0.3.zip  2010-12-04  255.8 kB  20
arabicstopwords0.2.zip  2009-09-06  715.5 kB  3
Totals: 3 Items                     974.1 kB  23
#INSTALL
------------------
Arabic Stop words
--------------------
- This list can be reused. It is not easy to determine the stop words and, on the other hand, stop words differ according to the use case. For this purpose, we propose a classified list which can be parameterized by the developer.
The word list contains only words in their common forms; all other forms were generated from them by a script.

Files
------
data/ : contains the stop word data
data/classified/stopwords.csv: the data file as CSV
data/classified/stopwords.xls: data in Excel format with more valuable information, and classified stop words
data/allforms/stopwordsallforms.sql: all-forms database in SQL format
data/allforms/stopwords_allforms.txt: data generated from the minimal data file
data/allforms/stopwordsallforms.py: all-forms data as a Python dictionary
tools/: scripts used to generate all forms from the minimal data
usage : generate_stopwords_forms.py -f data/stopwords.csv > output_file.txt
Note: to keep the program from processing some entries, comment out their lines with # in the data file
Note: the script can be customized

Data Structure
--------------
All forms data .CSV file
1st field : unvocalized word (في)
2nd field : unvocalized stemmed word with '-' between affixes: e.g. ف-ب-خمسين-ي

Minimal classified data .CSV file
1st field : unvocalized word (في)
2nd field : type of the word: e.g. حرف
3rd field : class of the word: e.g. preposition
Affixation information in the other fields:
4th field : the letter AIN in Arabic if the word accepts a conjunction 'العطف', '*' otherwise
5th field : the letter TEH in Arabic if the word accepts the definite article 'ال التعريف', '*' otherwise
6th field : the letter JEEM in Arabic if the word accepts attached prepositions 'حروف الجر المتصلة', '*' otherwise
7th field : the letter DAD in Arabic if the word accepts IDAFA (attached pronoun) affixes 'الضمائر المتصلة', '*' otherwise
8th field : the letter SAD in Arabic if the word accepts verb conjugation affixes 'التصريف', '*' otherwise
9th field : the letter LAM in Arabic if the word accepts the LAM QASAM affix 'لام القسم', '*' otherwise
10th field : the letter MEEM in Arabic if the word already has ALEF LAM as definite article 'معرف', '*' otherwise

How to customize the stop word list
---------------
1- check the minimal form data file (stopwords.csv)
2- comment out with "#" all words which you don't need
3- run the generate_stopwords_forms.py script
4- capture the output of the script.

Generation script usage:
------------------------
Usage: generate_stopwords_forms -f filename [OPTIONS]
	[-h | --help]			outputs this usage message
	[-V | --version]		program version
	[-f | --file= filename]		input file to generate_stopwords_forms
	[-o | --out= output format]	output format (csv, python, sql)

How to add a word into the word list
---------------
1- check that the word does not already exist in the minimal form data file (stopwords.csv)
2- add the word and its affixation information
3- run the generate_stopwords_forms.py script
4- capture the output of the script.

Thanks
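
The field layout described under "Data Structure" can be read with a few lines of Python. The following is only a minimal sketch and not code shipped with the project: the function names, the flag names, and the sample row are made up for illustration, and the ';' delimiter is an assumption, since the readme does not say which separator the .CSV files use. It skips lines commented out with '#', maps each affixation field to True unless it holds '*', and splits the segmented second field of an all-forms row on '-'.

```python
import csv
from dataclasses import dataclass

# Illustrative names for the affixation fields (4th field onward), in the
# order the readme lists them. These names are NOT taken from the project.
FLAG_NAMES = (
    "conjunction", "definite_article", "preposition", "pronoun",
    "conjugation", "lam_qasam", "is_defined",
)


@dataclass
class StopWord:
    word: str        # 1st field: unvocalized word
    word_type: str   # 2nd field: type of the word
    word_class: str  # 3rd field: class of the word
    flags: dict      # affixation fields as booleans


def parse_classified(text, delimiter=";"):
    """Parse minimal classified rows; the delimiter is an assumption."""
    # Skip blank lines and rows commented out with '#', as the readme describes.
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    entries = []
    for row in csv.reader(lines, delimiter=delimiter):
        fields = [f.strip() for f in row]
        if len(fields) < 3:
            continue  # not enough fields to be a data row
        values = fields[3:3 + len(FLAG_NAMES)]
        # A flag is set when the field holds a letter, unset when it is '*'
        # (an empty field is also treated as unset here).
        flags = {n: bool(v) and v != "*" for n, v in zip(FLAG_NAMES, values)}
        entries.append(StopWord(fields[0], fields[1], fields[2], flags))
    return entries


def split_segmented(segmented):
    """Split the 2nd field of an all-forms row into its '-' separated parts."""
    return segmented.split("-")


# Made-up sample: the word, type and class come from the readme examples,
# but the affixation flags are invented for this demonstration.
SAMPLE_ROWS = "# commented-out row\nفي;حرف;preposition;ع;*;*;ض;*;*;*\n"

if __name__ == "__main__":
    for entry in parse_classified(SAMPLE_ROWS):
        print(entry.word, entry.word_class, entry.flags)
    print(split_segmented("ف-ب-خمسين-ي"))
```

If the real files use a comma or a tab instead, pass it as the delimiter argument; the rest of the sketch does not depend on it.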
Source: readme, updated 2010-12-04
