python-il-parser
Parser for Indian Languages
Installation
Dependencies
python-il-parser requires NumPy, GraphViz and PyDot.
On Ubuntu, the dependencies can be installed with:

    sudo apt-get install python-numpy
    sudo apt-get install graphviz
    sudo apt-get install python-pydot
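Before installing, it can help to confirm that the dependencies are visible to the interpreter that will run python-il-parser. The following is a minimal sketch and not part of the package: the helper name `check_dependencies` is hypothetical, it assumes NumPy and PyDot are importable as `numpy` and `pydot`, it assumes GraphViz provides a `dot` executable on the PATH, and it needs Python 3.4 or later for `importlib.util.find_spec`.

```python
import importlib.util
import shutil


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    return {
        # Python packages: locate the module without importing it.
        # Module names `numpy` and `pydot` are assumptions.
        "numpy": importlib.util.find_spec("numpy") is not None,
        "pydot": importlib.util.find_spec("pydot") is not None,
        # GraphViz is a system tool; look for its `dot` executable on PATH.
        "graphviz (dot)": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print("{:<15} {}".format(name, "found" if found else "MISSING"))
```

Any entry reported as missing points at the corresponding `apt-get` package above.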
Download
Download python-il-parser from SourceForge.
Install
    tar -xvzf python-il-parser.tar.gz
    cd python-il-parser
    gunzip models/*
    sudo python setup.py install
Example
>>> from ilparser import ilparser
>>> with open('tests/sample.conll') as fp:
...     sentences = fp.read()
...
>>> print(sentences)
1 इसके यह pn PRP cat-pn|gen-any|num-sg|pers-3|case-o|vib-0_अतिरिक्त|tam-ke|chunkId-NP|chunkType-head|stype-|voicetype- _ _ _ _
2 अतिरिक्त अतिरिक्त psp PSP cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP|chunkType-child|stype-|voicetype- _ _ _ _
3 गुग्गुल गुग्गुल n NNPC cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-child|stype-|voicetype- _ _ _ _
4 कुंड कुंड n NNP cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype- _ _ _ _
5 , COMMA punc SYM cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP2|chunkType-child|stype-|voicetype- _ _ _ _
6 भीम भीम n NNPC cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-child|stype-|voicetype- _ _ _ _
7 गुफा गुफा n NNP cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype- _ _ _ _
8 तथा तथा avy CC cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-CCP|chunkType-head|stype-|voicetype- _ _ _ _
9 भीमशिला भीमशिला n NNP cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP4|chunkType-head|stype-|voicetype- _ _ _ _
10 भी भी avy RP cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP4|chunkType-child|stype-|voicetype- _ _ _ _
11 दर्शनीय दर्शनीय adj JJ cat-adj|gen-any|num-any|pers-|case-d|vib-|tam-|chunkId-NP5|chunkType-child|stype-|voicetype- _ _ _ _
12 स्थल स्थल n NN cat-n|gen-m|num-pl|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-head|stype-|voicetype- _ _ _ _
13 हैं है v VM cat-v|gen-any|num-pl|pers-3|case-|vib-है|tam-hE|chunkId-VGF|chunkType-head|stype-declarative|voicetype-active 0 root _ _
14 । । punc SYM cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype- _ _ _ _

1 इसकी यह pn PRP cat-pn|gen-f|num-sg|pers-3|case-o|vib-का|tam-kA|chunkId-NP|chunkType-head|stype-|voicetype- _ _ _ _
2 ऊँचाई ऊँचाई n NN cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype- _ _ _ _
3 केवल केवल avy RP cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype- _ _ _ _
4 1982 1982 num QC cat-num|gen-any|num-any|pers-|case-any|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype- _ _ _ _
5 मीटर मीटर n NN cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype- _ _ _ _
6 है है v VM cat-v|gen-any|num-sg|pers-3|case-|vib-है|tam-hE|chunkId-VGF|chunkType-head|stype-declarative|voicetype-active 0 root _ _
7 । । punc SYM cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype- _ _ _ _
>>>
>>> parser = ilparser(out_dir="output-trees", plot=True)
>>> # plot is a flag; set it to True to plot the output parse trees
... # if plot is True, pass the output directory for the plotted trees in "out_dir"
... # the default plot directory is /home/user/output-trees
... # if the specified plot directory already exists, it is cleaned before plots are written to it
... # make sure the specified plot directory doesn't contain any important files
...
>>> parsed_sents = parser.getParse(sentences)
>>> print(parsed_sents)
1 इसके यह pn PRP case-o|vib-0_अतिरिक्त|cp-|psd-|cat-pn|pers-3|num-sg|stype-|voicetype-|tam-ke|sem-|chunkId-NP|gen-any|chunkType-head 13 k7p _ _
2 अतिरिक्त अतिरिक्त psp PSP case-|vib-|cp-|psd-|cat-psp|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP|gen-|chunkType-child 1 lwg__psp _ _
3 गुग्गुल गुग्गुल n NNPC case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-m|chunkType-child 4 pof__cn _ _
4 कुंड कुंड n NNP case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-m|chunkType-head 8 ccof _ _
5 , COMMA punc SYM case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP2|gen-|chunkType-child 4 rsym _ _
6 भीम भीम n NNPC case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-m|chunkType-child 7 pof__cn _ _
7 गुफा गुफा n NNP case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-f|chunkType-head 8 ccof _ _
8 तथा तथा avy CC case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-CCP|gen-|chunkType-head 12 nmod _ _
9 भीमशिला भीमशिला n NNP case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP4|gen-f|chunkType-head 8 ccof _ _
10 भी भी avy RP case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP4|gen-|chunkType-child 9 lwg__rp _ _
11 दर्शनीय दर्शनीय adj JJ case-d|vib-|cp-|psd-|cat-adj|pers-|num-any|stype-|voicetype-|tam-|sem-|chunkId-NP5|gen-any|chunkType-child 12 nmod__adj _ _
12 स्थल स्थल n NN case-d|vib-0|cp-|psd-|cat-n|pers-3|num-pl|stype-|voicetype-|tam-0|sem-|chunkId-NP5|gen-m|chunkType-head 13 k1s _ _
13 हैं है v VM case-|vib-है|cp-|psd-|cat-v|pers-3|num-pl|stype-declarative|voicetype-active|tam-hE|sem-|chunkId-VGF|gen-any|chunkType-head 0 root _ _
14 । । punc SYM case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-BLK|gen-|chunkType-head 13 rsym _ _

1 इसकी यह pn PRP case-o|vib-का|cp-|psd-|cat-pn|pers-3|num-sg|stype-|voicetype-|tam-kA|sem-|chunkId-NP|gen-f|chunkType-head 2 r6 _ _
2 ऊँचाई ऊँचाई n NN case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-f|chunkType-head 6 k1 _ _
3 केवल केवल avy RP case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP3|gen-|chunkType-child 4 lwg__rp _ _
4 1982 1982 num QC case-any|vib-|cp-|psd-|cat-num|pers-|num-any|stype-|voicetype-|tam-|sem-|chunkId-NP3|gen-any|chunkType-child 5 nmod__adj _ _
5 मीटर मीटर n NN case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-m|chunkType-head 6 k1s _ _
6 है है v VM case-|vib-है|cp-|psd-|cat-v|pers-3|num-sg|stype-declarative|voicetype-active|tam-hE|sem-|chunkId-VGF|gen-any|chunkType-head 0 root _ _
7 । । punc SYM case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-BLK|gen-|chunkType-head 6 rsym _ _
>>>
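The parsed output is plain text with one token per line and ten whitespace-separated columns; in the sample output, the seventh and eighth columns hold the head index and the dependency label. As a rough sketch of how that text could be consumed (this is not part of python-il-parser; `read_conll` and `Token` are hypothetical names, the column names are inferred from the sample, and the feature column of the demo rows is abbreviated), one might write:

```python
from collections import namedtuple

# Column names are assumptions inferred from the sample output.
Token = namedtuple("Token", "id form lemma cpos pos feats head deprel")


def read_conll(text):
    """Split CoNLL-style text into sentences, each a list of Token tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        fields = line.split()
        is_token = len(fields) >= 8 and fields[0].isdigit()
        # A non-token line (e.g. a blank one) or a token numbered 1
        # closes the sentence collected so far.
        if current and (not is_token or fields[0] == "1"):
            sentences.append(current)
            current = []
        if not is_token:
            continue
        # "cat-n|gen-m" -> {"cat": "n", "gen": "m"}; split on the first "-".
        feats = dict(f.partition("-")[::2] for f in fields[5].split("|"))
        # Unparsed input has "_" in the head column; keep that as None.
        head = int(fields[6]) if fields[6].isdigit() else None
        current.append(Token(int(fields[0]), fields[1], fields[2], fields[3],
                             fields[4], feats, head, fields[7]))
    if current:
        sentences.append(current)
    return sentences


# Second sentence of the sample output, feature column shortened for brevity.
sample = "\n".join([
    "1 इसकी यह pn PRP cat-pn|gen-f 2 r6 _ _",
    "2 ऊँचाई ऊँचाई n NN cat-n|gen-f 6 k1 _ _",
    "3 केवल केवल avy RP cat-avy|gen- 4 lwg__rp _ _",
    "4 1982 1982 num QC cat-num|gen-any 5 nmod__adj _ _",
    "5 मीटर मीटर n NN cat-n|gen-m 6 k1s _ _",
    "6 है है v VM cat-v|gen-any 0 root _ _",
    "7 । । punc SYM cat-punc|gen- 6 rsym _ _",
])

for sent in read_conll(sample):
    for tok in sent:
        # Head index 0 marks the root; a missing head is shown the same way.
        head = sent[tok.head - 1].form if tok.head else "ROOT"
        print(tok.form, "<-", tok.deprel, "-", head)
```

In practice the string returned by `parser.getParse(sentences)` would take the place of `sample`. Splitting on whitespace rather than tabs is a deliberate choice here, since the sample does not show which separator the parser emits.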
Contact
Riyaz Ahmad Bhat
PHD-CL IIITH, Hyderabad
riyaz.bhat@research.iiit.ac.in

Irshad Ahmad Bhat
MS-CSE IIITH, Hyderabad
irshad.bhat@research.iiit.ac.in