
python-il-parser

Parser for Indian Languages

Installation

Dependencies

python-il-parser requires NumPy, Graphviz and PyDot.

On Ubuntu, the dependencies can be installed with:

sudo apt-get install python-numpy
sudo apt-get install graphviz
sudo apt-get install python-pydot
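After installing, a quick way to confirm that the three dependencies are visible to Python is a check like the one below. It is a sketch written for Python 3 and assumes that NumPy and PyDot import as `numpy` and `pydot` and that Graphviz puts its `dot` program on the PATH. `check_dependencies` is a hypothetical helper, not part of python-il-parser.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which of the README's dependencies appear to be available.

    Assumption: NumPy and PyDot import as 'numpy' and 'pydot', and
    Graphviz provides the 'dot' executable on PATH.
    """
    return {
        "numpy": importlib.util.find_spec("numpy") is not None,
        "pydot": importlib.util.find_spec("pydot") is not None,
        "graphviz (dot)": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:15} {'found' if ok else 'MISSING'}")
```

Run it with the same interpreter you intend to use for ilparser, because the apt packages above install into the system Python.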

Download

Download python-il-parser from SourceForge.

Install

tar -xvzf python-il-parser.tar.gz
cd python-il-parser
gunzip models/*
sudo python setup.py install
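The `gunzip models/*` step decompresses the bundled model files in place. On a system without `gunzip`, the same step could be done with Python's standard `gzip` module. This is a sketch that assumes the compressed models end in `.gz` (the README does not list their names); `gunzip_all` is a hypothetical helper.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory):
    """Decompress every *.gz file in directory, similar to 'gunzip dir/*'.

    As with gunzip, each 'name.gz' is replaced by 'name': the compressed
    file is deleted once its contents have been copied out.
    Returns the paths of the decompressed files.
    """
    outputs = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drop only the trailing '.gz'
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()
        outputs.append(out_path)
    return outputs
```

From inside the `python-il-parser` directory, `gunzip_all("models")` would take the place of the `gunzip` command before `setup.py install` is run.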

Example

>>> from ilparser import ilparser
>>> with open('tests/sample.conll') as fp:
...   sentences = fp.read()
...
>>> print(sentences)
1       इसके     यह      pn      PRP     cat-pn|gen-any|num-sg|pers-3|case-o|vib-0_अतिरिक्त|tam-ke|chunkId-NP|chunkType-head|stype-|voicetype-      _       _       _       _
2       अतिरिक्त   अतिरिक्त   psp     PSP     cat-psp|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP|chunkType-child|stype-|voicetype-    _       _       _       _
3       गुग्गुल    गुग्गुल    n       NNPC    cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-child|stype-|voicetype-      _       _       _       _
4       कुंड      कुंड      n       NNP     cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype-       _       _       _       _
5       ,       COMMA   punc    SYM     cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP2|chunkType-child|stype-|voicetype-  _       _       _       _
6       भीम      भीम      n       NNPC    cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-child|stype-|voicetype-      _       _       _       _
7       गुफा      गुफा      n       NNP     cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype-       _       _       _       _
8       तथा      तथा      avy     CC      cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-CCP|chunkType-head|stype-|voicetype-    _       _       _       _
9       भीमशिला    भीमशिला    n       NNP     cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP4|chunkType-head|stype-|voicetype-       _       _       _       _
10      भी       भी       avy     RP      cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP4|chunkType-child|stype-|voicetype-   _       _       _       _
11      दर्शनीय   दर्शनीय   adj     JJ      cat-adj|gen-any|num-any|pers-|case-d|vib-|tam-|chunkId-NP5|chunkType-child|stype-|voicetype-    _       _       _       _
12      स्थल     स्थल     n       NN      cat-n|gen-m|num-pl|pers-3|case-d|vib-0|tam-0|chunkId-NP5|chunkType-head|stype-|voicetype-       _       _       _       _
13      हैं       है       v       VM      cat-v|gen-any|num-pl|pers-3|case-|vib-है|tam-hE|chunkId-VGF|chunkType-head|stype-declarative|voicetype-active    0       root    _       _
14      ।       ।       punc    SYM     cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype-   _       _       _       _

1       इसकी     यह      pn      PRP     cat-pn|gen-f|num-sg|pers-3|case-o|vib-का|tam-kA|chunkId-NP|chunkType-head|stype-|voicetype-      _       _       _       _
2       ऊँचाई     ऊँचाई     n       NN      cat-n|gen-f|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP2|chunkType-head|stype-|voicetype-       _       _       _       _
3       केवल     केवल     avy     RP      cat-avy|gen-|num-|pers-|case-|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype-   _       _       _       _
4       1982    1982    num     QC      cat-num|gen-any|num-any|pers-|case-any|vib-|tam-|chunkId-NP3|chunkType-child|stype-|voicetype-  _       _       _       _
5       मीटर     मीटर     n       NN      cat-n|gen-m|num-sg|pers-3|case-d|vib-0|tam-0|chunkId-NP3|chunkType-head|stype-|voicetype-       _       _       _       _
6       है       है       v       VM      cat-v|gen-any|num-sg|pers-3|case-|vib-है|tam-hE|chunkId-VGF|chunkType-head|stype-declarative|voicetype-active    0       root    _       _
7       ।       ।       punc    SYM     cat-punc|gen-|num-|pers-|case-|vib-|tam-|chunkId-BLK|chunkType-head|stype-|voicetype-   _       _       _       _

>>> # plot: set to True to plot the output parse trees.
... # out_dir: directory for the plotted trees when plot is True
... # (default: /home/user/output-trees). An existing out_dir is cleaned
... # before the plots are written, so make sure it holds no important files.
...
>>> parser = ilparser(out_dir="output-trees", plot=True)
>>> parsed_sents = parser.getParse(sentences)
>>> print(parsed_sents)
1       इसके     यह      pn      PRP     case-o|vib-0_अतिरिक्त|cp-|psd-|cat-pn|pers-3|num-sg|stype-|voicetype-|tam-ke|sem-|chunkId-NP|gen-any|chunkType-head        13      k7p     _     _
2       अतिरिक्त   अतिरिक्त   psp     PSP     case-|vib-|cp-|psd-|cat-psp|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP|gen-|chunkType-child      1       lwg__psp        _       _
3       गुग्गुल    गुग्गुल    n       NNPC    case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-m|chunkType-child        4       pof__cn _       _
4       कुंड      कुंड      n       NNP     case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-m|chunkType-head 8       ccof    _       _
5       ,       COMMA   punc    SYM     case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP2|gen-|chunkType-child    4       rsym    _       _
6       भीम      भीम      n       NNPC    case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-m|chunkType-child        7       pof__cn _       _
7       गुफा      गुफा      n       NNP     case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-f|chunkType-head 8       ccof    _       _
8       तथा      तथा      avy     CC      case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-CCP|gen-|chunkType-head      12      nmod    _       _
9       भीमशिला    भीमशिला    n       NNP     case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP4|gen-f|chunkType-head 8       ccof    _       _
10      भी       भी       avy     RP      case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP4|gen-|chunkType-child     9       lwg__rp _       _
11      दर्शनीय   दर्शनीय   adj     JJ      case-d|vib-|cp-|psd-|cat-adj|pers-|num-any|stype-|voicetype-|tam-|sem-|chunkId-NP5|gen-any|chunkType-child      12      nmod__adj       _     _
12      स्थल     स्थल     n       NN      case-d|vib-0|cp-|psd-|cat-n|pers-3|num-pl|stype-|voicetype-|tam-0|sem-|chunkId-NP5|gen-m|chunkType-head 13      k1s     _       _
13      हैं       है       v       VM      case-|vib-है|cp-|psd-|cat-v|pers-3|num-pl|stype-declarative|voicetype-active|tam-hE|sem-|chunkId-VGF|gen-any|chunkType-head      0       root    _       _
14      ।       ।       punc    SYM     case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-BLK|gen-|chunkType-head     13      rsym    _       _

1       इसकी     यह      pn      PRP     case-o|vib-का|cp-|psd-|cat-pn|pers-3|num-sg|stype-|voicetype-|tam-kA|sem-|chunkId-NP|gen-f|chunkType-head        2       r6      _       _
2       ऊँचाई     ऊँचाई     n       NN      case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP2|gen-f|chunkType-head 6       k1      _       _
3       केवल     केवल     avy     RP      case-|vib-|cp-|psd-|cat-avy|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-NP3|gen-|chunkType-child     4       lwg__rp _       _
4       1982    1982    num     QC      case-any|vib-|cp-|psd-|cat-num|pers-|num-any|stype-|voicetype-|tam-|sem-|chunkId-NP3|gen-any|chunkType-child    5       nmod__adj       _       _
5       मीटर     मीटर     n       NN      case-d|vib-0|cp-|psd-|cat-n|pers-3|num-sg|stype-|voicetype-|tam-0|sem-|chunkId-NP3|gen-m|chunkType-head 6       k1s     _       _
6       है       है       v       VM      case-|vib-है|cp-|psd-|cat-v|pers-3|num-sg|stype-declarative|voicetype-active|tam-hE|sem-|chunkId-VGF|gen-any|chunkType-head      0       root    _       _
7       ।       ।       punc    SYM     case-|vib-|cp-|psd-|cat-punc|pers-|num-|stype-|voicetype-|tam-|sem-|chunkId-BLK|gen-|chunkType-head     6       rsym    _       _
>>>
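Both the input and the parser output above look like CoNLL-X-style rows: ten columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with a blank line between sentences. In the input, HEAD and DEPREL are `_` except on the root verb; in the output they carry the predicted head and relation (e.g. `13 k7p`, `12 k1s`). A minimal reader for this layout is sketched below. It assumes the columns are tab-separated (the alignment shown here may have been padded for display) and that FEATS is a `|`-separated list of pairs joined by the first `-`. `read_conll`, `parse_feats` and `Token` are hypothetical names, not part of ilparser.

```python
from collections import namedtuple

# Ten CoNLL-X columns; the input and output above appear to use this layout.
Token = namedtuple(
    "Token", "id form lemma cpostag postag feats head deprel phead pdeprel"
)


def parse_feats(field):
    """Turn 'cat-n|gen-m|vib-0' into {'cat': 'n', 'gen': 'm', 'vib': '0'}.

    Only the first '-' separates a key from its value, so values such as
    '0_अतिरिक्त' stay intact and empty values (e.g. 'stype-') become ''.
    """
    if field == "_":
        return {}
    feats = {}
    for item in field.split("|"):
        key, _, value = item.partition("-")
        feats[key] = value
    return feats


def read_conll(text):
    """Yield one list of Token per blank-line-separated sentence in text."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        # '_' means no head has been assigned (as in the unparsed input)
        head = None if cols[6] == "_" else int(cols[6])
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                              parse_feats(cols[5]), head, cols[7], cols[8], cols[9]))
    if sentence:  # the last sentence may not end with a blank line
        yield sentence
```

If `getParse` returns the text printed above as a string (an assumption based on the printed output), `list(read_conll(parsed_sents))` would give one token list per sentence, with `tok.head` and `tok.deprel` holding the predicted attachment.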

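With `plot=True`, ilparser writes a drawing of each parse tree to `out_dir`; the Graphviz and PyDot dependencies suggest that is how the plots are produced, but the README does not describe it. As an illustration of the idea only, the sketch below turns one sentence of the CoNLL output above into Graphviz DOT source directly, without PyDot, drawing an arc from each HEAD to its dependent labelled with DEPREL. It assumes tab-separated columns; `to_dot` and `_dot_quote` are hypothetical helpers.

```python
def _dot_quote(text):
    """Escape backslashes and double quotes for a DOT string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(conll_sentence, name="parse"):
    """Return Graphviz DOT source for one sentence of tab-separated CoNLL rows.

    Each token becomes a node labelled with its word form. Every token whose
    HEAD is a number gets an edge HEAD -> token labelled with its DEPREL;
    HEAD 0 points at an artificial ROOT node.
    """
    lines = ["digraph " + _dot_quote(name) + " {", '  0 [label="ROOT"];']
    for row in conll_sentence.strip().splitlines():
        cols = row.split("\t")
        tok_id, form, head, deprel = cols[0], cols[1], cols[6], cols[7]
        lines.append(f"  {tok_id} [label={_dot_quote(form)}];")
        if head != "_":  # unparsed tokens get a node but no edge
            lines.append(f"  {head} -> {tok_id} [label={_dot_quote(deprel)}];")
    lines.append("}")
    return "\n".join(lines)
```

Saved to a file, the DOT text can be rendered with the Graphviz command line, for example `dot -Tpng parse.dot -o parse.png`.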
Contact

Riyaz Ahmad Bhat
PHD-CL IIITH, Hyderabad
riyaz.bhat@research.iiit.ac.in

Irshad Ahmad Bhat
MS-CSE IIITH, Hyderabad
irshad.bhat@research.iiit.ac.in
Source: README.rst, updated 2015-08-26