ElixirFM Wiki

Functional Arabic Morphology

Brought to you by: otakar-smrz

ElixirPerl

ElixirFM in Perl

ElixirFM ships with Perl modules that let programmers use it from within their own Perl programs.

These modules are bundled in the ElixirFM-*-Perl.tar.gz package, where installation instructions can also be found.

Let us assume you have the elixir executable of the ElixirFM-*-Exec-*.zip package installed as well.

Lookup

You can call elixir lookup from the command line and then supply your input on the standard input; the program replies with its output, for example:

$$ elixir lookup
something

(6367,Just [3])
<Nest>
 <root>^s y '</root>
 <ents>
  <Entry>
   <morphs>FaCL</morphs>
   <entity>
    <Noun>
     <plural>HaFCAL</plural>
    </Noun>
   </entity>
   <reflex>
    <LM>something</LM>
    <LM>thing</LM>
   </reflex>
  </Entry>
 </ents>
</Nest>

The results of the lookup mode give us the pointers to the particular entries of the lexicon that match the searched term. In this case, there is only one entry in the ElixirFM lexicon that contains the word "something" in the translations.

By using the ElixirFM::Exec module, we can call the elixir executable from within a Perl program. The function ElixirFM::Exec::elixir accepts the command-line parameters and the input as arguments, and returns a value that is identical to the output of the executable. You can try it yourself:

use ElixirFM::Exec; 

$r = ElixirFM::Exec::elixir "lookup", "something";

print $r;

This would give us the same results as in the previous snippet. The point now is that we can process this rather condensed output information with various functions defined in the ElixirFM module.

The ElixirFM::unpretty function gives us access to the individual pieces of information in that output. The optional additional parameter ensures that the XML-formatted contents of each entry are parsed into appropriate data structures.

use ElixirFM;

@u = ElixirFM::unpretty $r, "clear";

This code parses the output of elixir lookup completely and re-organizes some sub-structures. In order to see their pretty-printed representation, we can use the Data::Dumper module:

use Data::Dumper;

print Data::Dumper->Dump([\@u],["*u"]);

@u = (
       [
         {
           'clip' => '(6367,Just[3])',
           'ents' => [
                       [
                         'Entry',
                         {
                           'entity' => [
                                         'Noun',
                                         {
                                           'plural' => [
                                                         'HaFCAL'
                                                       ]
                                         }
                                       ],
                           'morphs' => 'FaCL',
                           'reflex' => [
                                         'something',
                                         'thing'
                                       ]
                         }
                       ]
                     ],
           'root' => '^s y \''
         }
       ]
     );
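
The structure above can be walked with ordinary Perl dereferencing. Here is a minimal sketch that hard-codes the dumped data instead of calling elixir, so it runs without ElixirFM installed. The variable names ($nest, $fields, $category) are illustrative labels of ours, not ElixirFM terminology.

```perl
use strict;
use warnings;

# The structure printed by Data::Dumper above, copied so that this
# sketch does not depend on the elixir executable.
my @u = (
    [
        {
            'clip' => '(6367,Just[3])',
            'ents' => [
                [
                    'Entry',
                    {
                        'entity' => [ 'Noun', { 'plural' => ['HaFCAL'] } ],
                        'morphs' => 'FaCL',
                        'reflex' => [ 'something', 'thing' ],
                    },
                ],
            ],
            'root' => '^s y \'',
        },
    ],
);

my @lines;

for my $group (@u) {                        # outer array of the dump
    for my $nest (@{$group}) {              # each hash mirrors one <Nest>
        for my $ent (@{ $nest->{ents} }) {  # each pair mirrors one <Entry>
            my ($kind, $fields) = @{$ent};            # 'Entry' and its fields
            my ($category) = @{ $fields->{entity} };  # e.g. 'Noun'
            my $glosses = join ", ", @{ $fields->{reflex} };
            push @lines, join "\t", $nest->{clip}, $nest->{root},
                                    $category, $fields->{morphs}, $glosses;
        }
    }
}

print "$_\n" for @lines;
```

Each printed line combines the lexicon pointer, the root, the category, the morphs, and the translations of one entry, separated by tabs.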

The ElixirHaskell page lists the complete options of the elixir lookup mode.

Output of the other modes of ElixirFM can be processed in a similar way.

Derive

The elixir derive mode works with lexemes, which are declared via lexicon pointers as additional arguments on the command line, and derives new lexemes according to the morphological tags supplied on the input. This call will return the deverbal noun and the active and passive participles for the verb "read":

$$ elixir derive "(1234,1)"
[NA]---------

[N---------]    I    qirA'aT   "q r '"   FiCAL |< aT
[A--A------]    I    qAri'     "q r '"   FACiL
[A--P------]    I    maqrU'    "q r '"   MaFCUL

Let us invoke the same from within Perl. Note that the command-line arguments are passed to the ElixirFM::Exec::elixir function as an array reference, followed by the list of input lines with the requested tags:

$r = ElixirFM::Exec::elixir "derive", ["(1234,1)"], "[NA]---------";

@u = ElixirFM::unpretty $r;

print Data::Dumper->Dump([\@u], ["*u"]);

@u = ( 
       [
         [
           '[N---------]',
           [
             'I',
             'qirA\'aT',
             '"q r \'"',
             'FiCAL |< aT'
           ]
         ],
         [
           '[A--A------]',
           [
             'I',
             'qAri\'',
             '"q r \'"',
             'FACiL'
           ]
         ],
         [
           '[A--P------]',
           [
             'I',
             'maqrU\'',
             '"q r \'"',
             'MaFCUL'
           ]
         ]
       ]
     );

This is all nice, but how do we process this information into something naturally useful? We can traverse these data structures, select some data, and convert them into the Arabic script or the phonological transcription if we like:

use Encode::Arabic::ArabTeX ':simple';

print encode "utf8", join "",
      map { ElixirFM::phon($_) . "\t" . ElixirFM::orth($_) . "\n" }
      map { $_->[1][1] } map { @{$_} } @u;

qirā'at قِرَاءَة
qāri'    قَارِئ
maqrū'  مَقرُوء
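
The chain of map calls above is compact. As a minimal sketch, the same traversal can be unrolled into explicit loops that collect the derived forms by tag. The dumped data is hard-coded here so that the sketch runs without ElixirFM, and the names $first, $form, $root, and $morphs are our own labels for the four fields of each item.

```perl
use strict;
use warnings;

# The dump of the elixir derive output shown above, hard-coded.
my @u = (
    [
        [ '[N---------]', [ 'I', 'qirA\'aT', '"q r \'"', 'FiCAL |< aT' ] ],
        [ '[A--A------]', [ 'I', 'qAri\'',   '"q r \'"', 'FACiL' ] ],
        [ '[A--P------]', [ 'I', 'maqrU\'',  '"q r \'"', 'MaFCUL' ] ],
    ],
);

my %form_for;    # tag => derived form in the ElixirFM notation

for my $group (@u) {
    for my $item (@{$group}) {
        my ($tag, $info) = @{$item};
        my ($first, $form, $root, $morphs) = @{$info};
        $form_for{$tag} = $form;    # same value as $item->[1][1]
    }
}

printf "%-14s %s\n", $_, $form_for{$_} for sort keys %form_for;
```

The values of %form_for are the strings that the example above passes to ElixirFM::phon and ElixirFM::orth.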

Inflect

One of the most interesting applications of ElixirFM comes with elixir inflect, which generates all word forms of particular lexemes that correspond to the provided grammatical parameters. The space of inflection parameters is restricted via morphological tags on the standard input, while the lexemes are supplied via lexicon pointers as additional arguments on the command line:

$$ elixir inflect "(1234,1)"
VP-A-3-S-- VIIA-3-S--

VP-A-3MS--    qara'a    "q r '"   FaCaL |<< "a"
VP-A-3FS--    qara'at   "q r '"   FaCaL |<< "at"
VIIA-3MS--    yaqra'u   "q r '"   "ya" >>| FCaL |<< "u"
VIIA-3FS--    taqra'u   "q r '"   "ta" >>| FCaL |<< "u"

This data format is quite similar to that of elixir derive, and so is the invocation of the mode from within Perl:

$r = ElixirFM::Exec::elixir "inflect", ["(1234,1)"], "VP-A-3-S-- VIIA-3-S--";

@u = ElixirFM::unpretty $r;

print Data::Dumper->Dump([\@u], ["*u"]);

@u = (
       [
         [
           'VP-A-3MS--',
           [
             'qara\'a',
             '"q r \'"',
             'FaCaL |<< "a"'
           ]
         ],
         [
           'VP-A-3FS--',
           [
             'qara\'at',
             '"q r \'"',
             'FaCaL |<< "at"'
           ]
         ],
         [
           'VIIA-3MS--',
           [
             'yaqra\'u',
             '"q r \'"',
             '"ya" >>| FCaL |<< "u"'
           ]
         ],
         [
           'VIIA-3FS--',
           [
             'taqra\'u',
             '"q r \'"',
             '"ta" >>| FCaL |<< "u"'
           ]
         ]
       ]
     );
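
Note that in this dump the word form is the first element of the inner list, whereas in the elixir derive dump it is the second. A minimal sketch of selecting forms by tag follows, with the data hard-coded so that it runs without ElixirFM; the regular expression is only an illustration.

```perl
use strict;
use warnings;

# The dump of the elixir inflect output shown above, hard-coded.
my @u = (
    [
        [ 'VP-A-3MS--', [ 'qara\'a',  '"q r \'"', 'FaCaL |<< "a"' ] ],
        [ 'VP-A-3FS--', [ 'qara\'at', '"q r \'"', 'FaCaL |<< "at"' ] ],
        [ 'VIIA-3MS--', [ 'yaqra\'u', '"q r \'"', '"ya" >>| FCaL |<< "u"' ] ],
        [ 'VIIA-3FS--', [ 'taqra\'u', '"q r \'"', '"ta" >>| FCaL |<< "u"' ] ],
    ],
);

# Keep only the items whose tag starts with VP, then take the word form,
# which sits at index 0 of the inner list in this dump.
my @perfect = map  { $_->[1][0] }
              grep { $_->[0] =~ /^VP/ }
              map  { @{$_} } @u;

print join(" ", @perfect), "\n";
```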

The ElixirFM Perl module implements miscellaneous functions to process the declarations of grammatical parameters. It can retrieve tag restrictions from a string of freely abbreviated natural language names, as well as spell out the formal tags into commonly used descriptions:

print ElixirFM::retrieve "perfect verb second person feminine active";

VP-A-2F---

print join " ", ElixirFM::retrieve "(verb act sg pl) (noun adj sg nom indef) S V[PI]-A";

V--A---[SP]-- [NA]------S1I S--------- V[PI]-A------

print ElixirFM::describe "V[PI]-A-3[FM]S--";

perfective imperfective verb, active voice, third person, feminine masculine gender, singular number

print ElixirFM::describe "[NA]------S1I", 'terse';

noun adjective, singular, nominative, indefinite
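
To illustrate how such tag restrictions relate to the fully specified tags in the output, here is a minimal sketch that turns a restriction into a Perl regular expression. It is not how ElixirFM implements the matching. The function name restriction_to_regex is hypothetical, and the sketch assumes, based on the examples above, that "-" leaves a position unrestricted and that square brackets list alternatives for one position.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the ElixirFM module.
sub restriction_to_regex {
    my ($restriction) = @_;
    my $pattern = "";
    # A position is either a bracketed set such as [PI] or a single character.
    for my $position ($restriction =~ /\[[^\]]+\]|./g) {
        if ($position eq "-") {
            $pattern .= ".";                          # unrestricted position
        }
        elsif ($position =~ /^\[(.+)\]$/) {
            $pattern .= "[" . quotemeta($1) . "]";    # listed alternatives
        }
        else {
            $pattern .= quotemeta($position);         # one required value
        }
    }
    return qr/\A$pattern\z/;
}

my @tags = ("VP-A-3MS--", "VP-A-3FS--", "VIIA-3MS--", "VIIA-3FS--");

my $regex    = restriction_to_regex("VP-A-3-S--");
my @matching = grep { $_ =~ $regex } @tags;

print "@matching\n";
```

With the restriction VP-A-3-S--, only the two perfective tags of the elixir inflect example are selected, while V[PI]-A-3[FM]S-- accepts all four.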

Resolve

The elixir resolve mode provides the morphological analysis of the entered text. The output of this mode is usually quite complex, but you can control how the multiple interpretations are structured. With no command-line argument or with the explicit --trees option, the system replies in the form of MorphoTrees, while the --lists option produces the MorphoLists format, which may be more verbose but is sure to preserve the consistency of the solutions.

The MorphoTrees format summarizes the different readings into compact subgroups of alternations, and it looks as follows:

$$ elixir resolve
حوله

:::: حوله

 ::: <.hawwala .. .hawla> <hi hu>

  :: <.hawwala .. .hawla>
   : (876,2)    ["change","convert","switch"]
                Verb [] [] []   [II]
                .hawwal ".h w l"        FaCCaL
     VP-A-3MS-- .hawwala        ".h w l"        FaCCaL |<< "a"
     VCJ---MS-- .hawwil ".h w l"        "" >>| FaCCiL |<< ""
   : (876,37)   ["power"]
                Noun [] [I]
                .hawl   ".h w l"        FaCL
     N------S1R .hawlu  ".h w l"        FaCL |<< "u"
     N------S2R .hawli  ".h w l"        FaCL |<< "i"
     N------S4R .hawla  ".h w l"        FaCL |<< "a"
   : (876,38)   ["about","around"]
                Prep    []
                .hawla  ".h w l"        FaCL |<< "a"
     PI------1- .hawlu  ".h w l"        FaCL |<< "u"
     PI------2- .hawli  ".h w l"        FaCL |<< "i"
     PI------4- .hawla  ".h w l"        FaCL |<< "a"

  :: <hi hu>
   : (22,1)     ["he","she","it"]
                Pron    []
                huwa    ""      "huwa"
     SP---3MS2- hi      ""      "hi"
     SP---3MS2- hu      ""      "hu"
     SP---3MS4- hu      ""      "hu"

$r = ElixirFM::Exec::elixir "resolve", "حوله";

@u = ElixirFM::unpretty $r;

While we omit the listings of the data structures for the elixir resolve output here, we encourage you to explore this mode, as well as the other ones, on the ElixirFM Online Interface.

The MorphoLists format presents the solutions in a somewhat more complex data structure, but it guarantees the consistency of the individual readings of the token sequences:

$$ elixir resolve --lists
حوله

:::: حوله

 ::: <.hawwalahu> .. <.hawlahu>

  :: (876,2)    ["change","convert","switch"]
                Verb [] [] []   [II]
                .hawwal ".h w l"        FaCCaL
     (22,1)     ["he","she","it"]
                Pron    []
                huwa    ""      "huwa"
   : <.hawwalahu>
     VP-A-3MS-- .hawwala        ".h w l"        FaCCaL |<< "a"
     SP---3MS4- hu      ""      "hu"
   : <.hawwilhu>
     VCJ---MS-- .hawwil ".h w l"        "" >>| FaCCiL |<< ""
     SP---3MS4- hu      ""      "hu"

  :: (876,37)   ["power"]
                Noun [] [I]
                .hawl   ".h w l"        FaCL
     (22,1)     ["he","she","it"]
                Pron    []
                huwa    ""      "huwa"
   : <.hawluhu>
     N------S1R .hawlu  ".h w l"        FaCL |<< "u"
     SP---3MS2- hu      ""      "hu"
   : <.hawlihi>
     N------S2R .hawli  ".h w l"        FaCL |<< "i"
     SP---3MS2- hi      ""      "hi"
   : <.hawlahu>
     N------S4R .hawla  ".h w l"        FaCL |<< "a"
     SP---3MS2- hu      ""      "hu"

  :: (876,38)   ["about","around"]
                Prep    []
                .hawla  ".h w l"        FaCL |<< "a"
     (22,1)     ["he","she","it"]
                Pron    []
                huwa    ""      "huwa"
   : <.hawluhu>
     PI------1- .hawlu  ".h w l"        FaCL |<< "u"
     SP---3MS2- hu      ""      "hu"
   : <.hawlihi>
     PI------2- .hawli  ".h w l"        FaCL |<< "i"
     SP---3MS2- hi      ""      "hi"
   : <.hawlahu>
     PI------4- .hawla  ".h w l"        FaCL |<< "a"
     SP---3MS2- hu      ""      "hu"

$r = ElixirFM::Exec::elixir "resolve", ["--lists"], "حوله";

@u = ElixirFM::unpretty $r;

The ElixirFM-*-Perl.tar.gz package also provides the elixir-column.pl script, which reformats the general output of elixir resolve --lists into a column format that simply lists the solutions. In the currently distributed version, it leaves out interesting structural and lexical details, yet it can be quite easily modified or extended by the user:

$$ elixir resolve -l | elixir-column.pl
حوله

حوله    <.hawwalahu>    VP-A-3MS-- SP---3MS4-    (879,2) (22,1)     ["change","convert","switch"] ["he","she","it"]
حوله    <.hawwilhu>     VCJ---MS-- SP---3MS4-    (879,2) (22,1)     ["change","convert","switch"] ["he","she","it"]
حوله    <.huwwaluhu>    A-----MP1R SP---3MS2-    (879,26) (22,1)    ["changeable","variable","changing"] ["he","she","it"]
حوله    <.huwwalihi>    A-----MP2R SP---3MS2-    (879,26) (22,1)    ["changeable","variable","changing"] ["he","she","it"]
حوله    <.huwwalahu>    A-----MP4R SP---3MS2-    (879,26) (22,1)    ["changeable","variable","changing"] ["he","she","it"]
حوله    <.hawluhu>      N------S1R SP---3MS2-    (879,37) (22,1)    ["power"] ["he","she","it"]
حوله    <.hawlihi>      N------S2R SP---3MS2-    (879,37) (22,1)    ["power"] ["he","she","it"]
حوله    <.hawlahu>      N------S4R SP---3MS2-    (879,37) (22,1)    ["power"] ["he","she","it"]
حوله    <.hawluhu>      PI------1- SP---3MS2-    (879,38) (22,1)    ["about","around"] ["he","she","it"]
حوله    <.hawlihi>      PI------2- SP---3MS2-    (879,38) (22,1)    ["about","around"] ["he","she","it"]
حوله    <.hawlahu>      PI------4- SP---3MS2-    (879,38) (22,1)    ["about","around"] ["he","she","it"]

In most text formats above, the tab character "\t" is used as a delimiter of columns capturing the different kinds of information. We recommend extracting and processing the information further with command-line tools like cut, grep, etc. The alignment of columns can be improved by using the expand -t command to set or adjust the tab positions. Converting the representation of phonology and orthography into the original script can be achieved with the Encode Arabic module and the encode and decode executables it provides, or with the convenience functions of the ElixirFM library.
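
Tab-delimited lines are also easy to take apart in Perl. The following minimal sketch splits two sample rows modelled on the elixir-column.pl listing above. The rows are rebuilt here with explicit tabs, and the column layout (token, form, tags, lexicon pointers, translations) is an assumption read off that listing rather than a documented format.

```perl
use strict;
use warnings;
use utf8;

# Sample rows modelled on the elixir-column.pl listing, joined by tabs.
my @rows = (
    join("\t", "حوله", "<.hawwalahu>", "VP-A-3MS-- SP---3MS4-",
               "(879,2) (22,1)",
               '["change","convert","switch"] ["he","she","it"]'),
    join("\t", "حوله", "<.hawluhu>", "N------S1R SP---3MS2-",
               "(879,37) (22,1)",
               '["power"] ["he","she","it"]'),
);

my @parsed;

for my $row (@rows) {
    # Columns are separated by tabs; tags and pointers by spaces within a column.
    my ($token, $form, $tags, $pointers, $glosses) = split /\t/, $row;
    push @parsed, {
        token    => $token,
        form     => $form,
        tags     => [ split " ", $tags ],
        pointers => [ split " ", $pointers ],
        glosses  => $glosses,
    };
}

binmode STDOUT, ":encoding(UTF-8)";
print join("\t", $_->{token}, $_->{form}, @{ $_->{tags} }), "\n" for @parsed;
```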

Optionally, one can use the elixir-encode.pl script that converts the ElixirFM notation into the original script or the Buckwalter transliteration, both with and without diacritics, and prints these into additional columns. You are encouraged to modify the elixir-encode.pl script according to your particular needs:

$$ elixir resolve -l | elixir-column.pl | elixir-encode.pl
حوله

حوله    <.hawwalahu>   VP-A-3MS-- SP---3MS4-   (879,2) (22,1)   ["change","convert","switch"] ["he","she","it"]         Haw~alahu    Hwlh    حوله    حَوَّلَهُ
حوله    <.hawwilhu>    VCJ---MS-- SP---3MS4-   (879,2) (22,1)   ["change","convert","switch"] ["he","she","it"]         Haw~ilhu     Hwlh    حوله    حَوِّلهُ
حوله    <.huwwaluhu>   A-----MP1R SP---3MS2-   (879,26) (22,1)  ["changeable","variable","changing"] ["he","she","it"]  Huw~aluhu    Hwlh    حوله    حُوَّلُهُ
حوله    <.huwwalihi>   A-----MP2R SP---3MS2-   (879,26) (22,1)  ["changeable","variable","changing"] ["he","she","it"]  Huw~alihi    Hwlh    حوله    حُوَّلِهِ
حوله    <.huwwalahu>   A-----MP4R SP---3MS2-   (879,26) (22,1)  ["changeable","variable","changing"] ["he","she","it"]  Huw~alahu    Hwlh    حوله    حُوَّلَهُ
حوله    <.hawluhu>     N------S1R SP---3MS2-   (879,37) (22,1)  ["power","might"] ["he","she","it"]                     Hawluhu      Hwlh    حوله    حَولُهُ
حوله    <.hawlihi>     N------S2R SP---3MS2-   (879,37) (22,1)  ["power","might"] ["he","she","it"]                     Hawlihi      Hwlh    حوله    حَولِهِ
حوله    <.hawlahu>     N------S4R SP---3MS2-   (879,37) (22,1)  ["power","might"] ["he","she","it"]                     Hawlahu      Hwlh    حوله    حَولَهُ
حوله    <.hawluhu>     PI------1- SP---3MS2-   (879,38) (22,1)  ["around","about"] ["he","she","it"]                    Hawluhu      Hwlh    حوله    حَولُهُ
حوله    <.hawlihi>     PI------2- SP---3MS2-   (879,38) (22,1)  ["around","about"] ["he","she","it"]                    Hawlihi      Hwlh    حوله    حَولِهِ
حوله    <.hawlahu>     PI------4- SP---3MS2-   (879,38) (22,1)  ["around","about"] ["he","she","it"]                    Hawlahu      Hwlh    حوله    حَولَهُ

The elixir-encode.pl script can be applied even directly to the output of the elixir executable, just try it! :)

$$ elixir resolve --lists | elixir-encode.pl
حوله
$$ elixir resolve --trees | elixir-encode.pl
حوله

$$ elixir inflect "(1234,1)" | elixir-encode.pl
VP-A-3-S-- VIIA-3-S--

VP-A-3MS--    qara'a    "q r '"   FaCaL |<< "a"            qaraOa     qrO      قرأ     قَرَأَ
VP-A-3FS--    qara'at   "q r '"   FaCaL |<< "at"           qaraOat    qrOt    قرأت    قَرَأَت
VIIA-3MS--    yaqra'u   "q r '"   "ya" >>| FCaL |<< "u"    yaqraOu    yqrO    يقرأ    يَقرَأُ
VIIA-3FS--    taqra'u   "q r '"   "ta" >>| FCaL |<< "u"    taqraOu    tqrO    تقرأ    تَقرَأُ

Wiki: ElixirFM
Wiki: ElixirHaskell