ElixirFM is provided with Perl modules that let programmers use it from within their own Perl programs.
These modules are bundled in the ElixirFM-*-Perl.tar.gz package, where installation instructions can also be found.
Let us assume you have the elixir executable of the ElixirFM-*-Exec-*.zip package installed as well.
You can call elixir lookup from a command line and then supply your input, upon which you receive the program's output, like:
    $ elixir lookup
    something

    (6367,Just [3]) <Nest> <root>^s y '</root> <ents> <Entry> <morphs>FaCL</morphs> <entity> <Noun> <plural>HaFCAL</plural> </Noun> </entity> <reflex> <LM>something</LM> <LM>thing</LM> </reflex> </Entry> </ents> </Nest>
The results of the lookup mode give us the pointers to the particular entries of the lexicon that match the searched term. In this case, there is only one entry in the ElixirFM lexicon that contains the word "something" in the translations.
By using the ElixirFM::Exec module, we can call the elixir executable from within a Perl program. The function ElixirFM::Exec::elixir accepts the command-line parameters and the input as arguments, and returns a value that is identical to the output of the executable. You can try it yourself:
    use ElixirFM::Exec;

    # run the lookup mode with "something" as the input
    $r = ElixirFM::Exec::elixir "lookup", "something";

    print $r;
This would give us the same results as in the previous snippet. The point now is that we can process this rather condensed output information with various functions defined in the ElixirFM module.
The ElixirFM::unpretty function gives us access to the individual pieces of information in this output. Its optional additional parameter ensures that the XML-formatted contents of each entry are parsed into appropriate data structures.
    use ElixirFM;

    @u = ElixirFM::unpretty $r, "clear";
This code parses the output of elixir lookup completely and re-organizes some sub-structures. In order to see their pretty-printed representation, we can use the Data::Dumper module:
    use Data::Dumper;

    print Data::Dumper->Dump([\@u],["*u"]);

    @u = (
      [
        {
          'clip' => '(6367,Just[3])',
          'ents' => [
            [
              'Entry',
              {
                'entity' => [
                  'Noun',
                  {
                    'plural' => [
                      'HaFCAL'
                    ]
                  }
                ],
                'morphs' => 'FaCL',
                'reflex' => [
                  'something',
                  'thing'
                ]
              }
            ]
          ],
          'root' => '^s y \''
        }
      ]
    );
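To make the shape of this structure more tangible, here is a minimal sketch that walks it and prints one tab-separated summary line per entry. The data is copied from the listing above so that the snippet runs without ElixirFM installed. The helper name summarize is hypothetical, and the reading of the outermost level as one group per input line is an assumption, not something the listing confirms.

```perl
use strict;
use warnings;

# Sample data copied from the Data::Dumper listing above.
my @u = (
    [
        {
            'clip' => '(6367,Just[3])',
            'ents' => [
                [ 'Entry',
                  { 'entity' => [ 'Noun', { 'plural' => ['HaFCAL'] } ],
                    'morphs' => 'FaCL',
                    'reflex' => [ 'something', 'thing' ] } ],
            ],
            'root' => '^s y \'',
        },
    ],
);

# Hypothetical helper: flatten the nested structure into one line per entry.
sub summarize {
    my @lines;
    for my $group (@_) {                       # assumed: one group per input line
        for my $nest (@{$group}) {             # one hash per matching nest
            for my $ent (@{ $nest->{ents} }) {
                my ($kind, $data) = @{$ent};   # 'Entry', { ... }
                my ($pos) = @{ $data->{entity} };
                push @lines, join "\t", $nest->{clip}, $nest->{root}, $pos,
                                        $data->{morphs},
                                        join(", ", @{ $data->{reflex} });
            }
        }
    }
    return @lines;
}

print "$_\n" for summarize(@u);
```

Each printed line carries the lexicon pointer, the root, the part of speech, the morphs, and the translations, which is often enough for a quick overview of the lookup results.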
See ElixirHaskell for the complete options of the elixir lookup mode.
Output of the other modes of ElixirFM can be processed in a similar way.
The elixir derive mode works with lexemes, which are declared via lexicon pointers as additional arguments on the command line, and derives new lexemes according to the morphological tags supplied on the input. This call will return the deverbal noun and the active and passive participles for the verb "read":
    $ elixir derive "(1234,1)"
    [NA]---------

    [N---------] I qirA'aT "q r '" FiCAL |< aT
    [A--A------] I qAri' "q r '" FACiL
    [A--P------] I maqrU' "q r '" MaFCUL
Let us invoke the same from within Perl. Note that the original command-line arguments are passed to the ElixirFM::Exec::elixir function as an array reference, after which the list of input lines with the requested tags follows:
    $r = ElixirFM::Exec::elixir "derive", ["(1234,1)"], "[NA]---------";

    @u = ElixirFM::unpretty $r;

    print Data::Dumper->Dump([\@u], ["*u"]);

    @u = (
      [
        [
          '[N---------]',
          [
            'I',
            'qirA\'aT',
            '"q r \'"',
            'FiCAL |< aT'
          ]
        ],
        [
          '[A--A------]',
          [
            'I',
            'qAri\'',
            '"q r \'"',
            'FACiL'
          ]
        ],
        [
          '[A--P------]',
          [
            'I',
            'maqrU\'',
            '"q r \'"',
            'MaFCUL'
          ]
        ]
      ]
    );
This is all nice, but how do we process this information into something naturally useful? We can traverse these data structures, select some data, and convert them into the Arabic script or the phonological transcription if we like:
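    use Encode::Arabic::ArabTeX ':simple';

    # flatten the groups, pick the derived form of each item, and print its
    # phonological transcription next to its Arabic-script rendering
    print encode "utf8", join "", map { ElixirFM::phon($_) . "\t" . ElixirFM::orth($_) . "\n" }
                                  map { $_->[1][1] }
                                  map { @{$_} } @u;

    qirā'at	قِرَاءَة
    qāri'	قَارِئ
    maqrū'	مَقرُوء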
One of the most interesting applications of ElixirFM comes with elixir inflect, which generates all word forms of particular lexemes that correspond to the provided grammatical parameters. The space of inflection parameters is restricted via morphological tags on the standard input, while the lexemes are supplied via lexicon pointers as additional arguments on the command line:
    $ elixir inflect "(1234,1)"
    VP-A-3-S-- VIIA-3-S--

    VP-A-3MS-- qara'a "q r '" FaCaL |<< "a"
    VP-A-3FS-- qara'at "q r '" FaCaL |<< "at"
    VIIA-3MS-- yaqra'u "q r '" "ya" >>| FCaL |<< "u"
    VIIA-3FS-- taqra'u "q r '" "ta" >>| FCaL |<< "u"
This data format is quite similar to that of elixir derive, and so is the invocation of the mode from within Perl:
    $r = ElixirFM::Exec::elixir "inflect", ["(1234,1)"], "VP-A-3-S-- VIIA-3-S--";

    @u = ElixirFM::unpretty $r;

    print Data::Dumper->Dump([\@u], ["*u"]);

    @u = (
      [
        [
          'VP-A-3MS--',
          [
            'qara\'a',
            '"q r \'"',
            'FaCaL |<< "a"'
          ]
        ],
        [
          'VP-A-3FS--',
          [
            'qara\'at',
            '"q r \'"',
            'FaCaL |<< "at"'
          ]
        ],
        [
          'VIIA-3MS--',
          [
            'yaqra\'u',
            '"q r \'"',
            '"ya" >>| FCaL |<< "u"'
          ]
        ],
        [
          'VIIA-3FS--',
          [
            'taqra\'u',
            '"q r \'"',
            '"ta" >>| FCaL |<< "u"'
          ]
        ]
      ]
    );
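A structure like this is easy to turn into a lookup table. The following sketch, which hard-codes the data from the listing above so that it runs without ElixirFM, indexes the inflected forms by their tags. The variable name %forms is only illustrative. Note that the word form is the first field here, whereas in the elixir derive output it is the second one.

```perl
use strict;
use warnings;

# Sample data copied from the Data::Dumper listing above.
my @u = (
    [
        [ 'VP-A-3MS--', [ 'qara\'a',  '"q r \'"', 'FaCaL |<< "a"' ] ],
        [ 'VP-A-3FS--', [ 'qara\'at', '"q r \'"', 'FaCaL |<< "at"' ] ],
        [ 'VIIA-3MS--', [ 'yaqra\'u', '"q r \'"', '"ya" >>| FCaL |<< "u"' ] ],
        [ 'VIIA-3FS--', [ 'taqra\'u', '"q r \'"', '"ta" >>| FCaL |<< "u"' ] ],
    ],
);

# Build a table from each tag to the list of forms carrying that tag.
my %forms;
for my $group (@u) {
    for my $item (@{$group}) {
        my ($tag, $fields) = @{$item};
        push @{ $forms{$tag} }, $fields->[0];   # first field is the word form
    }
}

print "$_\t@{ $forms{$_} }\n" for sort keys %forms;
```

Keeping a list per tag, rather than a single string, leaves room for lexemes that have more than one form for the same tag.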
The ElixirFM Perl module implements miscellaneous functions to process the declarations of grammatical parameters. It can retrieve tag restrictions from a string of freely abbreviated natural language names, as well as spell out the formal tags into commonly used descriptions:
    print ElixirFM::retrieve "perfect verb second person feminine active";

    VP-A-2F---

    print join " ", ElixirFM::retrieve "(verb act sg pl) (noun adj sg nom indef) S V[PI]-A";

    V--A---[SP]-- [NA]------S1I S--------- V[PI]-A------

    print ElixirFM::describe "V[PI]-A-3[FM]S--";

    perfective imperfective verb, active voice, third person, feminine masculine gender, singular number

    print ElixirFM::describe "[NA]------S1I", 'terse';

    noun adjective, singular, nominative, indefinite
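The descriptions above suggest that a bracketed position such as [PI] or [FM] stands for a set of alternative values. As an illustration of that reading, this sketch expands a restricted tag into the concrete tags it covers. The function expand_tag is hypothetical and not part of the ElixirFM module, and the expansion is purely textual: it does not check whether each resulting combination is meaningful in ElixirFM.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a tag with [..] alternatives into concrete tags.
sub expand_tag {
    my ($tag) = @_;
    my @tags = ('');
    # Walk over the positions: a bracketed set of values or a single character.
    while ($tag =~ /\[([^\]]*)\]|(.)/g) {
        my @values = defined $1 ? split(//, $1) : ($2);
        # Append every possible value of this position to every prefix so far.
        @tags = map { my $prefix = $_; map { $prefix . $_ } @values } @tags;
    }
    return @tags;
}

print join(" ", expand_tag("V[PI]-A-3[FM]S--")), "\n";
# prints: VP-A-3FS-- VP-A-3MS-- VI-A-3FS-- VI-A-3MS--
```

A tag without brackets is returned unchanged as a one-element list, so the helper can be applied uniformly to any tag string.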
The elixir resolve mode provides the morphological analysis of the entered text. The output of this mode is usually quite complex; however, you can control how the multiple interpretations are structured. With no command-line argument, or with the explicit --trees option, the system's reply is presented in the form of MorphoTrees, while the --lists option produces the MorphoLists format, which may be more verbose but is sure to preserve the consistency of the solutions.
The MorphoTrees format summarizes the different readings into compact subgroups of alternations, and it looks as follows:
    $ elixir resolve
    حوله

    :::: حوله
    ::: <.hawwala .. .hawla> <hi hu>
    :: <.hawwala .. .hawla>
    : (876,2) ["change","convert","switch"] Verb [] [] [] [II] .hawwal ".h w l" FaCCaL
      VP-A-3MS-- .hawwala ".h w l" FaCCaL |<< "a"
      VCJ---MS-- .hawwil ".h w l" "" >>| FaCCiL |<< ""
    : (876,37) ["power"] Noun [] [I] .hawl ".h w l" FaCL
      N------S1R .hawlu ".h w l" FaCL |<< "u"
      N------S2R .hawli ".h w l" FaCL |<< "i"
      N------S4R .hawla ".h w l" FaCL |<< "a"
    : (876,38) ["about","around"] Prep [] .hawla ".h w l" FaCL |<< "a"
      PI------1- .hawlu ".h w l" FaCL |<< "u"
      PI------2- .hawli ".h w l" FaCL |<< "i"
      PI------4- .hawla ".h w l" FaCL |<< "a"
    :: <hi hu>
    : (22,1) ["he","she","it"] Pron [] huwa "" "huwa"
      SP---3MS2- hi "" "hi"
      SP---3MS2- hu "" "hu"
      SP---3MS4- hu "" "hu"
    $r = ElixirFM::Exec::elixir "resolve", "حوله";

    @u = ElixirFM::unpretty $r;
While we omit the listings of the data structures for the elixir resolve output here, we encourage you to explore this mode, as well as the other ones, on the ElixirFM Online Interface.
The MorphoLists format presents the solutions in a somewhat more complex data structure; however, it guarantees the consistency of the individual readings of the token sequences:
    $ elixir resolve --lists
    حوله

    :::: حوله
    ::: <.hawwalahu> .. <.hawlahu>
    :: (876,2) ["change","convert","switch"] Verb [] [] [] [II] .hawwal ".h w l" FaCCaL
       (22,1) ["he","she","it"] Pron [] huwa "" "huwa"
    : <.hawwalahu>
      VP-A-3MS-- .hawwala ".h w l" FaCCaL |<< "a"
      SP---3MS4- hu "" "hu"
    : <.hawwilhu>
      VCJ---MS-- .hawwil ".h w l" "" >>| FaCCiL |<< ""
      SP---3MS4- hu "" "hu"
    :: (876,37) ["power"] Noun [] [I] .hawl ".h w l" FaCL
       (22,1) ["he","she","it"] Pron [] huwa "" "huwa"
    : <.hawluhu>
      N------S1R .hawlu ".h w l" FaCL |<< "u"
      SP---3MS2- hu "" "hu"
    : <.hawlihi>
      N------S2R .hawli ".h w l" FaCL |<< "i"
      SP---3MS2- hi "" "hi"
    : <.hawlahu>
      N------S4R .hawla ".h w l" FaCL |<< "a"
      SP---3MS2- hu "" "hu"
    :: (876,38) ["about","around"] Prep [] .hawla ".h w l" FaCL |<< "a"
       (22,1) ["he","she","it"] Pron [] huwa "" "huwa"
    : <.hawluhu>
      PI------1- .hawlu ".h w l" FaCL |<< "u"
      SP---3MS2- hu "" "hu"
    : <.hawlihi>
      PI------2- .hawli ".h w l" FaCL |<< "i"
      SP---3MS2- hi "" "hi"
    : <.hawlahu>
      PI------4- .hawla ".h w l" FaCL |<< "a"
      SP---3MS2- hu "" "hu"
    $r = ElixirFM::Exec::elixir "resolve", ["--lists"], "حوله";

    @u = ElixirFM::unpretty $r;
The ElixirFM-*-Perl.tar.gz package also provides the elixir-column.pl script, which reformats the general output of elixir resolve --lists into a column format that simply lists the solutions. In the currently distributed version, it leaves out interesting structural and lexical details, yet it can be quite easily modified or extended by the user:
    $ elixir resolve -l | elixir-column.pl
    حوله

    حوله <.hawwalahu> VP-A-3MS-- SP---3MS4- (879,2) (22,1) ["change","convert","switch"] ["he","she","it"]
    حوله <.hawwilhu> VCJ---MS-- SP---3MS4- (879,2) (22,1) ["change","convert","switch"] ["he","she","it"]
    حوله <.huwwaluhu> A-----MP1R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"]
    حوله <.huwwalihi> A-----MP2R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"]
    حوله <.huwwalahu> A-----MP4R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"]
    حوله <.hawluhu> N------S1R SP---3MS2- (879,37) (22,1) ["power"] ["he","she","it"]
    حوله <.hawlihi> N------S2R SP---3MS2- (879,37) (22,1) ["power"] ["he","she","it"]
    حوله <.hawlahu> N------S4R SP---3MS2- (879,37) (22,1) ["power"] ["he","she","it"]
    حوله <.hawluhu> PI------1- SP---3MS2- (879,38) (22,1) ["about","around"] ["he","she","it"]
    حوله <.hawlihi> PI------2- SP---3MS2- (879,38) (22,1) ["about","around"] ["he","she","it"]
    حوله <.hawlahu> PI------4- SP---3MS2- (879,38) (22,1) ["about","around"] ["he","she","it"]
In most text formats above, the tab character "\t" delimits the columns that capture the different kinds of information. Command-line tools like cut, grep, etc. are therefore recommended for extracting and processing the information further. The alignment of columns can be improved with the expand -t command, which sets or adjusts the tab stops. Converting the representation of phonology and orthography into the original script can be achieved with the Encode Arabic module and the encode and decode executables it provides, or, of course, with the convenience functions of the ElixirFM library.
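The same kind of filtering can be done inside Perl. The sketch below mimics a grep followed by cut -f1,2 on lines modelled on the elixir inflect output shown earlier. It assumes that the four fields are separated by single tab characters, and it relies on the gender being the seventh character of these particular sample tags; both points are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Sample lines modelled on the elixir inflect output shown earlier.
# Assumption: the four fields are separated by single tab characters.
my @lines = (
    qq{VP-A-3MS--\tqara'a\t"q r '"\tFaCaL |<< "a"},
    qq{VP-A-3FS--\tqara'at\t"q r '"\tFaCaL |<< "at"},
    qq{VIIA-3MS--\tyaqra'u\t"q r '"\t"ya" >>| FCaL |<< "u"},
    qq{VIIA-3FS--\ttaqra'u\t"q r '"\t"ta" >>| FCaL |<< "u"},
);

# Keep the tag and the form of the feminine lines only.
my @picked;
for my $line (@lines) {
    my ($tag, $form) = split /\t/, $line;
    next unless $tag =~ /^.{6}F/;    # seventh character is the gender here
    push @picked, "$tag\t$form";
}

print "$_\n" for @picked;
```

For one-off inspection the command-line tools remain the quicker choice; a Perl loop like this pays off once the selected fields feed into further processing.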
Optionally, one can use the elixir-encode.pl script that converts the ElixirFM notation into the original script or the Buckwalter transliteration, both with and without diacritics, and prints these into additional columns. You are encouraged to modify the elixir-encode.pl script according to your particular needs:
    $ elixir resolve -l | elixir-column.pl | elixir-encode.pl
    حوله

    حوله <.hawwalahu> VP-A-3MS-- SP---3MS4- (879,2) (22,1) ["change","convert","switch"] ["he","she","it"] Haw~alahu Hwlh حوله حَوَّلَهُ
    حوله <.hawwilhu> VCJ---MS-- SP---3MS4- (879,2) (22,1) ["change","convert","switch"] ["he","she","it"] Haw~ilhu Hwlh حوله حَوِّلهُ
    حوله <.huwwaluhu> A-----MP1R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"] Huw~aluhu Hwlh حوله حُوَّلُهُ
    حوله <.huwwalihi> A-----MP2R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"] Huw~alihi Hwlh حوله حُوَّلِهِ
    حوله <.huwwalahu> A-----MP4R SP---3MS2- (879,26) (22,1) ["changeable","variable","changing"] ["he","she","it"] Huw~alahu Hwlh حوله حُوَّلَهُ
    حوله <.hawluhu> N------S1R SP---3MS2- (879,37) (22,1) ["power","might"] ["he","she","it"] Hawluhu Hwlh حوله حَولُهُ
    حوله <.hawlihi> N------S2R SP---3MS2- (879,37) (22,1) ["power","might"] ["he","she","it"] Hawlihi Hwlh حوله حَولِهِ
    حوله <.hawlahu> N------S4R SP---3MS2- (879,37) (22,1) ["power","might"] ["he","she","it"] Hawlahu Hwlh حوله حَولَهُ
    حوله <.hawluhu> PI------1- SP---3MS2- (879,38) (22,1) ["around","about"] ["he","she","it"] Hawluhu Hwlh حوله حَولُهُ
    حوله <.hawlihi> PI------2- SP---3MS2- (879,38) (22,1) ["around","about"] ["he","she","it"] Hawlihi Hwlh حوله حَولِهِ
    حوله <.hawlahu> PI------4- SP---3MS2- (879,38) (22,1) ["around","about"] ["he","she","it"] Hawlahu Hwlh حوله حَولَهُ
The elixir-encode.pl script can even be applied directly to the output of the elixir executable. Just try it! :)
    $ elixir resolve --lists | elixir-encode.pl
    حوله
    $ elixir resolve --trees | elixir-encode.pl
    حوله
    $ elixir inflect "(1234,1)" | elixir-encode.pl
    VP-A-3-S-- VIIA-3-S--

    VP-A-3MS-- qara'a "q r '" FaCaL |<< "a" qaraOa qrO قرأ قَرَأَ
    VP-A-3FS-- qara'at "q r '" FaCaL |<< "at" qaraOat qrOt قرأت قَرَأَت
    VIIA-3MS-- yaqra'u "q r '" "ya" >>| FCaL |<< "u" yaqraOu yqrO يقرأ يَقرَأُ
    VIIA-3FS-- taqra'u "q r '" "ta" >>| FCaL |<< "u" taqraOu tqrO تقرأ تَقرَأُ