This repository provides a Raku package with functions for the
generation, parsing, and interpretation of numeric word forms in different languages.
The initial versions of the code in this repository can be found in the GitHub repository [AAr1].
The Raku package Lingua::Number, [BL1],
provides word forms (cardinal, ordinal, etc.) generation in many languages.
(But at least for one language the produced forms are incorrect.)
The Raku package Lingua::EN::Numbers, [SS1],
also provides word forms (cardinal, ordinal, etc.) generation in English.
The parsers and interpreters of this package can be seen as complementary
to the functions in [BL1, SS1].
Remark: Maybe a more complete version of this package should be merged with Lingua::Number, [BL1].
Remark: I can judge the quality of the results only for the languages
Bulgarian, English, and Russian. The numeric word form interpreters for the rest of the languages
pass testing, but they might still have many deficiencies
(which would be easily detected by people who have mastered those languages).
1. Install Raku (Perl 6) : https://raku.org/downloads .
2. Make sure you have the Zef Module Installer; to check, run
zef --version
in the command line.
3. Open a command line program. (E.g. Terminal on Mac OS X.)
4. Run the command:
zef install https://github.com/antononcube/Raku-Lingua-NumericWordForms.git
Generation of numeric word forms:
use Lingua::NumericWordForms;
say to-numeric-word-form(8093);
say to-numeric-word-form(8093, 'Bulgarian'); # not implemented yet
say to-numeric-word-form(8093, 'Russian'); # not implemented yet
Here is a screenshot of the results:
Interpretation of numeric word forms:
use Lingua::NumericWordForms;
say from-numeric-word-form('one thousand and twenty three');
say from-numeric-word-form('хиляда двадесет и три', 'Bulgarian');
say from-numeric-word-form('tysiąc dwadzieścia trzy', 'Polish');
say from-numeric-word-form('одна тысяча двадцать три', 'Russian');
say from-numeric-word-form('mil veintitrés', 'Spanish');
Here is a screenshot of the results:
The function from-numeric-word-form can also take a list or array of strings as a first argument.
Here is an example:
say from-numeric-word-form(['mil veintitrés', 'dos mil setenta y dos'], 'Spanish');
For more examples see the file NumericWordForms-examples.raku.
The returned result can be an Int object or a Str object; this is controlled with
the adverb number (which is True by default). Here is an example:
my $res = from-numeric-word-form('one thousand and twenty three');
say $res, ' ', $res.WHAT;
$res = from-numeric-word-form('one thousand and twenty three', :!number);
say $res, ' ', $res.WHAT;
Automatic language detection is invoked if the second argument is 'Automatic' or not specified:
say from-numeric-word-form('tysiąc dwadzieścia trzy', 'Automatic'):p;
say from-numeric-word-form(['tysiąc dwadzieścia trzy', 'twenty three']):p;
The adverb :p specifies whether the result should be a Pair object, or a List of Pair objects,
with the detected languages as keys.
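As a minimal sketch of how such results might be consumed: assuming the package is installed and that :p returns Pair objects as described above, the detected language and the interpreted value can be read with the standard Pair methods key and value. The variable names are illustrative only.

```raku
use Lingua::NumericWordForms;

# Single input: with :p the result is expected to be a Pair
# whose key is the detected language.
my $res = from-numeric-word-form('tysiąc dwadzieścia trzy', 'Automatic'):p;
say "language: { $res.key }, value: { $res.value }";

# List input: with :p the result is expected to be a List of Pairs,
# one per input string. (.list guards against an itemized return value.)
my @results = (from-numeric-word-form(['tysiąc dwadzieścia trzy', 'twenty three']):p).list;
for @results -> $pair {
    say "{ $pair.key } => { $pair.value }";
}
```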
Translation from one language to another:
say translate-numeric-word-form('хиляда двадесет и три', 'Bulgarian' => 'English');
(Currently that function translates to English only.)
This package provides (exports) roles that can be used in grammars or roles in other packages, applications, etc.
For example, see the roles:
Lingua::NumericWordForms::Roles::Bulgarian::WordedNumberSpec
Lingua::NumericWordForms::Roles::English::WordedNumberSpec
A grammar or role that does the roles above should use the rule:
<numeric-word-form>
For code examples see the file Parsing-examples.raku.
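A minimal sketch of such a grammar might look as follows. It assumes the English role can be loaded under its own name with use; the grammar name ItemCount and its TOP rule are hypothetical, made up for illustration. Only the role name and the rule <numeric-word-form> come from this package.

```raku
use Lingua::NumericWordForms::Roles::English::WordedNumberSpec;

# Hypothetical grammar: a worded number followed by a fixed unit word.
grammar ItemCount does Lingua::NumericWordForms::Roles::English::WordedNumberSpec {
    rule TOP { <numeric-word-form> 'apples' }
}

my $match = ItemCount.parse('twenty three apples');

# Print the text matched by the role's rule, if parsing succeeded.
say $match ?? $match<numeric-word-form>.Str.trim !! 'no parse';
```

This sketch only recognizes the number words; turning the match into a number requires an actions class, which is not shown here.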
Remark: The role Lingua::NumericWordForms::Roles::WordedNumberSpec and the corresponding
actions class Lingua::NumericWordForms::Actions::WordedNumberSpec are "abstract".
They were introduced in order to have simpler roles and actions code
(and non-duplicated implementations). Hence, that role and class should not be used in
grammars and roles outside of this package.
The following TODO items are ordered by priority, with the most important on top.
[ ] Expand parsing beyond trillions
[X] Automatic determination of the language
[ ] Word form generation:
    [ ] General algorithm
[ ] Full, consistent Persian numbers parsing.
    Currently, Persian number parsing works only for numbers less than 101.
[X] General strategy for parsing and interpretation of numeric word forms of East Asian languages
    [X] Implementation for Japanese.
[ ] Implement parsing of ordinal numeric word forms
[ ] Implement parsing of year "shortcut" word forms, like "twenty o three"
[ ] Implement parsing of numeric word forms for rationals, like "five twelfths"
[X] Translation function (from one language to another)
[AAr1] Anton Antonov, Raku::DSL::Shared.
[BL1] Brent "Labster" Laabs, Lingua::Number.
[SS1] Larry Wall, Steve Schulze, Lingua::EN::Numbers.