#269 Needs more complex conversion in ICONV/OCONV

Milestone: v1.0 (example)

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2015-05-04

Created: 2015-05-04

Creator: Changwoo Ryu

Private: No

How about adding regular-expression-like patterns and stripping to ICONV/OCONV, as in PFX?

Background: The Korean dictionary uses ICONV and OCONV to convert between ordinary Hangul syllables and (internal) Hangul Jamo sequences. However, the current internal encoding is not ideal. Implementing the ideal internal encoding requires stateful encoding conversion: the conversion depends on the characters that follow the source sequence.

For example, for simplicity, let's assume the Korean language has only one consonant, "G", and one vowel, "A". A Korean syllable consists of consonant+vowel or consonant+vowel+consonant, so the possible syllables are "GA" and "GAG". The word GAGGA should be converted to GAG+GA, while GAGA should be converted to GA+GA. This conversion cannot be implemented using the current OCONV.
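To make the dependency on following characters concrete, here is a minimal Python sketch of the segmentation described above. It is only an illustration of the toy G/A alphabet, not Hunspell code: the function name `split_syllables` and the use of a negative lookahead are assumptions made for this example.

```python
import re

# Toy alphabet from the report: one consonant "G", one vowel "A".
# A syllable is "GA" or "GAG". The trailing "G" belongs to the current
# syllable only when the next character is NOT the vowel "A"; otherwise
# it is the initial consonant of the next syllable.
SYLLABLE = re.compile(r"GA(?:G(?!A))?")

def split_syllables(jamo: str) -> list[str]:
    """Split a toy jamo sequence into syllables (hypothetical helper)."""
    syllables = []
    pos = 0
    while pos < len(jamo):
        match = SYLLABLE.match(jamo, pos)
        if match is None:
            raise ValueError(f"cannot parse {jamo!r} at offset {pos}")
        syllables.append(match.group())
        pos = match.end()
    return syllables

print("+".join(split_syllables("GAGGA")))  # GAG+GA
print("+".join(split_syllables("GAGA")))   # GA+GA
```

The decision about the second "G" cannot be made from the matched text alone. It needs one character of lookahead, which is the kind of context the request asks ICONV/OCONV to support.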

So how about adding regular expressions and stripping to ICONV/OCONV, like this:

OCONV <pattern> <result> <stripping>

OCONV GAG[A] <GA> GA
OCONV GAG[^A] <GAG> GAG

Alternatively, it would be great if a general stateful conversion feature were implemented.
