unmunch does not work with the num flag type (FLAG num) when an entry carries several long numeric flags separated by commas.
Example: 1242,1231,4232,4343
Running unmunch on such an entry outputs almost every affix combination for the given dictionary word.
The problem seems to be in expand_rootword(): the following code compares only a single character, so the program matches even the character 4 against the flag 4232, simply because 4232 contains a 4.
    if (strchr(ap, (stable[i].aep)->achar)) {
        suf_add(ts, wl, stable[i].aep, stable[i].num);
    }
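To illustrate the difference, here is a minimal sketch of a whole-token comparison for numeric flags. It is not the proposed patch and not code from unmunch; the helper name num_flag_in_list and its signature are hypothetical, and it assumes the flag field is a plain comma-separated list of decimal numbers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of unmunch: returns 1 if the decimal
 * flag `flag` appears as a complete comma-separated token in `list`
 * (for example "1242,1231,4232,4343"), and 0 otherwise. */
static int num_flag_in_list(const char *list, unsigned long flag)
{
    const char *p = list;

    while (*p != '\0') {
        char *end;
        unsigned long value = strtoul(p, &end, 10);

        if (end == p) {
            /* No number starts here: skip one character and retry. */
            p++;
            continue;
        }
        if (value == flag)
            return 1;
        /* Move past the number and the comma that follows it, if any. */
        p = (*end == ',') ? end + 1 : end;
    }
    return 0;
}
```

With the example above, strchr("1242,1231,4232,4343", '4') finds a match because the digit 4 occurs somewhere in the string, whereas num_flag_in_list("1242,1231,4232,4343", 4) returns 0 and num_flag_in_list("1242,1231,4232,4343", 4232) returns 1.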
I'm attaching the dic and aff pair I tried.
Sinhala language dic and aff files
Patch proposed by the reporter: http://sourceforge.net/p/hunspell/patches/37/
I’ve had to use the 2KiB Bash script at https://github.com/kscanne/hunspell-gd/blob/master/unmunch.sh which seems to somehow fix unmunch with awk magic. Could you please merge this fix upstream?
OK, the script does not fix unmunch, it is a simple implementation of it. I also noticed that when there are two ‘.dic’ entries with the same lemma and a different combination of flags, only the last combination is used.
For example, given a ‘.dic’ with:
word/10,15
word/10
The result of the script is the same as it would be with a ‘.dic’ with:
word/10
Removing the ‘word/10’ line results in the desired output.
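One way a reimplementation could avoid this is to treat every ‘.dic’ line as its own entry instead of keying entries by lemma. The sketch below only illustrates that idea; it is not taken from the script or from unmunch. The names dic_entry, parse_dic_line and collect_entries are hypothetical, the buffer sizes are arbitrary, and escaped slashes inside words are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one ".dic" line: the word and its raw flag field. */
struct dic_entry {
    char word[64];
    char flags[64];
};

/* Splits a line such as "word/10,15" at the first '/'.
 * Returns 0 on success, -1 if the word is empty or a field is too long. */
static int parse_dic_line(const char *line, struct dic_entry *out)
{
    const char *slash = strchr(line, '/');
    size_t wlen = slash ? (size_t)(slash - line) : strlen(line);

    if (wlen == 0 || wlen >= sizeof out->word)
        return -1;
    memcpy(out->word, line, wlen);
    out->word[wlen] = '\0';

    out->flags[0] = '\0';
    if (slash != NULL) {
        if (strlen(slash + 1) >= sizeof out->flags)
            return -1;
        strcpy(out->flags, slash + 1);
    }
    return 0;
}

/* Stores each parsable line as a separate entry, so a later line with the
 * same word does not replace an earlier one. Returns the number stored. */
static size_t collect_entries(const char *const *lines, size_t n,
                              struct dic_entry *out, size_t cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < cap; i++) {
        if (parse_dic_line(lines[i], &out[count]) == 0)
            count++;
    }
    return count;
}
```

For the two lines above, this keeps two entries for ‘word’, one with the flag field 10,15 and one with 10, so both combinations remain available for expansion.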
I ended up implementing my own unmunch in Python. It is quite a bit slower, but fast enough for me, and it supports what I need (and so far nothing else).
The script is available in the https://github.com/eitsl/hunspell repository:
https://github.com/eitsl/hunspell/blob/master/utils/unmunch.py (command-line interface)
https://github.com/eitsl/hunspell/blob/master/hunspell.py (implementation, _Unmuncher class)