Words in dictionary aren't recognized
I'm using Aspell 0.60.6 on Ubuntu with a fresh install of the Irish dictionary aspell5-ga-4.4-0.tar.bz2.
Immediately after "sudo make install" of the dictionary, I run this, expecting no output:
$ /usr/bin/word-list-compress d < ga.cwl | iconv -f iso-8859-1 -t utf8 | aspell --lang=ga list
d'orgán
m'orgán
n-arm
t-arm
These four aren't recognized though. The other 326042 are ok!
So the problem is that d'orgán and Dorgan share the same "clean" value, "dorgan", but their soundslike values differ ("T*R*K*" and "T*R*K*N" respectively). That violates one of the assumptions I made. I'm not sure how I am going to fix this.
And fixing this will almost certainly require breaking the dictionary format, further complicating things.
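To make the conflict concrete, here is a minimal Python sketch; it is not Aspell's actual code. The `clean` function is a hypothetical stand-in (drop diacritics and non-letters, then lowercase), and the soundslike values are copied from this report rather than computed.

```python
import unicodedata
from collections import defaultdict


def clean(word: str) -> str:
    """Hypothetical stand-in for Aspell's "clean" form (an assumption)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Keep letters only: this drops apostrophes, hyphens and combining accents.
    return "".join(c for c in decomposed if c.isalpha()).lower()


# Soundslike values as reported in this ticket (not computed here).
reported_soundslike = {
    "d'orgán": "T*R*K*",
    "Dorgan": "T*R*K*N",
}


def find_conflicts(soundslike_by_word):
    """Return clean forms that map to more than one soundslike value."""
    groups = defaultdict(set)
    for word, soundslike in soundslike_by_word.items():
        groups[clean(word)].add(soundslike)
    return {key: values for key, values in groups.items() if len(values) > 1}


conflicts = find_conflicts(reported_soundslike)
print(conflicts)
```

Both words land under the clean form "dorgan" with two different soundslike values, which is the situation the lookup assumption rules out.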
Ok, maybe we're homing in on the problem. Both of those words *should* have a soundslike of "T*R*K*N". But I can't find a problem in the gaeilge_phonet.dat file.
As a simpler example, consider "organ". It should have a soundslike of *R*K*N, but it comes out as *R*K*.
The rule that's causing the trouble appears to be:
R(BGM)- R*
I think this because, for example, the string "oragan" correctly gives *R*K*N.
Am I not allowed to use the - syntax together with characters in parens, as above? That syntax seems to work correctly in other places.
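One reading of what the rule is meant to do can be sketched in Python. This assumes the trailing `-` means the parenthesized letter must match but is left unconsumed for later rules. `apply_rule_sketch` is a hypothetical name, and this is not the phonet implementation, so it does not reproduce the bug.

```python
import re


def apply_rule_sketch(word: str) -> str:
    """Assumed reading of "R(BGM)-  R*".

    Replace R with "R*" only when the next letter is B, G or M,
    and leave that next letter in place (lookahead, not consumed).
    """
    return re.sub(r"R(?=[BGM])", "R*", word)


print(apply_rule_sketch("ORGAN"))   # OR*GAN  (rule fires, G is kept)
print(apply_rule_sketch("ORAGAN"))  # ORAGAN  (R is followed by A, rule does not fire)
```

Under this reading the G and the final N of "organ" are still available to later rules, so losing the trailing N would not be expected. It also matches the observation that "oragan", which never triggers the rule, comes out correctly.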
There could also be a bug in the phonet code. I did not write the original code, and it has been a while since I looked at it. I will try to have a look sometime soon.
If you feel so inclined, you are welcome to look for yourself.
This issue has moved to GitHub: https://github.com/GNUAspell/aspell/issues/476