#243 Words in dictionary aren't recognized

Milestone: 0.60
Status: open
Labels: true-bug (70)
Priority: 6
Updated: 2011-07-19
Created: 2010-07-26
Creator: Kevin Scannell
Private: No

Using Aspell 0.60.6 on Ubuntu, and a fresh install of the Irish dictionary aspell5-ga-4.4-0.tar.bz2.

Immediately after "sudo make install" of the dictionary, I run this, expecting no output:

$ /usr/bin/word-list-compress d < ga.cwl | iconv -f iso-8859-1 -t utf8 | aspell --lang=ga list
d'orgán
m'orgán
n-arm
t-arm

These four aren't recognized though. The other 326042 are ok!

Discussion

  • Kevin Atkinson
    2011-06-27

    • priority: 5 --> 6
     
  • Kevin Atkinson
    2011-07-03

    • labels: --> true-bug
     
  • Kevin Atkinson
    2011-07-03

    So the problem is that d'orgán and Dorgan both have the same "clean" value, "dorgan", but their soundslike values differ: "T*R*K*" and "T*R*K*N" respectively. That violates some of the assumptions I made. Not sure how I am going to fix this.

     
  • Kevin Atkinson
    2011-07-04

    And fixing this will almost certainly require breaking the dictionary format, further complicating things.

     
  • Kevin Scannell
    2011-07-18

    Ok, maybe we're homing in on the problem. Both of those words *should* have a soundslike of "T*R*K*N". But I can't find a problem in the gaeilge_phonet.dat file.

    As a simpler example, consider "organ". It should have a soundslike of *R*K*N, but it comes out as *R*K*.

    The rule that's causing the trouble appears to be:

    R(BGM)- R*

    I think this because, for example, the string "oragan" correctly gives *R*K*N.

    Am I not allowed to use the - syntax together with characters in parens as above? That syntax seems to work correctly in other places.
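
For anyone wanting to reproduce the comparison, here is a minimal sketch that uses the `aspell soundslike` command to print the soundslike code for each word. The helper name `soundslike_ga` is hypothetical, and the sketch assumes the Irish dictionary from this report is installed; the exact output layout may vary between Aspell versions.

```shell
# Hypothetical helper: print the soundslike code Aspell computes for each
# word passed as an argument, using the Irish (ga) dictionary.
soundslike_ga() {
    if ! command -v aspell >/dev/null 2>&1; then
        echo "aspell not found" >&2
        return 1
    fi
    # "aspell soundslike" reads one word per line from standard input.
    printf '%s\n' "$@" | aspell --lang=ga soundslike
}

# Compare the failing word with the variant that behaves as expected.
soundslike_ga organ oragan || echo "could not run aspell with the ga dictionary" >&2
```

If the behaviour described above holds, the two words should come out with different codes even though the R(BGM)- rule is meant to make them the same.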

     
  • Kevin Atkinson
    2011-07-19

    • assigned_to: nobody --> kevina
     
  • Kevin Atkinson
    2011-07-19

    There could also be a bug in the phonet code. I did not write the original code, and it has been a while since I looked at it. I will try to have a look sometime soon.

    If you feel so inclined, you are welcome to look for yourself.