#5 unmunch does not generate expected forms

open
None
5
2006-02-16
2005-11-04
Anonymous
No

Hi,

With

-------------------------------------------------------------------------------------------
SET UTF-8
TRY esianrtolcdugmphbyfvkwzjqxESIANRTOLCDUGMPHBYFVKWZJQX'

SFX B Y 1
SFX B y ied y
-------------------------------------------------------------------------------------------

applied to

-------------------------------------------------------------------------------------------
1
try/B
-------------------------------------------------------------------------------------------

only generates "try", and not "tried". I do not
understand why.
It seems like a bug to me, but perhaps I am
misunderstanding how the rules work.

Some more questions:
1- is the order of the flags in *.dic-entries
important? (I guess not, but it is not mentioned anywhere)
2- is the order of the rules in *.aff important? What
happens if two lines of the same rule apply to a word:
is the first/last taken, or are two strings generated?
(e.g. SFX A y ied y and SFX A 0 ied .)

(Note: the example in hunspell(4) is wrong: the first
suffix rule should be
SFX B 0 d e
As it is, it generates "moveed" and "removeed", and, as in
my example, "tried" is not generated.) I'm using hunspell
1.1.0.

Thanks!

fouvry@acrolinx.de
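
For reference, the way the report expects a rule like "SFX B y ied y" to apply to "try/B" can be sketched in a few lines of Python. This is only an illustration of the rule fields (strip, add, condition), not unmunch's actual code. The names SuffixRule, parse_sfx_line and expand are made up for the example, and it assumes single-character flags and conditions that are also valid regular-expression fragments.

```python
import re
from typing import NamedTuple


class SuffixRule(NamedTuple):
    flag: str
    strip: str      # characters removed from the end of the stem ("" for "0")
    add: str        # characters appended after stripping ("" for "0")
    condition: str  # pattern the end of the stem must match ("." = anything)


def parse_sfx_line(line: str) -> SuffixRule:
    """Parse a rule line such as 'SFX B y ied y' (not the 'SFX B Y 1' header)."""
    _, flag, strip, add, condition = line.split()
    return SuffixRule(
        flag,
        "" if strip == "0" else strip,
        "" if add == "0" else add,
        condition,
    )


def expand(entry: str, rules: list[SuffixRule]) -> list[str]:
    """Return the stem plus one form per matching suffix rule (assumed semantics)."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for rule in rules:
        if rule.flag not in flags:
            continue  # the dictionary entry does not carry this flag
        if not re.search(f"(?:{rule.condition})$", stem):
            continue  # the condition does not match the end of the stem
        if not stem.endswith(rule.strip):
            continue  # nothing to strip
        form = stem[: len(stem) - len(rule.strip)] + rule.add
        if form not in forms:  # drop repeated results
            forms.append(form)
    return forms


rules = [parse_sfx_line("SFX B y ied y")]
print(expand("try/B", rules))  # ['try', 'tried']
```

In this sketch every matching rule line contributes its own form, so the two rules from question 2 ("SFX A y ied y" and "SFX A 0 ied .") would yield both "tried" and "tryied" for "try/A". Whether a given hunspell tool behaves the same way is not something the sketch can settle.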

Discussion

  • Logged In: YES
    user_id=726595

    Hi,

    Unfortunately, unmunch does not yet support Unicode and
    other new Hunspell features. I don't know when I will have
    time to implement this.

    Flag order and rule order don't matter. Spell checking
    stops at the first match. Morphological analysis doesn't,
    but repeated results are dropped from the output.

    Many thanks for your note and the bug report!

    Laci

     
  • Logged In: YES
    user_id=552189

    Hi,

    Thanks for your reply. We can live without UTF-8, that is
    not really a problem. However, I still do not understand
    why the rule does not generate the expected "tried".

    Thanks!

     
  • Logged In: YES
    user_id=552189

    Hi,

    Sorry, I didn't read your reply as carefully as I should
    have: I'll try to test hunspell without unmunch. (munch
    generates superfluous entries with the German affix file,
    and I was trying to find out why that is happening.) E.g.
    ---------------
    2
    test
    testen
    ---------------

    with the current german.aff, gives

    ---------------
    testen/W
    test/P
    ---------------

    which is obviously silly: testen/W only generates itself
    again, but that is already covered by test/P. Question:
    which is to blame, the rules or hunspell?

    Cheers,

    Frederik

     
    • assigned_to: nobody --> nemethl
     
  • Daniel Naber
    2007-09-24

    Logged In: YES
    user_id=39804
    Originator: NO

    Another feature that unmunch doesn't seem to support yet is .aff definitions like this:

    SFX j 0 0/xoc .

    A .dic which contains "Entwicklungs/j" will be expanded to:

    Entwicklungs
    Entwicklungs0/xoc

    This affects the latest version of the German dictionary, so it would be great to have this fixed.

     
  • Logged In: YES
    user_id=726595
    Originator: NO

    Hi,

    Newer versions of unmunch have already solved the original problem:

    $ unmunch
    correct syntax is:
    unmunch dic_file affix_file
    $ unmunch m.dic m.aff
    parsing line: SET UTF-8
    parsing line: TRY esianrtolcdugmphbyfvkwzjqxESIANRTOLCDUGMPHBYFVKWZJQX'
    parsing line:
    parsing line: SFX B Y 1
    parsing B entries 1
    affix: ied 3, strip: y 1
    stable 0 num is 1 flag B
    parsed in 0 prefixes and 1 suffixes
    try
    tried

    But handling of double suffixes is still missing.
    About testen/test: the German dictionary developed by Björn Jacke uses redundant entries so that compound words are checked correctly.

    Thanks for your reports,
    Laci

     
  • Mau Chege
    2009-01-12

    Hi,
    I would like to know whether the double affix problem has been fixed in later versions of munch and unmunch.

    I am working on a project whose goal is the morphological generation of words.

    When I prepare rules with multiple affix combinations, they work OK and hunspell accepts the words.

    However, when I run munch/unmunch (I really don't know the difference in usage), it doesn't generate the words.

    e.g.
    Affix File

    NEEDAFFIX X
    PFX A Y 1
    PFX A 0 pre/X

    SFX B Y 1
    SFX B 0 ized/A

    Dic File
    nasal/B

    If I run unmunch on this, I get

    nasal
    nasalized
    nasal/B (this is obviously garbage; it doesn't apply the rule)

    How can I fix this?
    Any help will be appreciated.
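
As a rough illustration of what a two-level expansion would involve, here is a minimal Python sketch that follows the continuation flags written after "/" in the affix field (as in "ized/A" or "0/xoc"). It is not munch or unmunch code. The names Affix, parse_affix_line and expand_twofold are hypothetical, flags are assumed to be single characters, rule conditions are ignored, and NEEDAFFIX is not modelled: the X flag is simply treated as a flag with no rules attached. The thread does not state the expected output, so "prenasalized" is only the presumed target.

```python
from typing import NamedTuple


class Affix(NamedTuple):
    kind: str   # "PFX" or "SFX"
    flag: str
    strip: str  # text removed from the word ("" for "0")
    add: str    # text added to the word ("" for "0")
    cont: str   # continuation flags written after "/" in the affix field


def parse_affix_line(line: str) -> Affix:
    """Parse a rule line such as 'SFX B 0 ized/A'; a trailing condition is ignored."""
    kind, flag, strip, field = line.split()[:4]
    add, _, cont = field.partition("/")  # "0/xoc" -> add "0", cont "xoc"
    return Affix(kind, flag, "" if strip == "0" else strip,
                 "" if add == "0" else add, cont)


def apply_affix(word: str, affix: Affix) -> str:
    if affix.kind == "SFX":
        return word[: len(word) - len(affix.strip)] + affix.add
    return affix.add + word[len(affix.strip):]


def expand_twofold(entry: str, affixes: list[Affix], depth: int = 2) -> list[str]:
    """Expand an entry, then expand each derived form via its continuation flags."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    if depth == 0:
        return forms
    for affix in affixes:
        if affix.flag not in flags:
            continue
        fits = (stem.endswith(affix.strip) if affix.kind == "SFX"
                else stem.startswith(affix.strip))
        if not fits:
            continue
        derived = apply_affix(stem, affix)
        # The derived form carries the affix's continuation flags.
        for form in expand_twofold(f"{derived}/{affix.cont}", affixes, depth - 1):
            if form not in forms:
                forms.append(form)
    return forms


affixes = [parse_affix_line("PFX A 0 pre/X"), parse_affix_line("SFX B 0 ized/A")]
print(expand_twofold("nasal/B", affixes))
```

Under the same assumptions, a zero affix with continuation flags such as "SFX j 0 0/xoc ." parses to an empty suffix plus the flags "xoc", so "Entwicklungs/j" expands to just "Entwicklungs" rather than a literal "Entwicklungs0/xoc".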