Hi,
With
-------------------------------------------------------------------------------------------
SET UTF-8
TRY esianrtolcdugmphbyfvkwzjqxESIANRTOLCDUGMPHBYFVKWZJQX'
SFX B Y 1
SFX B y ied y
-------------------------------------------------------------------------------------------
applied to
-------------------------------------------------------------------------------------------
1
try/B
-------------------------------------------------------------------------------------------
only generates "try", and not "tried". I do not
understand why.
It seems like a bug to me, but perhaps I am
misunderstanding how the rules work.
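For reference, here is a minimal Python sketch of what the rule should do under the usual reading of an SFX line as strip / add / condition: 0 means an empty string, and the condition must match the end of the stem. It is a toy model, not Hunspell's or unmunch's actual code, and the function name apply_sfx is made up for illustration.

```python
import re
from typing import Optional


def apply_sfx(word: str, strip: str, add: str, condition: str) -> Optional[str]:
    """Apply one SFX line (strip, add, condition) to a dictionary stem.

    Simplified model: "0" means an empty string, and the condition is treated
    as a regular expression that must match the end of the stem.
    """
    strip = "" if strip == "0" else strip
    add = "" if add == "0" else add
    if not re.search(f"(?:{condition})$", word):
        return None  # condition not met: the rule does not apply
    if not word.endswith(strip):
        return None  # the text to strip is not there
    return word[: len(word) - len(strip)] + add


# SFX B y ied y applied to the entry try/B
print(apply_sfx("try", "y", "ied", "y"))   # tried
print(apply_sfx("tree", "y", "ied", "y"))  # None: condition y not met
```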
Some more questions:
1- Is the order of the flags in *.dic entries important? (I guess not, but it is not mentioned anywhere.)
2- Is the order of the rules in *.aff important? What happens if two lines of the same rule apply to a word: is the first or the last one taken, or are two strings generated?
(E.g. SFX A y ied y and SFX A 0 ied .)
(Note: the example in hunspell(4) is wrong: the first suffix rule should be
SFX B 0 d e
As written, it generates moveed and removeed. As in my example, "tried" is not generated.)
I'm using hunspell 1.1.0.
Thanks!
fouvry@acrolinx.de
Logged In: YES
user_id=726595
Hi,
Unfortunately, unmunch doesn't support Unicode and other new Hunspell features yet. I don't know when I will have time to implement them.
Flag and rule orders don't matter. Spell checking stops at the first match. Morphological analysis doesn't, but repeated results are dropped from the output.
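A toy Python model of that difference, using the two rules from question 2. It is not Hunspell's implementation: the helper names, the set-of-flags dictionary and the simplified condition handling are assumptions, and the first rule is listed twice (a hypothetical duplicate) so the duplicate removal has something to drop. It also shows that with SFX A 0 ied . present, "tryied" is accepted alongside "tried".

```python
import re
from typing import Dict, Iterator, List, Set, Tuple

# (flag, strip, add, condition), as in "SFX A y ied y"
Rule = Tuple[str, str, str, str]


def candidate_stems(word: str, rules: List[Rule]) -> Iterator[Tuple[str, str]]:
    """Undo each suffix rule on a surface word and yield (stem, flag) guesses."""
    for flag, strip, add, condition in rules:
        strip = "" if strip == "0" else strip
        add = "" if add == "0" else add
        if not word.endswith(add):
            continue
        stem = word[: len(word) - len(add)] + strip
        # simplified: the condition is a regex that must match the end of the stem
        if re.search(f"(?:{condition})$", stem):
            yield stem, flag


def spell_check(word: str, dic: Dict[str, Set[str]], rules: List[Rule]) -> bool:
    """Accept the word at the first rule that leads to a flagged dictionary stem."""
    if word in dic:
        return True
    # any() stops at the first match
    return any(flag in dic.get(stem, set()) for stem, flag in candidate_stems(word, rules))


def analyze(word: str, dic: Dict[str, Set[str]], rules: List[Rule]) -> List[str]:
    """Collect every matching stem/flag pair, then drop repeated results."""
    results = [f"{stem}/{flag}" for stem, flag in candidate_stems(word, rules)
               if flag in dic.get(stem, set())]
    return list(dict.fromkeys(results))  # keeps first occurrence, preserves order


dic = {"try": {"A"}}
rules = [("A", "y", "ied", "y"),
         ("A", "0", "ied", "."),
         ("A", "y", "ied", "y")]  # hypothetical duplicate line

print(spell_check("tried", dic, rules))   # True
print(analyze("tried", dic, rules))       # ['try/A']
print(spell_check("tryied", dic, rules))  # True, via SFX A 0 ied .
```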
Many thanks for your note and the bug report!
Laci
Logged In: YES
user_id=552189
Hi,
Thanks for your reply. We can live without UTF-8; that is not really a problem. However, I still do not understand why the rule does not generate the expected "tried".
Thanks!
Logged In: YES
user_id=552189
Hi,
Sorry, I didn't read your reply as carefully as I should have: I'll try to test hunspell without unmunch. (Munch generates superfluous entries with the German affix file, and I was trying to find out why that is happening.) For example, running munch on
---------------
2
test
testen
---------------
with the current german.aff gives
---------------
testen/W
test/P
---------------
which is obviously silly: testen/W only generates itself again, but that is already covered by test/P. Question: which is to blame, the rules or hunspell?
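A rough way to spot such entries is a subset check over the generated forms. In this Python sketch the expansions are assumptions based only on the forms mentioned above (testen/W yields testen; test/P is taken to yield test and testen), since the real W and P rules from german.aff are not shown. The check only compares word forms, so it cannot see flags (for example compounding flags) that might still make an entry useful.

```python
from typing import Dict, List, Set


def redundant_entries(expansions: Dict[str, Set[str]]) -> List[str]:
    """Return entries whose generated forms are all produced by other entries.

    Only word forms are compared; flags that matter for other purposes
    (e.g. compounding) are invisible here. Entries that cover each other
    exactly are both reported.
    """
    redundant = []
    for entry, forms in expansions.items():
        others: Set[str] = set()
        for other, other_forms in expansions.items():
            if other != entry:
                others |= other_forms
        if forms <= others:
            redundant.append(entry)
    return redundant


# Assumed expansions, based only on the forms mentioned in this post
expansions = {
    "testen/W": {"testen"},
    "test/P": {"test", "testen"},
}
print(redundant_entries(expansions))  # ['testen/W']
```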
Cheers,
Frederik
Logged In: YES
user_id=39804
Originator: NO
Another feature that unmunch doesn't seem to support yet is .aff definitions like this:
SFX j 0 0/xoc .
A .dic which contains "Entwicklungs/j" will be expanded to:
Entwicklungs
Entwicklungs0/xoc
This affects the latest version of the German dictionary, so it would be great to have this fixed.
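The expected behaviour is presumably to split the affix field at the slash: the text before it (0 = empty) is the suffix, and the characters after it are continuation flags that belong to the generated form rather than being printed as part of the word. A minimal Python sketch, assuming the default one-character flag format (FLAG long or FLAG num would need different splitting); the function name is hypothetical.

```python
from typing import List, Tuple


def parse_affix_field(field: str) -> Tuple[str, List[str]]:
    """Split an affix field such as "0/xoc" into (affix text, continuation flags).

    Assumes the default single-character flags; FLAG long or FLAG num
    would need a different split of the part after the slash.
    """
    text, _, flags = field.partition("/")
    affix = "" if text == "0" else text  # "0" stands for an empty affix
    return affix, list(flags)


stem = "Entwicklungs"
affix, continuation = parse_affix_field("0/xoc")
print(stem + affix, continuation)  # Entwicklungs ['x', 'o', 'c']
```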
Logged In: YES
user_id=726595
Originator: NO
Hi,
Newer versions of unmunch have already solved the original problem:
$ unmunch
correct syntax is:
unmunch dic_file affix_file
$ unmunch m.dic m.aff
parsing line: SET UTF-8
parsing line: TRY esianrtolcdugmphbyfvkwzjqxESIANRTOLCDUGMPHBYFVKWZJQX'
parsing line:
parsing line: SFX B Y 1
parsing B entries 1
affix: ied 3, strip: y 1
stable 0 num is 1 flag B
parsed in 0 prefixes and 1 suffixes
try
tried
But handling of double suffixes is still missing.
About testen/test: the German dictionary developed by Björn Jacke uses redundant entries to get compound word checking right.
Thanks for your reports,
Laci
Hi,
I would like to know whether the double affix problem has been fixed in newer versions of munch and unmunch.
I am working on a project whose goal is the morphological generation of words.
When I write rules with multiple affix combinations, they work fine and hunspell accepts the words.
However, when I run munch/unmunch (I really don't know the difference in usage), the words are not generated.
For example:
Affix File
NEEDAFFIX X
PFX A Y 1
PFX A 0 pre/X
SFX B Y 1
SFX B 0 ized/A
Dic File
nasal/B
If i run unmunch on this i get
nasal
nasalized
nasal/B (this is obviously garbage; the rule isn't applied)
How can I fix this?
Any help will be appreciated.
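For comparison, a minimal Python sketch of the two-level expansion this entry presumably should produce: the suffix from flag B is added first, and its continuation class A then allows the prefix. This is a simplified model, not unmunch's code. The rule tables are hard-coded from the affix file above, strip is 0 and conditions are treated as always matching, prefixes listed directly on the entry are not handled, and NEEDAFFIX (X) is not modelled, so the output only shows which forms the combination can reach.

```python
from typing import Dict, List, Set, Tuple

# Rule tables copied from the affix file above: flag -> [(affix, continuation flags)].
# Strip is 0 for every rule and conditions are treated as always matching.
SUFFIXES: Dict[str, List[Tuple[str, Set[str]]]] = {"B": [("ized", {"A"})]}
PREFIXES: Dict[str, List[Tuple[str, Set[str]]]] = {"A": [("pre", {"X"})]}


def expand(stem: str, flags: Set[str]) -> List[str]:
    """List the forms reachable from one dictionary entry (NEEDAFFIX not modelled)."""
    forms = [stem]
    for flag in sorted(flags):
        for suffix, cont in SUFFIXES.get(flag, []):
            suffixed = stem + suffix
            forms.append(suffixed)
            # second level: prefixes allowed by the suffix's continuation class
            for pflag in sorted(cont):
                for prefix, _ in PREFIXES.get(pflag, []):
                    forms.append(prefix + suffixed)
    return forms


print(expand("nasal", {"B"}))  # ['nasal', 'nasalized', 'prenasalized']
```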
Could someone please close this bug report? It works as expected with a build from the current source code.