#4 wildcards in character stripping

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-10-09
Created: 2006-10-06
Creator: Anonymous
Private: No

Would it be possible to have the "." wildcard work not
only as a condition, but also to identify character
stripping? For example

SFX A Y 2
SFX A . suf1 .
SFX A . suf2 .

with a dic file like

fooa/A
barb/A
bazc/A

would create

foosuf1
foosuf2
barsuf1
barsuf2
bazsuf1
bazsuf2

much more economically than the current

SFX A Y 6
SFX A a suf1 .
SFX A b suf1 .
SFX A c suf1 .
SFX A a suf2 .
SFX A b suf2 .
SFX A c suf2 .

In the language I'm dealing with, the dozens of suffix
forms for future and conditional all apply to all verb
conjugations, yet the lemmas for these conjugations
differ in their final letter, which is stripped when
the suffixes are added. It seems a waste to have to
duplicate the list of suffixes for each conjugation,
when stripping the wildcard "." would work for all of them.

Thanks
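
The requested behaviour can be illustrated with a small sketch. This is not Hunspell code: the function name `apply_wildcard_suffixes` and its `strip_len` parameter are hypothetical, and the sketch only assumes that a "." in the stripping field would mean "remove the last character, whatever it is".

```python
def apply_wildcard_suffixes(stems, suffixes, strip_len=1):
    """Strip the last `strip_len` characters of each stem, whatever
    they are, then append every suffix (hypothetical "." stripping)."""
    forms = []
    for stem in stems:
        # Drop the final character(s) without checking what they are.
        base = stem[:-strip_len] if strip_len else stem
        for suffix in suffixes:
            forms.append(base + suffix)
    return forms


# Stems and suffixes taken from the example in this request.
forms = apply_wildcard_suffixes(["fooa", "barb", "bazc"], ["suf1", "suf2"])
print("\n".join(forms))
```

With the three stems above this prints the six forms listed in the request, from foosuf1 to bazsuf2.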

Discussion

  • Németh László

    Logged In: YES
    user_id=726595

    Hi,

    Unfortunately, adding unknown characters (the suggested
    stripping patterns) during word analysis is quite
    resource-intensive. (By the way, we are now working on a similar,
    but restricted, stripping pattern feature to handle infixes.)
    I suggest implementing or using pattern generators for affix
    file development.

    Thanks for the report,
    Best regards,
    Laci

     
  • Németh László

    • status: open --> closed
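
A pattern generator of the kind suggested in the reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `expand_wildcard_rules` is hypothetical, and each generated rule uses the stripped ending as its condition (the listing in the request uses "." instead), so that a rule only applies to stems that actually end in that letter.

```python
def expand_wildcard_rules(flag, endings, suffixes, cross_product=True):
    """Return explicit SFX lines, one rule per (suffix, ending) pair.

    Hypothetical helper: it expands a single "wildcard" rule into the
    explicit rules that the affix file format expects.
    """
    rules = [
        # Fields: SFX flag stripping suffix condition
        f"SFX {flag} {ending} {suffix} {ending}"
        for suffix in suffixes
        for ending in endings
    ]
    # The header line carries the cross-product switch and the rule count.
    header = f"SFX {flag} {'Y' if cross_product else 'N'} {len(rules)}"
    return [header] + rules


lines = expand_wildcard_rules("A", ["a", "b", "c"], ["suf1", "suf2"])
print("\n".join(lines))
```

The output can be pasted into the .aff file, so the suffix list is maintained in one place and regenerated whenever a suffix or a conjugation ending is added.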
     
