Threefold affix stripping

Help
2009-02-21
2013-06-03
  • Arno Teigseth
    2009-02-21

    Hi

    Working on the Kichwa dictionary I have some issues with more than twofold suffixes.

    The grammar is
    stem[+case(kasus)]+conjugation[+particle[s]]

    kuyana (to love)
    kuyani (I love)
    kuyaRIni (I love myself)

    Up to here we're good: with "twofold suffix stripping" the suffixes ri and ni can be analyzed.

    However, in quite a few cases three suffixes are needed.

    The suffixes known to me (ri, gri, ku, ra) can be combined, so that a word ends up with at least three suffixes (well, to this date I haven't seen any word with more than three).

    kuyaRIgriNI = (I'm going to love myself)

    The bad news is that not all suffix combinations are allowed...
    My idea was to list the good combinations:
    ri
    rigri
    rira
    riku
    gri
    ra
    ku
    kura
    ragri

    and to give them, and the words they apply to, flags. It would just be a lot easier if threefold suffix stripping were possible, for example:
    SFX v: (verb conjugations)
    SFX A: ri/BCDv
    SFX B: ra/Dv
    SFX C: ku/Bv
    SFX D: gri/v

    This way stem+B+D+v would parse correctly...

    Is this possible in any way?
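
To check the proposed flag scheme against the hand-written list, here is a minimal Python sketch. It is not Hunspell syntax and not part of the dictionary: the RULES table simply copies the pseudo-rules A to D above, and the names chains and max_infixes are made up for illustration. It walks the "may be followed by" flags and collects every suffix sequence after which a conjugation (v) is allowed.

```python
# Continuation table copied from the pseudo-rules in the post:
# flag -> (suffix text, flags allowed to follow it).
# "v" stands for the verb conjugations and is not itself an infix.
RULES = {
    "A": ("ri", "BCDv"),
    "B": ("ra", "Dv"),
    "C": ("ku", "Bv"),
    "D": ("gri", "v"),
}


def chains(max_infixes=2):
    """Return every suffix sequence the table allows, up to max_infixes long."""
    found = []

    def walk(flag, text, depth):
        suffix, followers = RULES[flag]
        text += suffix
        if "v" in followers:          # a conjugation may follow here
            found.append(text)
        if depth < max_infixes:
            for nxt in followers:
                if nxt in RULES:      # skip "v", it ends the chain
                    walk(nxt, text, depth + 1)

    for flag in RULES:
        walk(flag, "", 1)
    return found


if __name__ == "__main__":
    print(sorted(chains()))
```

With at most two of these suffixes before the conjugation, the table yields the same nine combinations as the list above. Raising max_infixes to 3 would also admit riragri, rikura and kuragri; whether those are valid Kichwa is not stated in this thread.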

     
    • Hi,

      I suggest using only one suffix for the inflectional suffix combinations. An agglutinative language needs either a redundant dic file with a lot of word forms carrying derivational suffixes (or suffix combinations), or the second "suffix" of Hunspell. In fact, the main role of Hunspell's second affixes is to store these derivational suffixes. By the way, there is an old Bolivian Quetchua dictionary here:

      http://wiki.services.openoffice.org/w/index.php?title=Dictionaries&oldid=17640#Quetchua_.28Bolivia.29

      It contains 18 thousand (inflectional?) suffix combinations (and a script to generate them). This dictionary or the script can be optimized to use second suffixes, too.

      Regards,
      László

       
      • Arno Teigseth
        2009-02-24

        I agree. If any infix could be used with any verb and in any combination, multifold suffix stripping could be practical. But since

        -not all combinations are allowed
        -not all orders are allowed,

        it's probably best to write a script to do
        stem+infix1/v
        stem+infix2/v
        etc. for all the valid combinations, and flag them (e.g. /v marks a verb), so that hunspell can conjugate + compound after that.
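
A minimal sketch of such a script in Python, under two assumptions that are not confirmed here: that the generated entries are stored in the infinitive form ending in -na, and that the infix goes between the stem and that ending (as in kuya+ri+ni). The function name expand and its parameters are hypothetical.

```python
def expand(stem, combos, ending="na", flag="v"):
    """Build .dic lines: the plain verb plus one line per infix combination.

    Assumption: entries are written as stem + infix(es) + infinitive ending.
    """
    lines = [f"{stem}{ending}/{flag}"]
    lines += [f"{stem}{combo}{ending}/{flag}" for combo in combos]
    return lines


# The valid combinations listed in the first post of this thread.
VALID = ["ri", "rigri", "rira", "riku", "gri", "ra", "ku", "kura", "ragri"]

if __name__ == "__main__":
    for line in expand("kuya", VALID):
        print(line)
```

Because only the listed combinations are written out, disallowed combinations and orders never reach the .dic file, and Hunspell only has to strip the conjugation endings attached to the v flag.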

         
        • Arno Teigseth
          2009-02-25

          For the first release of the Kichwa hunspell dictionary I followed your suggestions and created a script to rewrite the .dic file. I use a .dic.MASTER file, and use hunspell-style flags to control the rewriting, in part because hunspell might in the future support multi-fold affix stripping. Another benefit is that the original .dic file needs hardly any rewriting.

          All lines ending with the flag "v" are considered Verbs, so a

          charina//r>+-,whv   in the .dic.MASTER file will be written as
          charichina//v
          charikrina//v
          charikuna//v
          charimuna//v
          charina//v
          charirana//v
          charirina//v
          chariwana//v
                              and so on, to the resulting .dic file, while a less morphing(?) verb:
          mikuna//hv
                              is written
          mikuna//v
          mikuchina//v
                                  only.
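
The rewriting step can be sketched roughly as follows in Python. This is not the script from the release. The flag table is hypothetical: only "h" -> "chi" can be read off the mikuna example, the other two entries are guesses for illustration, and the meaning of the remaining flags (>, +, -, and the comma) is not given in this post, so the sketch simply skips them.

```python
# Hypothetical flag-to-infix table. Only "h" -> "chi" follows from the
# mikuna example; "r" and "w" are guesses made up for this sketch.
INFIX_FOR_FLAG = {"h": "chi", "r": "ri", "w": "wa"}


def expand_master_line(line, ending="na"):
    """Expand one 'word//flags' line from .dic.MASTER into 'word//v' lines."""
    line = line.strip()
    word, sep, flags = line.partition("//")
    if not sep or not flags.endswith("v") or not word.endswith(ending):
        return [line]                  # not a verb entry: copy unchanged
    stem = word[: -len(ending)]
    out = [f"{word}//v"]               # the plain verb comes first
    for flag in flags[:-1]:            # every flag except the final "v"
        infix = INFIX_FOR_FLAG.get(flag)
        if infix is not None:          # flags missing from the table are skipped
            out.append(f"{stem}{infix}{ending}//v")
    return out


if __name__ == "__main__":
    for entry in expand_master_line("mikuna//hv"):
        print(entry)
```

For mikuna//hv this prints mikuna//v and mikuchina//v, matching the example above.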

          The first release including the script is located at
          http://arno.homelinux.org/files/kichwa/qu_EC-0.1.tgz