
#246 identity handling in lookup / affix-guessify

Group: future
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2014-08-14
Created: 2014-06-01
Creator: Flammie Pirinen
Private: No

When @_IDENTITY_SYMBOL_@ is used to form the prefix in hfst-affix-guessify, lookup does not work properly with non-trivial automata such as OMorFi. When I replace @_IDENTITY_SYMBOL_@ with x, it works as it should, but only for x. See the attached patch.

$ tools/src/hfst-affix-guessify -w 1 ~/Koodit/omorfi/src/temporary.ftb3.hfst -o guess.hfst
$ hfst-lookup guess.hfst 
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> xtalo
xtalo   x#talo N Nom Sg 1,000000
xtalo   xalo N Nom Sg   1,000000
xtalo   xlo N Nom Sg    1,000000
xtalo   xlo N Nom Sg    1,000000
xtalo   xo N Nom Sg 1,000000
xtalo   xtalo N Nom Sg  1,000000
xtalo   xtalo N Prop Nom Sg 1,000000
xtalo   xtalo N Nom Sg  1,000000
xtalo   x N Abbr#talo N Nom Sg  2,000977
xtalo   x#talo N Nom Sg 2,000977
xtalo   x%#talo N Nom Sg    2,000977
xtalo   x%%#talo N Nom Sg   2,000977
xtalo   x%<Del%>%%#talo N Nom Sg    2,000977
xtalo   x%>%%#talo N Nom Sg 2,000977
xtalo   x-#talo N Nom Sg    2,000977
xtalo   x<Del%>%%#talo N Nom Sg 2,000977
xtalo   x>%%#talo N Nom Sg  2,000977
xtalo   xDel%>%%#talo N Nom Sg  2,000977
xtalo   xel%>%%#talo N Nom Sg   2,000977
xtalo   xet#talo N Nom Sg   2,000977
xtalo   xl%>%%#talo N Nom Sg    2,000977
xtalo   xo#talo N Nom Sg    2,000977
xtalo   xt#talo N Nom Sg    2,000977
xtalo   x←%<Del%>%%#talo N Nom Sg   2,000977

> ytalo
ytalo   ytalo+? inf
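A toy model can make the symptom concrete. The Python sketch below is a hypothetical simplification, not HFST code. It assumes only that an identity arc matches symbols outside the transducer's alphabet (the behaviour Erik demonstrates with "? - a" in the discussion). It replaces OMorFi with a two-word stand-in lexicon ("talo", "yö") and models the guesser as one identity prefix followed by the lexicon, with no harmonization. The names `accepts` and `build_guesser` are made up for illustration.

```python
# Toy model (hypothetical, not HFST's implementation): an identity arc
# matches only symbols that are NOT in the transducer's alphabet.
IDENTITY = "@_IDENTITY_SYMBOL_@"


def accepts(arcs, start, finals, alphabet, word):
    """Return True if the automaton accepts `word`.

    `arcs` maps (state, symbol) -> set of target states. An arc labelled
    IDENTITY is followed only for symbols outside `alphabet`.
    """
    states = {start}
    for ch in word:
        nxt = set()
        for s in states:
            nxt |= arcs.get((s, ch), set())
            if ch not in alphabet:
                nxt |= arcs.get((s, IDENTITY), set())
        states = nxt
    return bool(states & finals)


def build_guesser(words):
    """Concatenate a one-symbol identity prefix with a trie of `words`.

    State 0 --IDENTITY--> state 1, and the trie starts at state 1.
    No harmonization: the alphabet is just the lexicon's symbols.
    """
    arcs = {(0, IDENTITY): {1}}
    finals = set()
    next_state = 2
    for w in words:
        s = 1
        for ch in w:
            targets = arcs.setdefault((s, ch), set())
            if not targets:
                targets.add(next_state)
                next_state += 1
            s = next(iter(targets))
        finals.add(s)
    alphabet = {ch for w in words for ch in w}
    return arcs, 0, finals, alphabet


# Hypothetical stand-in for OMorFi: 'y' is a known symbol, 'x' is not.
arcs, start, finals, alphabet = build_guesser(["talo", "yö"])
print(accepts(arcs, start, finals, alphabet, "xtalo"))  # True
print(accepts(arcs, start, finals, alphabet, "ytalo"))  # False
```

Because 'y' occurs in the stand-in lexicon, it belongs to the alphabet and the identity arc skips it, so "ytalo" gets no analysis, just as in the `ytalo+? inf` result above. 'x' is unknown, so "xtalo" goes through.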
1 Attachment


Discussion

  • Krister Lindén - 2014-06-01

    The symptoms look as if the two automata "." and OMorFi were not
    harmonized before being concatenated into a guesser. If x does not
    appear in OMorFi, the final concatenated automaton treats it as an
    unknown character, which the identity symbol matches. But y most likely
    does exist in OMorFi, so without harmonization it is treated as a known
    character that is intentionally left out of the "." affix.
    --
    Krister


  • Erik Axelson - 2014-06-03

    It seems that the problem is in the harmonization of the guesser transducer. hfst-lookup itself seems to handle identities just fine:

    echo "?" | hfst-regexp2fst > tmp
    hfst-lookup tmp
    > a
    a a 0.000000

    > b
    b b 0.000000

    echo "? - a" | hfst-regexp2fst > tmp
    hfst-lookup tmp
    > a
    a a+? inf

    > b
    b b 0.000000

     
  • Flammie Pirinen - 2014-06-03

    The guesser is not a mere concatenation of < ? @"lexicon" >: it can also lead into any suffix of the lexicon, so it is built by hand, "unharmonised". "Harmonising" it by inserting every symbol of the lexicon into every identity seems too heavy. Maybe simple weighted guessers just cannot be done using HFST.
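Both Krister's suggestion and Flammie's objection can be seen in a toy model. The self-contained Python sketch below is a hypothetical simplification, not HFST's harmonization routine. Here "harmonizing" means adding an explicit arc for each symbol the lexicon knows next to every identity arc. The function name `harmonize_identities` and the small stand-in lexicon ("talo", "yö") are made up for illustration.

```python
# Toy model (hypothetical, not HFST's harmonization routine).
IDENTITY = "@_IDENTITY_SYMBOL_@"


def accepts(arcs, start, finals, alphabet, word):
    """Identity arcs are followed only for symbols outside `alphabet`."""
    states = {start}
    for ch in word:
        nxt = set()
        for s in states:
            nxt |= arcs.get((s, ch), set())
            if ch not in alphabet:
                nxt |= arcs.get((s, IDENTITY), set())
        states = nxt
    return bool(states & finals)


def harmonize_identities(arcs, known_symbols):
    """Copy `arcs`, adding an explicit arc for every known symbol next to
    every identity arc. Returns (new_arcs, number_of_arcs_added)."""
    out = {key: set(targets) for key, targets in arcs.items()}
    added = 0
    for (state, sym), targets in arcs.items():
        if sym != IDENTITY:
            continue
        for ch in known_symbols:
            dest = out.setdefault((state, ch), set())
            before = len(dest)
            dest |= targets
            added += len(dest) - before
    return out, added


# Unharmonized guesser "? talo" plus "yö", so 'y' is in the alphabet
# (hypothetical stand-in for OMorFi).
arcs = {
    (0, IDENTITY): {1},
    (1, "t"): {2}, (2, "a"): {3}, (3, "l"): {4}, (4, "o"): {5},
    (1, "y"): {6}, (6, "ö"): {7},
}
finals = {5, 7}
alphabet = {"t", "a", "l", "o", "y", "ö"}

print(accepts(arcs, 0, finals, alphabet, "ytalo"))  # False

harmonized, added = harmonize_identities(arcs, alphabet)
print(accepts(harmonized, 0, finals, alphabet, "ytalo"))  # True
print(added)  # 6: one identity arc times six known symbols
```

The number of added arcs is the number of identity arcs times the number of known symbols. Here that is six. In a hand-built guesser with identity arcs at many states and an alphabet the size of OMorFi's, the same product grows quickly, which is the cost described above as too heavy.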