
#88 add "lt-trim -d" that, instead of removing, turns analysis into "@form"

open
nobody
2015-12-17
2015-12-10
No

Traditionally in Apertium, a word prefixed with an *asterisk is not found in the source-language dictionary, one prefixed with an @at-sign is not found in the bilingual dictionary, and one prefixed with a #hash-sign is not found in the target-language dictionary. With the trimming feature, however, words in both the first and second categories are prefixed with an *asterisk, which turns debugging into extra-complicated detective work :-(

Possible feature: an lt-trim -d option that makes a debug-trimmed transducer that actually includes the trimmed forms, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo".
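To make the proposal concrete, here is a rough Python model of what `-d` could do. It works on one lexical unit of Apertium stream format, not on the compiled transducer (which is where lt-trim actually operates), so it only models the intended effect on output. The function names, the list-of-strings bidix, and the rule that a bidix entry covers any analysis whose tags start with the entry's tags are assumptions for illustration; escaping and multiwords are ignored.

```python
import re

# Matches one Apertium tag such as "<n>" or "<sg>".
TAG_RE = re.compile(r"<[^>]+>")


def split_analysis(analysis):
    """Split 'foo<n><sg>' into ('foo', ['<n>', '<sg>'])."""
    m = TAG_RE.search(analysis)
    if m is None:
        return analysis, []
    return analysis[:m.start()], TAG_RE.findall(analysis[m.start():])


def covered_by_bidix(analysis, bidix):
    """True if some bidix left side matches the analysis.

    Assumption: an entry like foo<n> also covers foo<n><sg>, since the
    remaining tags are passed through, so we test for a tag prefix.
    """
    lemma, tags = split_analysis(analysis)
    for entry in bidix:
        b_lemma, b_tags = split_analysis(entry)
        if b_lemma == lemma and tags[:len(b_tags)] == b_tags:
            return True
    return False


def debug_trim(lexical_unit, bidix):
    """Hypothetical 'lt-trim -d' effect on one '^surface/analysis/...$' unit:
    analyses the bidix cannot translate become '@lemma' instead of vanishing."""
    surface, *analyses = lexical_unit[1:-1].split("/")  # drop '^' and '$'
    kept = []
    for a in analyses:
        new = a if covered_by_bidix(a, bidix) else "@" + split_analysis(a)[0]
        if new not in kept:  # avoid '@foo/@foo' when several analyses collapse
            kept.append(new)
    return "^" + "/".join([surface] + kept) + "$"


bidix = ["foo<n><m>", "bar<vblex>"]
print(debug_trim("^foo/foo<n><sg>$", bidix))  # ^foo/@foo$
print(debug_trim("^bars/bar<vblex><pri><p3><sg>/bar<n><pl>$", bidix))
# ^bars/bar<vblex><pri><p3><sg>/@bar$
```

Collapsing trimmed analyses to a single `@lemma` keeps the output short; a real implementation might prefer to keep them apart.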
We could also give the longest matched prefix in bidix, e.g. if bidix had foo<n><m>, then foo<n><sg> would give @foo<n>.
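The longest-prefix variant could look like this, again a hypothetical sketch over plain strings rather than transducers. `debug_mark` and its helpers are made-up names, and it assumes the same prefix-matching rule for bidix entries: it returns the analysis unchanged when some entry covers it, and otherwise keeps only the tags shared with the best-matching entry for that lemma.

```python
import re

# Matches one Apertium tag such as "<n>" or "<sg>".
TAG_RE = re.compile(r"<[^>]+>")


def split_analysis(analysis):
    """Split 'foo<n><sg>' into ('foo', ['<n>', '<sg>'])."""
    m = TAG_RE.search(analysis)
    if m is None:
        return analysis, []
    return analysis[:m.start()], TAG_RE.findall(analysis[m.start():])


def common_prefix_len(a, b):
    """Number of leading tags that two tag lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def debug_mark(analysis, bidix):
    """Return the analysis if bidix covers it, else '@' + lemma + the
    longest tag prefix shared with any bidix entry for the same lemma."""
    lemma, tags = split_analysis(analysis)
    best = 0  # stays 0 if the lemma is missing from bidix entirely
    for entry in bidix:
        b_lemma, b_tags = split_analysis(entry)
        if b_lemma != lemma:
            continue
        n = common_prefix_len(tags, b_tags)
        if n == len(b_tags):
            return analysis  # entry is a full prefix: would not be trimmed
        best = max(best, n)
    return "@" + lemma + "".join(tags[:best])


bidix = ["foo<n><m>"]
print(debug_mark("foo<n><sg>", bidix))     # @foo<n>
print(debug_mark("foo<n><m><sg>", bidix))  # foo<n><m><sg>
print(debug_mark("baz<adj>", bidix))       # @baz
```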

Discussion

  • Kevin Brubeck Unhammer

    In sme-nob, I have an untrimmed lexicon built as well (sme-nob.automorf-untrimmed.bin) and a set of modes that use it. I use those when I'm unsure why something is unknown in the trimmed analyser. So a * from the trimmed pipeline and an @ from the untrimmed one means there's no translation of that form. But note that getting an @ from the untrimmed pipeline doesn't mean the form wouldn't get translated at all in the trimmed version, since the disambiguator might have REMOVE'd an analysis that you do have in bidix.
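    The rule of thumb above for reading the two pipelines side by side can be written down as a small decision function. It is only a sketch: the marker strings follow the conventions in the description (`*`, `@`, `#`, or none), `diagnose` is a hypothetical helper, and the last branch exists because the disambiguator can pick different analyses in the two pipelines.

```python
# What each marker means in an untrimmed pipeline (see the description).
MEANING = {
    "*": "not in the source-language analyser",
    "@": "not in the bilingual dictionary",
    "#": "not generated by the target-language dictionary",
    "": "translated",
}


def diagnose(trimmed, untrimmed):
    """Combine the markers one token gets in the trimmed and untrimmed pipelines."""
    if trimmed != "*":
        # Only '*' is ambiguous after trimming; other markers keep their meaning.
        return MEANING[trimmed]
    if untrimmed in ("*", "@"):
        # '*' twice: unknown to the analyser. '*' then '@': the analyser knows
        # the form, so the trimmed '*' was really a missing bidix translation.
        return MEANING[untrimmed]
    # Anything else suggests the two pipelines disambiguated differently.
    return "inconclusive: compare the disambiguated analyses of both pipelines"


print(diagnose("*", "@"))  # not in the bilingual dictionary
```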

     
  • Kevin Brubeck Unhammer

    But here's an idea we might consider in the future: make a debug-trimmed transducer that actually includes the trimmed words, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo".

     
  • Flammie Pirinen

    Flammie Pirinen - 2015-12-16

    Well, having both automata sounds like a good option. I'd vote to have both modes in the default layout created by the new scripts, and I'll copy it over to my fins as I go. What's the canonical name for untrimmed modes? foo-bar-untrimmed-debug?

    I like the debug-trimming idea; it might even give a pathway to further useful hacks, like knowing the nearest matched prefix in bidix.

     
  • Kevin Brubeck Unhammer

    • labels: rfe --> rfe, lt-trim
    • summary: Trimmed lexicons don't show source of errors --> add "lt-trim -d" that, instead of removing, turns analysis into "@form"
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,3 +1,3 @@
     It used to be in traditional apertium that a word prefixed with \*asterisk means that it is not found in source language dictionary, with an @at-sign that it's not found in bi-lingual dictionary and #hash-sigh that it's not in the target language dictionary, however, with the trimming feature, all first and second categories both prefixed with \*asterisk, making debugging an extra complicated detective work :-(
    
    -(sf.net may or may not screw up the formatting above)
    +Possible feature: an lt-trim -d option that makes a debug-trimmed transducer that actually includes the trimmed forms, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo". 
    
     
  • Kevin Brubeck Unhammer

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,3 +1,4 @@
     It used to be in traditional apertium that a word prefixed with \*asterisk means that it is not found in source language dictionary, with an @at-sign that it's not found in bi-lingual dictionary and #hash-sigh that it's not in the target language dictionary, however, with the trimming feature, all first and second categories both prefixed with \*asterisk, making debugging an extra complicated detective work :-(
    
     Possible feature: an lt-trim -d option that makes a debug-trimmed transducer that actually includes the trimmed forms, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo". 
    +We could also give longest matched prefix in bidix, e.g. if bidix had foo<n><m>, then foo<n><sg> would give @foo<n>.
    
     
