Apertium: Machine Translation Toolbox / Tickets / #88 add "lt-trim -d" that, instead of removing, turns analysis into "@form"

Kevin Brubeck Unhammer - 2015-12-10

In sme-nob, I have an untrimmed lexicon built as well (sme-nob.automorf-untrimmed.bin), and have a set of modes that use the untrimmed ones. I use that when I'm unsure of why something is unknown in the trimmed analyser. So a * from trimmed and @ from untrimmed means there's no translation of that form. But note that getting an @ from the untrimmed pipeline doesn't mean the form wouldn't get translated at all in the trimmed version, since the disambiguator might have REMOVE'd an analysis that you do have in bidix.
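The comparison described above can be written down as a small decision helper. This is only an illustrative sketch: the function name `diagnose` and its return strings are made up here, and it assumes you have already run the same word through both the trimmed and the untrimmed pipeline and are looking at the two outputs side by side.

```python
def diagnose(trimmed: str, untrimmed: str) -> str:
    """Guess why a word failed, from the outputs of the two pipelines.

    Hypothetical helper, not part of Apertium. Only the leading mark
    ('*' or '@') of each output is inspected.
    """
    t, u = trimmed[:1], untrimmed[:1]
    if t == "*" and u == "*":
        # Unknown even without trimming: the analyser itself lacks the form.
        return "unknown to the source-language analyser"
    if t == "*" and u == "@":
        # Analysed when untrimmed, but the bidix lookup failed.
        return "analysed, but no bidix translation of that form"
    if u == "@":
        # The caveat from the comment: the trimmed pipeline translated it,
        # so the disambiguator probably removed the analysis bidix has.
        return "translated when trimmed; untrimmed disambiguation likely removed the bidix analysis"
    return "no mismatch to explain"


print(diagnose("*foo", "@foo"))
```

The third branch is the reason an @ from the untrimmed pipeline alone is not conclusive.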

Kevin Brubeck Unhammer - 2015-12-10

But here's an idea we might consider in the future: make a debug-trimmed transducer that actually includes the trimmed words, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo".
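To make the idea concrete, here is a toy sketch on plain Python dictionaries rather than on real transducers. Nothing here is existing lt-trim behaviour: `debug_trim` and `in_bidix` are hypothetical names, the lexicon is modelled as a dict from surface form to analysis strings, and bidix is modelled as a set of lemma-plus-tags prefixes.

```python
def in_bidix(analysis: str, bidix: set) -> bool:
    """True if some bidix entry is a prefix of the analysis at a tag boundary."""
    for entry in bidix:
        if analysis == entry:
            return True
        if analysis.startswith(entry) and analysis[len(entry)] == "<":
            return True
    return False


def debug_trim(lexicon: dict, bidix: set) -> dict:
    """Keep translatable analyses; turn the others into '@form' instead of dropping them."""
    result = {}
    for surface, analyses in lexicon.items():
        kept = []
        for analysis in analyses:
            marked = analysis if in_bidix(analysis, bidix) else "@" + surface
            if marked not in kept:  # several trimmed analyses collapse into one @form
                kept.append(marked)
        result[surface] = kept
    return result


lexicon = {"foo": ["foo<n><sg>"], "bars": ["bar<n><pl>"]}
bidix = {"bar<n>"}
for surface, analyses in debug_trim(lexicon, bidix).items():
    print(surface + "/" + "/".join(analyses))
# prints:
# foo/@foo
# bars/bar<n><pl>
```

A normal trim would leave `foo` with no analysis at all, so it would surface as `*foo`; here it stays in the analyser and carries the `@` mark instead.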

Flammie Pirinen - 2015-12-16

Well, having both automata sounds like a good option. I'd vote to have both modes in the default layout created by new scripts; I'll copy it over to my fins as I go. What's the canonical name for untrimmed modes? foo-bar-untrimmed-debug?

I like the debug-trimming idea; it might even give a pathway to further useful hacks, like knowing the nearest matched prefix in bidix or so.

labels: rfe --> rfe, lt-trim
summary: Trimmed lexicons don't show source of errors --> add "lt-trim -d" that, instead of removing, turns analysis into "@form"
Description has changed:

Diff:

--- old
+++ new
@@ -1,3 +1,3 @@
 It used to be in traditional apertium that a word prefixed with \*asterisk means that it is not found in source language dictionary, with an @at-sign that it's not found in bi-lingual dictionary and #hash-sigh that it's not in the target language dictionary, however, with the trimming feature, all first and second categories both prefixed with \*asterisk, making debugging an extra complicated detective work :-(

-(sf.net may or may not screw up the formatting above)
+Possible feature: an lt-trim -d option that makes a debug-trimmed transducer that actually includes the trimmed forms, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo".

Kevin Brubeck Unhammer - 2015-12-17

https://github.com/goavki/bootstrap/issues/17 has the Makefile/modes.xml feature request

Description has changed:

Diff:

--- old
+++ new
@@ -1,3 +1,4 @@
 It used to be in traditional apertium that a word prefixed with \*asterisk means that it is not found in source language dictionary, with an @at-sign that it's not found in bi-lingual dictionary and #hash-sigh that it's not in the target language dictionary, however, with the trimming feature, all first and second categories both prefixed with \*asterisk, making debugging an extra complicated detective work :-(

 Possible feature: an lt-trim -d option that makes a debug-trimmed transducer that actually includes the trimmed forms, but changes the analysis from "foo/foo<n><sg>" into "foo/@foo". 
+We could also give longest matched prefix in bidix, e.g. if bidix had foo<n><m>, then foo<n><sg> would give @foo<n>.
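The longest-matched-prefix variant could look roughly like the sketch below. It is a toy model on strings, not a transducer operation: `split_units` and `mark` are hypothetical names, bidix is modelled as a set of lemma-plus-tags strings, and falling back to the bare lemma when nothing matches is an assumption made here.

```python
import re


def split_units(analysis: str) -> list:
    """Split 'foo<n><sg>' into ['foo', '<n>', '<sg>']."""
    lemma = analysis.split("<", 1)[0]
    return [lemma] + re.findall(r"<[^>]+>", analysis)


def mark(analysis: str, bidix: set) -> str:
    """Return the analysis unchanged if bidix covers it, otherwise '@' plus
    the longest lemma+tags prefix it shares with any bidix entry."""
    units = split_units(analysis)
    best = 0
    for entry in bidix:
        entry_units = split_units(entry)
        shared = 0
        for a, b in zip(units, entry_units):
            if a != b:
                break
            shared += 1
        if shared == len(entry_units):
            # The whole bidix entry is a prefix of the analysis: not trimmed.
            return analysis
        best = max(best, shared)
    # Assumption: with no match at all, fall back to the bare lemma.
    return "@" + "".join(units[:max(best, 1)])


print(mark("foo<n><sg>", {"foo<n><m>"}))  # prints @foo<n>
```

With this output, `@foo<n>` says that bidix knows `foo` as a noun and that the mismatch is in the following tag, which narrows down where the bidix entry needs fixing.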
