APY should support passing a query parameter that prevents *, #, @, etc. from appearing in translation output. Html-tools should provide a checkbox for this functionality (perhaps extracted to a settings panel).
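A minimal sketch of how such a query parameter could select the final generator mode. The parameter name markUnknown is borrowed from the r54015 commit noted later in this thread. Its accepted values and the helper names wants_marks and generator_command are hypothetical. The flag choice mirrors what the thread says "apertium -u" does: running lt-proc -n instead of lt-proc -g in the last step.

```python
from urllib.parse import parse_qs

# Hypothetical: values of markUnknown that switch marking off.
# Anything else, or a missing parameter, keeps the current default (marks on).
OFF_VALUES = {"no", "false", "0", "off"}


def wants_marks(query: str) -> bool:
    """Return True unless the query string explicitly turns marking off."""
    values = parse_qs(query).get("markUnknown", [])
    if not values:
        return True
    # If the parameter is repeated, the last occurrence wins.
    return values[-1].strip().lower() not in OFF_VALUES


def generator_command(query: str, generator_bin: str) -> list[str]:
    """Build the final pipeline step: -g keeps the marks, -n drops them
    (the switch the thread attributes to `apertium -u`)."""
    flag = "-g" if wants_marks(query) else "-n"
    return ["lt-proc", flag, generator_bin]


if __name__ == "__main__":
    # The .bin path is a placeholder, not a real pair's file name.
    print(generator_command("q=hello&markUnknown=no", "xxx-yyy.autogen.bin"))
```

Defaulting to "marks on" keeps existing clients' output unchanged; only requests that opt out get the -n generator.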
1) Simply removing *, #, / and @ from the stream after translation would also remove any such characters that were already present in the input. One could escape these symbols in the input, but that is quite error-prone (apertium-deshtml already does some escaping; how would the two interact?).
The method used by "apertium -u" is to remove the marks in the final generator step, by running "lt-proc -n" instead of "lt-proc -g".
2) A simple solution is to run two copies of each pipeline, one with -n and one with -g. But APY already uses a lot of memory running all pairs, so doubling that is not a good solution.
3) But since only the last step differs, we could run two copies of each generator and share the rest of the pipeline. Assuming the FSTs are about the same size and are the main memory hog in the pipelines, this would mean only about a 1.3x memory increase rather than 2x.
4) A better, but more involved, solution for APY would be to implement stream commands similar to those in CG-3 (http://beta.visl.sdu.dk/cg3/chunked/streamcmds.html). Say the stream command to turn off unknown-word marking were [<apertium unknown="off">]; lt-proc -g would look for it when reading the input stream and switch to no marking on the fly. APY could then simply insert that command for unknown-off requests.
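The 1.3x figure in point 3 is consistent with a pipeline holding about three FSTs of similar size. A back-of-the-envelope sketch, assuming three equal-sized FSTs per pipeline (for example analyser, generator and post-generator; the count is our assumption, not something stated above) and that FSTs dominate memory:

```python
from fractions import Fraction


def memory_factor(fsts_per_pipeline: int, duplicated_fsts: int) -> Fraction:
    """Memory after duplicating some FSTs, relative to a single pipeline.

    Assumes every FST has the same size and FSTs dominate memory use.
    """
    return Fraction(fsts_per_pipeline + duplicated_fsts, fsts_per_pipeline)


# Option 2: duplicate the whole pipeline (all three assumed FSTs).
full_copy = memory_factor(3, 3)
# Option 3: share everything except the final generator.
generator_only = memory_factor(3, 1)

print(f"option 2: {float(full_copy):.2f}x, option 3: {float(generator_only):.2f}x")
```

With only two FSTs per pipeline the same arithmetic gives memory_factor(2, 1) = 1.5x, so the estimate depends on how many FSTs a pair actually loads.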
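A toy model of the stream-command idea in point 4: a generator step that watches for a control superblank and switches unknown-word marking on the fly. This is not how lt-proc is implemented; the command strings (the off command suggested above, taken here as [<apertium unknown="off">], plus a hypothetical "on" counterpart) and the function toy_generate are illustrative assumptions. Known units are "generated" by just stripping tags, standing in for the real FST lookup, and only the * mark is handled.

```python
import re

# Hypothetical stream commands; not an existing Apertium feature.
UNKNOWN_OFF = '[<apertium unknown="off">]'
UNKNOWN_ON = '[<apertium unknown="on">]'

# Toy tokeniser: a superblank [...], a lexical unit ^...$, or plain text.
TOKEN = re.compile(r"\[[^\]]*\]|\^[^$]*\$|[^\[\^]+")


def toy_generate(stream: str) -> str:
    """Simulate the final generator step, toggling marks mid-stream."""
    mark = True  # default behaviour, like lt-proc -g
    out = []
    for tok in TOKEN.findall(stream):
        if tok == UNKNOWN_OFF:
            mark = False  # command is consumed, not echoed
        elif tok == UNKNOWN_ON:
            mark = True
        elif tok.startswith("^") and tok.endswith("$"):
            word = tok[1:-1]
            if word.startswith("*"):
                if not mark:
                    word = word[1:]  # drop the unknown-word mark
            else:
                # Stand-in for FST generation: just strip the tags.
                word = re.sub(r"<[^>]*>", "", word)
            out.append(word)
        else:
            out.append(tok)  # blanks and ordinary superblanks pass through
    return "".join(out)


if __name__ == "__main__":
    stream = "^*foo$ ^house<n><sg>$"
    print(toy_generate(stream))
    print(toy_generate(UNKNOWN_OFF + stream))
```

Under this design APY would only prepend UNKNOWN_OFF to unknown-off requests, assuming the earlier pipeline stages pass that superblank through untouched. Because a real generator process is shared across requests, it would also need the marking state reset between requests; the sketch sidesteps this by starting each call with marking on.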
r54015: /trunk/apertium-tools/apertium-apy/servlet.py: sub-optimal implementation of markUnknown for /translate needs review
Seems to work :)