$ echo hi | apertium-destxt
hi.[][
]
apertium-destxt inserts text that wasn't in the input: a period before EOF and at empty lines. Similarly, apertium-deshtml puts periods after certain tags, and so on.
Maybe this helps with handling certain headlines, but it makes too many assumptions (often headlines aren't marked in such a way, or already have punctuation, or the language doesn't even use "." as an end-of-sentence marker), and it can be a real annoyance. Can we remove it?
Kevin,
how can I comment on this ticket through SF?
Mikel
2015-04-16 12:17 GMT+02:00 Kevin Brubeck Unhammer unhammer@users.sf.net:
--
Mikel L. Forcada
Departament de Llenguatges i Sistemes Informàtics
UNIVERSITAT D'ALACANT
E-03071 ALACANT, Spain.
E-mail: mlf@dlsi.ua.es
Phone: +34-96-590-9776, also +34-96-590-3772
Fax: +34-96-590-9326, -3464
URL: http://www.dlsi.ua.es/~mlf
Related Tickets: #68
Through the link to the ticket: http://sourceforge.net/p/apertium/tickets/68
but replying to the email seems to work, too.
You just did :-)
I'm more for making it a (non-default) option.
We need to insert the [] unconditionally, though; I tried running without it, but that messes up other tools down the line.
maybe
Ah, ok. Updated.
:-) Will you commit? I see no downsides…
Mikel clearly had a comment to add; I'm going to wait and read it first.
For some languages, removing it will degrade the performance of the POS tagger and cause wrong rules to be applied when translating XML-based document types, because titles, lists and other sentences would lack explicit boundaries.
As you can see, the “.[]” is not always inserted, only when it makes sense.
The string being inserted is .[]. Anyone interested can remove the string with a simple sed line.
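A minimal sketch of such a sed line, assuming the inserted string is always the literal `.[]` as stated above. Because Kevin notes that the `[]` itself has to stay for tools further down the pipeline, this version drops only the period and keeps the empty superblank. The function name `strip_inserted_dot` is hypothetical. Caveat: a genuine period that happens to be followed directly by `[]` in the deformatted stream would be stripped as well.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apertium): remove the period that the
# deformatter inserts as ".[]", keeping the empty superblank "[]".
# Assumption: the inserted string is always the literal ".[]"; a real
# period directly followed by "[]" would also lose its dot.
strip_inserted_dot() {
    # BRE: "\." is a literal dot, "\[" a literal "[", and a lone "]" is literal.
    sed 's/\.\[]/[]/g'
}

# Simulated apertium-destxt output for the input "hi" (as shown at the top of the ticket):
printf 'hi.[][\n]\n' | strip_inserted_dot
```

In a real pipeline this would sit right after the deformatter, e.g. `echo hi | apertium-destxt | strip_inserted_dot`. With the patch discussed below (r59917), the `-n` option is meant to achieve the same thing inside apertium-destxt itself.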
Sergio
But yeah, if we only remove it when the user asks for it, everyone should be OK?
A new option is always welcome :)
So... are you OK with my patch? (It adds a new option, -n, to remove the dot; it is not enabled by default.)
Ok
Great, committed in r59917.