I found a weird bug where the penalty does not seem to apply to the match.
In my case, I want to apply a 10% penalty to any contents, to make sure I validate it. But the first segment has a 100% match to which the 10% penalty is not applied.
It looks like the "(1)" part at the end of the first segment is the reason why the penalty does not work.
You defined the folder as penalty-xxx, not penalty-<NUM>, in your commit. @brandelune
https://github.com/omegat-org/omegat/commit/fd343296
There is no feature in OmegaT where penalty-<NUM> recognizes NUM to set the penalty score. You can see the search results for penalty in the OmegaT source code:
https://github.com/search?q=repo%3Aomegat-org%2Fomegat+penalty+language%3AJava&type=code
Did you really implement the feature?
This is wrong, the feature exists in FindMatches:
private static final Pattern SEARCH_FOR_PENALTY = Pattern.compile("penalty-(\\d+)");
To activate it, you must create a directory with the number in the name, for example tm/penalty-10.
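To make the point concrete, here is a minimal sketch of how that regex reads the number out of a folder name. Only the SEARCH_FOR_PENALTY pattern is taken from the line quoted above; the class name, the helper methods and the simple "subtract from the score" step are my own assumptions for illustration, not OmegaT's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PenaltyFolderSketch {
    // Same regex as the one quoted from FindMatches.
    private static final Pattern SEARCH_FOR_PENALTY = Pattern.compile("penalty-(\\d+)");

    /** Hypothetical helper: penalty encoded in the path, or 0 when no number is found. */
    static int penaltyFor(String tmxPath) {
        Matcher m = SEARCH_FOR_PENALTY.matcher(tmxPath);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    /** Hypothetical helper: assumes the penalty is simply subtracted from the score. */
    static int applyPenalty(int score, String tmxPath) {
        return Math.max(0, score - penaltyFor(tmxPath));
    }

    public static void main(String[] args) {
        // A folder named with digits yields a penalty; "penalty-xxx" does not match \d+.
        System.out.println(penaltyFor("tm/penalty-10/segment_1.tmx"));
        System.out.println(penaltyFor("tm/penalty-010/segment_1.tmx"));
        System.out.println(penaltyFor("tm/penalty-xxx/segment_1.tmx"));
        System.out.println(applyPenalty(100, "tm/penalty-010/segment_1.tmx"));
    }
}
```

Under these assumptions, a literal penalty-xxx folder carries no penalty at all, because xxx is not a number, while penalty-10 and penalty-010 both give 10.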
It is a little strange that your search does not find SEARCH_FOR_PENALTY in FindMatches; I don't know why.
In any case creating a constant PENALTY_TM = "penalty-xxx" and even a directory getTMPenaltyRoot does not make sense, precisely because the number is not identical from one project to another.
I created this feature around 2012-2013. However, at that time I did not have write access to SVN, so if the commit comment associated with the feature does not mention penalty, this is not my fault. At that time Didier considered that the RFE was enough as documentation, and indeed, what is implemented conforms to the description in https://sourceforge.net/p/omegat/feature-requests/786/
Last edit: Thomas CORDONNIER 2024-11-28
A "penalty-xxx" subfolder is created in a new project to ease feature discoverability. Same for the other 4 types of "special" tm folders.
There are only 5 commits in the OmegaT development history that have the word penalty in the commit message:
https://github.com/search?q=repo%3Aomegat-org%2Fomegat+penalty+&type=commits
I can't find the feature implementation in those commits.
@didierbr left a message in RFE#786 stating that the feature was implemented in the SVN repository and included in OmegaT 2.6.0.
The commit was https://github.com/omegat-org/omegat/commit/c8041b68
The code block added by @didierbr was removed by @alex73 in 2013, in OmegaT 3.0.8.
Merge matches search for matches pane and match statistic
https://github.com/omegat-org/omegat/commit/4124c9ed#diff-04a6822cd2f3c4b93deb163d62e3b6a5e3d6cfde4f056dd3a469b5578ee65374L198-L214
But he moved the code chunk to another source file:
https://github.com/omegat-org/omegat/commit/4124c9ed#diff-4a7227925507f16d7212aa76934ef37dee06b2637ad648812e0d76eb752b3875R176-R193
Last edit: Hiroshi Miura 2024-02-15
When alex changed the code, it affected both the FindMatches constructor and the search method. As the comment in the code says, even when segmentation is disabled in the project, which is the case in the reported example project, the search method splits the paragraph and matches every segment.
For example, in the reported project, it splits the source "地力の搾取と浪費が現われる。(1)" into two segments: "地力の搾取と浪費が現われる。" and "(1)".
When you visit the source text "地力の搾取と浪費が現われる。(1)" and your project is configured not to allow empty translations, OmegaT automatically adds a translation identical to the source.
The partial segment "地力の搾取と浪費が現われる。" then matches against that source and gives the result.
I do not know the reason, but "SeparateSegmentMatch" is hard-coded to true,
which has been unchanged since alex introduced the feature in 2013.
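A rough sketch of the effect described above, to show why a sub-segment can hit 100% on its own. The split on "。" is only a naive stand-in I am assuming for illustration; OmegaT uses its own segmentation rules, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SeparateSegmentSketch {
    /** Naive stand-in for real segmentation: break the paragraph after each "。". */
    static List<String> splitSentences(String paragraph) {
        // The lookbehind keeps the full stop attached to the sentence it ends.
        return Arrays.asList(paragraph.split("(?<=。)"));
    }

    public static void main(String[] args) {
        String source = "地力の搾取と浪費が現われる。(1)";
        String tmEntry = "地力の搾取と浪費が現われる。";
        // Each piece is compared separately, so the first piece is an exact hit
        // even though the whole paragraph is not.
        for (String part : splitSentences(source)) {
            System.out.println(part + " exact match: " + part.equals(tmEntry));
        }
    }
}
```

With this input the paragraph yields two pieces, "地力の搾取と浪費が現われる。" and "(1)", and only the first one equals the TM entry.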
Currently the feature is in the FindMatches class, at line 234:
https://github.com/omegat-org/omegat/blob/master/src/org/omegat/core/statistics/FindMatches.java#L234-L240
A regex is defined as
When observing the code with a debugger, the penalty score (the variable tmenPenalty, value 10) is added to the result without any issue. The fuzzy matches pane shows it as well.
Where is the problem?
This is what I get.
There should not be a 100% match.
This entry does not come from tm/penalty-xxx. The entry that comes from tm/penalty-010 gives 90%.
The same entry in segment_1.tmx generates both the 90% match and the 100% match.
No, if you look at the first match in your screenshot, there is no file name.
The first entry comes from project memory, so it does not come from tm/penalty-10 and the penalty does not apply to it.
There is nothing in the project memory, which is why this is weird.
Here is a new version of the project, with 4 sentences. Each sentence has one exact match in a tmx located in "penalty-010".
In segments 1, 2 and 3, the fuzzy match in segment_1.tmx appears as a project fuzzy and not as a /tm fuzzy, and thus the penalty is not applied, even though the fuzzy has not yet entered the project_save.tmx.
Last edit: Jean-Christophe Helary 2024-02-16
This is not a bug in the tm/penalty-xxx feature.
Does it still happen in the recent weekly release?
There is a different issue, where the match appears twice (I’ll open a different tracker later), but the penalty is respected so I consider that the fix is working.