#1248 tm/penalty-nnn value is not respected

6.1
open-fixed
None
5
2025-09-11
2024-02-06
No

I found a weird bug where the penalty does not seem to apply to the match.

In my case, I want to apply a 10% penalty to all of the contents, so that I have to validate them. But the first segment gets a 100% match to which the 10% penalty is not applied.

It looks like the "(1)" part at the end of the first segment is the reason why the penalty does not work.

2 Attachments

Discussion

  • Hiroshi Miura

    Hiroshi Miura - 2024-02-15

    You defined the folder as penalty-xxx, not penalty-<NUM>, in your commit. @brandelune

    https://github.com/omegat-org/omegat/commit/fd343296

       /** Project subfolder for generic penalty based translation memories. */
       public static final String PENALTY_TM = "penalty-xxx";
    

    There is no feature in OmegaT where penalty-<NUM> recognizes NUM to set the penalty score.

    You can see the search results for penalty in the OmegaT source code.
    https://github.com/search?q=repo%3Aomegat-org%2Fomegat+penalty+language%3AJava&type=code

    Did you really implement the feature?

     
    • Thomas CORDONNIER

      There is no such feature in OmegaT that penalty-<NUM> reconize NUM to set penalty score.

      This is wrong, the feature exists in FindMatches:
      private static final Pattern SEARCH_FOR_PENALTY = Pattern.compile("penalty-(\\d+)");

      To activate it, you must create a directory with the number in the name,
      for example tm/penalty-10
      It is a little strange that your search does not find SEARCH_FOR_PENALTY in FindMatches; I don't know why.
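As a minimal sketch of how a folder name such as tm/penalty-10 can yield a numeric penalty, the snippet below reuses the regex quoted above. The class name and the penaltyFor helper are hypothetical and only for illustration; this is not the actual FindMatches code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PenaltyFolderDemo {
    // Same regex as the one quoted from FindMatches in this thread.
    private static final Pattern SEARCH_FOR_PENALTY = Pattern.compile("penalty-(\\d+)");

    /** Hypothetical helper: penalty encoded in a TM path, or 0 when no number is present. */
    static int penaltyFor(String tmPath) {
        Matcher m = SEARCH_FOR_PENALTY.matcher(tmPath);
        // find() looks for the pattern anywhere in the path, not only at the start.
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        System.out.println(penaltyFor("tm/penalty-10/a.tmx"));
        System.out.println(penaltyFor("tm/penalty-010/segment_1.tmx"));
        // "penalty-xxx" contains no digits, so the regex does not match.
        System.out.println(penaltyFor("tm/penalty-xxx/a.tmx"));
    }
}
```

Under this sketch a literal "penalty-xxx" folder carries no penalty at all, which would be consistent with it being only a placeholder name.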

      In any case creating a constant PENALTY_TM = "penalty-xxx" and even a directory getTMPenaltyRoot does not make sense, precisely because the number is not identical from one project to another.

      I created this feature around 2012-2013; however, at that time I did not have write access to SVN, so if the comment associated with the feature does not mention penalty, this is not my fault. At that time Didier considered that the RFE was enough as documentation, and indeed, what is implemented conforms to the description in https://sourceforge.net/p/omegat/feature-requests/786/

       

      Last edit: Thomas CORDONNIER 2024-11-28
      • Jean-Christophe Helary

        A "penalty-xxx" subfolder is created in a new project to ease feature discoverability. Same for the other 4 types of "special" tm folders.

         
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-15

    There are only 5 commits in the OmegaT development history that have the word penalty in the commit message.
    https://github.com/search?q=repo%3Aomegat-org%2Fomegat+penalty+&type=commits

    I cannot find the feature implementation in those commits.

     
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-15

    @didierbr left a message in RFE#786 stating that the feature was implemented in the svn repository and included in OmegaT 2.6.0.

    The commit was https://github.com/omegat-org/omegat/commit/c8041b68

     
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-15
     

    Last edit: Hiroshi Miura 2024-02-15
    • Hiroshi Miura

      Hiroshi Miura - 2024-02-16

      When Alex changed the code, the FindMatches constructor looked like this:

      public FindMatches(IProject project, int maxCount, boolean allowSeparateSegmentMatch,
                  boolean searchExactlyTheSame, boolean applyThreshold) {
      }
      

      and the search method contains:

          if (allowSeparateSegmentMatch && !project.getProjectProperties().isSentenceSegmentingEnabled()) {
              separateSegmentMatcher = new FindMatches(project, 1, false, true);
          }
      
      // ... snip ....
      
              if (separateSegmentMatcher != null) {
                  // split paragraph even when segmentation disabled, then find
                  // matches for every segment
      

      As the comment says, even when the project is configured with segmentation disabled, which is the same condition as in the reported example project, the search method splits the paragraph and matches every segment.

      For example, in the reported project, it splits the source "地力の搾取と浪費が現われる。(1)" into two segments: "地力の搾取と浪費が現われる。" and "(1)".
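To illustrate the split described above, here is a rough sketch that cuts a paragraph after each Japanese full stop. This is only an assumption for demonstration: OmegaT applies its own segmentation rules, and the class and method names below are hypothetical.

```java
import java.util.Arrays;

class SeparateSegmentDemo {
    /**
     * Hypothetical stand-in for sentence segmentation: split right after
     * each "。" using a zero-width lookbehind, so the full stop stays
     * attached to the sentence it ends.
     */
    static String[] splitAfterFullStop(String paragraph) {
        return paragraph.split("(?<=。)");
    }

    public static void main(String[] args) {
        String source = "地力の搾取と浪費が現われる。(1)";
        // Each resulting piece would then be matched against the TM separately.
        System.out.println(Arrays.toString(splitAfterFullStop(source)));
    }
}
```

With this split, the sentence without the trailing "(1)" becomes a candidate of its own, which is how a separate match can show up for only part of the source.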

      When you visit the source text "地力の搾取と浪費が現われる。(1)" and your project is configured not to allow empty translations, OmegaT automatically adds a translation identical to the source.

      A partial match for "地力の搾取と浪費が現われる。" then matches the source and gives the result.

      I do not know the reason, but "SeparateSegmentMatch" is hard-coded to true.

      FindMatches finder = new FindMatches(project, OConsts.MAX_NEAR_STRINGS, true, false); // 3rd argument is true
      

      which has been unchanged since Alex introduced the feature in 2013.

       
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16

    The regex is defined as

       private static final Pattern SEARCH_FOR_PENALTY = Pattern.compile("penalty-(\\d+)");
    
     
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16

    When observing the code with a debugger, the penalty score (the variable tmenPenalty, with value 10) is added to the result without any issue.
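The arithmetic being checked here can be sketched as follows, assuming the penalty is simply subtracted from each of the three percentages shown in the fuzzy matches pane and clamped at zero. The class and method names are hypothetical, and the clamping is an assumption rather than confirmed OmegaT behavior.

```java
class PenaltyScoreDemo {
    /** Hypothetical helper: subtract the penalty from every score, never going below 0. */
    static int[] applyPenalty(int[] scores, int penalty) {
        int[] out = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.max(0, scores[i] - penalty);
        }
        return out;
    }

    /** Formats three scores the way the pane displays them, e.g. "90/90/90%". */
    static String format(int[] s) {
        return s[0] + "/" + s[1] + "/" + s[2] + "%";
    }

    public static void main(String[] args) {
        // An exact match coming from a folder with a penalty of 10.
        System.out.println(format(applyPenalty(new int[] {100, 100, 100}, 10)));
    }
}
```

An exact match from a folder with a penalty of 10 would therefore display as 90/90/90%, while an entry that is not attributed to that folder keeps its 100%.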

     
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16

    The fuzzy matches pane shows

    1. 地力の搾取と浪費が現われる。(1)
      weird behavior
      <90/90/90% /home/miurahr/Projects/Translation/omegat-test/tm/penalty-010/segment_1.tmx>

    Where is the problem?

     
    • Jean-Christophe Helary

      This is what I get.

       
      • Jean-Christophe Helary

        There should not be a 100% match.

         
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16

    This entry does not come from tm/penalty-xxx.
    The entry that comes from tm/penalty-010 gives 90%.

     
    • Jean-Christophe Helary

      The same entry in segment_1.tmx generates the 90% match and the 100%.

       
      • Thomas CORDONNIER

        No, if you look at the first match in your screenshot, there is no file name.
        The first entry comes from the project memory, so it does not come from tm/penalty-10 and the penalty does not apply to it.

         
        • Jean-Christophe Helary

          There is nothing in the project memory. Which is the reason why this is weird.

           
  • Jean-Christophe Helary

    • Attachments has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,2 @@
    +weird fuzzy v2.zip (22.0 kB; application/zip)
     weird fuzzy.zip (21.2 kB; application/zip)
    
     
  • Jean-Christophe Helary

    Here is a new version of the project. With 4 sentences. Each sentence has one exact match in a tmx located in "penalty-010".

    In segments 1, 2 and 3, the fuzzy match from segment_1.tmx appears as a project fuzzy and not as a /tm fuzzy, and thus the penalty is not applied, even though the fuzzy has not yet entered project_save.tmx.

     

    Last edit: Jean-Christophe Helary 2024-02-16
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16
    • assigned_to: Hiroshi Miura
     
  • Hiroshi Miura

    Hiroshi Miura - 2024-02-16

    This is not a bug in the tm/penalty-xxx feature.

     
  • Hiroshi Miura

    Hiroshi Miura - 2025-08-11
    • assigned_to: Hiroshi Miura --> nobody
     
  • Hiroshi Miura

    Hiroshi Miura - 2025-08-11

    Does it still happen in the recent weekly release?

     
    • Jean-Christophe Helary

      There is a different issue, where the match appears twice (I’ll open a different tracker later), but the penalty is respected so I consider that the fix is working.

       
  • Hiroshi Miura

    Hiroshi Miura - 2025-09-11
    • status: open --> open-fixed
    • assigned_to: Hiroshi Miura
     