#953 Replacement with regex capture groups and case manipulation

Group: 4.1
Status: closed-fixed
Priority: 5
Updated: 2018-04-04
Created: 2014-02-03
Private: No

It should be possible to use regular expressions (at least references to search groups, i.e. $n) for replacing in Search and Replace.

Among the uses of such functionality would be finding (and automatically populating/changing) segments where source and target should be identical, or segments where only a part of original is used in translation, like numbers with units etc., or working with similar languages where the stem of a certain word should remain intact, but inflexions should be changed.

Ideally, Replace should be capable not only of using RegEx, but also of things like .toUpperCase() or arithmetic.

Related

Bugs: #897
Bugs: #898
Feature Requests: #953

Discussion

  • Héctor Cartagena

    +1

     
  • Vojta Drabek

    Vojta Drabek - 2014-11-15

    I second this, at least for search groups. It would be useful for quickly translating short expressions where the word order differs between the source and target languages, which is currently impossible with simple search and replace.
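The word-order case can be sketched with plain `java.util.regex`. This is only an illustration of capture groups in a replacement string, not OmegaT's code; the class name `SwapWordOrder` and the two-word pattern are assumptions made for the example.

```java
import java.util.regex.Pattern;

class SwapWordOrder {
    // Two words separated by a single space; each word is captured in its own group.
    private static final Pattern TWO_WORDS = Pattern.compile("(\\p{L}+) (\\p{L}+)");

    /** Swaps each matched pair of words: group 2 is emitted before group 1. */
    static String swap(String text) {
        return TWO_WORDS.matcher(text).replaceAll("$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(swap("vino tinto")); // prints "tinto vino"
    }
}
```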

     
  • Maynard Hogg

    Maynard Hogg - 2015-09-01

    Re: .toUpperCase(), etc.
    Some regexp engines use \u,\l, \U, and \L in replacements to change the case of just one letter or the whole "word."

    One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc.
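A minimal sketch of how such case modifiers could be handled in Java, assuming a hand-written template expander rather than anything OmegaT actually ships. The class `CaseAwareReplace` and its methods are hypothetical. The one-shot modifiers affect only the next character, the sticky ones stay active until `\E`, and only single-digit `$n` references are recognised. In this sketch `\u` always means "uppercase the next character", so it cannot also denote a Unicode code point.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseAwareReplace {

    /** Expands one replacement template against the current match of m. */
    static String expand(Matcher m, String template) {
        StringBuilder out = new StringBuilder();
        char sticky = 'E';  // 'U' or 'L' while a sticky mode is active, 'E' otherwise
        char oneShot = 0;   // 'u' or 'l': applies to the next emitted character only
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            String piece;
            if (c == '\\' && i + 1 < template.length()) {
                char next = template.charAt(++i);
                if (next == 'U' || next == 'L' || next == 'E') { sticky = next; continue; }
                if (next == 'u' || next == 'l') { oneShot = next; continue; }
                piece = String.valueOf(next);  // any other escaped character is literal
            } else if (c == '$' && i + 1 < template.length()
                    && Character.isDigit(template.charAt(i + 1))) {
                // Single-digit group reference; throws if the group does not exist.
                piece = m.group(template.charAt(++i) - '0');
                if (piece == null) piece = "";  // group did not participate in the match
            } else {
                piece = String.valueOf(c);
            }
            if (sticky == 'U') piece = piece.toUpperCase(Locale.ROOT);
            if (sticky == 'L') piece = piece.toLowerCase(Locale.ROOT);
            if (oneShot != 0 && !piece.isEmpty()) {
                int cut = piece.offsetByCodePoints(0, 1);  // keep surrogate pairs intact
                String first = piece.substring(0, cut);
                first = oneShot == 'u' ? first.toUpperCase(Locale.ROOT)
                                       : first.toLowerCase(Locale.ROOT);
                piece = first + piece.substring(cut);
                oneShot = 0;
            }
            out.append(piece);
        }
        return out.toString();
    }

    /** Replaces every match of regex in input using the case-aware template. */
    static String replaceAll(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement stops Java from re-interpreting $ and backslash.
            m.appendReplacement(sb, Matcher.quoteReplacement(expand(m, template)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Capitalise the first word, uppercase the second.
        System.out.println(replaceAll("hello world", "(\\w+) (\\w+)", "\\u$1 \\U$2\\E"));
        // prints "Hello WORLD"
    }
}
```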

     

    Last edit: Maynard Hogg 2015-09-01
  • Thomas CORDONNIER

    "One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc."

    As we are speaking about strings typed by the user, not in the code, such a substitution is done only if the programmer explicitly asks for it. So, nothing prevents you from using another syntax. I would suggest \x1234, as in Perl, so that \u can still be used for uppercase.
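The `\x1234` idea could look roughly like this in Java. It is a sketch under stated assumptions: the escape is taken to be exactly four hexadecimal digits, an escaped backslash in front of it is not handled, and the class `HexEscapes` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexEscapes {
    // Backslash, 'x', then exactly four hexadecimal digits (an assumed fixed-width form).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{4})");

    /** Replaces each backslash-x escape in a user-typed string with the character it names. */
    static String decode(String replacement) {
        Matcher m = HEX_ESCAPE.matcher(replacement);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The user types 10, backslash, x00A0, km; the result holds a no-break space.
        String result = decode("10\\x00A0km");
        System.out.println(result.length()); // prints 5
    }
}
```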

     
  • Didier Briel

    Didier Briel - 2017-10-18
    • assigned_to: Thomas CORDONNIER
     
  • Aaron Madlon-Kay

    Thomas was kind enough to submit a patch for this. To facilitate code review, I have pushed the changes to a git branch and opened a PR on GitHub:
    https://github.com/omegat-org/omegat/pull/22

    Thomas, if you don't mind we can take the conversation over to GitHub. I can give you access to the OmegaT repo if you let me know your username.

     
  • Aaron Madlon-Kay

    • summary: RegEx for Replacing --> Replacement with regex capture groups and case manipulation
    • Group: future --> 4.1
     
  • Aaron Madlon-Kay

    • status: open --> open-fixed
     
  • Aaron Madlon-Kay

    This is now available in trunk, r10270.

     
  • Jean-Christophe Helary

    I am finding weird CPU use spikes with this trunk. When a project is launched it can go up to 200%; when I have a search window open it stays at 100%.

     
  • Jean-Christophe Helary

    Ok, what I have more specifically is the following: when I open a project and just work with it I do have CPU use spikes but they are not much different from what I'd have with the build without this feature.
    When I start a search I get a spike to over 100% and then CPU use does not revert to normal even if I close the search window or reload the project. I need to close OmegaT and reopen the project.
    I am not seeing anything specific in the log.

     
  • Jean-Christophe Helary

    Ok, got it: the issue is not with this general commit but with 312778c, the one regarding "Display replace string in EntryListPane". When I revert only that commit, I can work normally with OmegaT; when I include it, Main stays around 100% CPU use after I start a search.

     
  • Aaron Madlon-Kay

    The above issue was reported as [bugs:#897], now fixed.

     

    Related

    Bugs: #897

  • Jean-Christophe Helary

    I'm attaching the Bundle.properties diff that adds a reference to the group replace in the Replace window explanation text.

     
    • Aaron Madlon-Kay

      Thank you. Committed to trunk, r10283.

       
  • Jean-Christophe Helary

    Why is the group reference \n in search and $n in replace?

     
    • Thomas CORDONNIER

      Hi Jean-Christophe

      This is the same in other languages.

      In Perl there is a reason:
      First of all: the search is a regular expression, but the replace string is not. It is an interpolated string with its own language, different from regular expressions. It could even be replaced by any scripting language (as for regex replacements in Groovy, for example).
      In Perl, however, the search regular expression is also interpolated: if you put $1 in the regular expression, it will be replaced... by the value from the previous search!
      For that reason, Perl distinguishes between variables from the previous match, written $1 in the regular expression, and backreferences within the expression itself, written \1.
      In the replace string, backreferences do not exist (again: this is not a regular expression), so we may support both $n and \n, as Perl does. Look here for more info:
      https://www.regular-expressions.info/replacebackref.html
      It seems that in most languages $n and \n are equivalent in the replacement string (but not in the search regular expression!)

      In our case the difference in the search expression is not relevant (the regular expression is not interpolated, so $1 should not exist). The syntax for the replacement string is then a question of choice: if all agree, I can make the modification to support both. Tell me what you think.

      Regards
      Thomas
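For comparison, here is how the two notations behave in Java's own `java.util.regex`, which is not necessarily what the patch does. The class name `BackrefVsReplacement` is made up for this sketch. In the pattern, `\1` is a backreference; in the replacement, `$1` inserts the captured text, while a backslash merely escapes the following character.

```java
import java.util.regex.Pattern;

class BackrefVsReplacement {
    // \1 inside the pattern means "the same text that group 1 just matched".
    private static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    /** $1 in the replacement inserts what group 1 captured. */
    static String collapse(String text) {
        return DOUBLED_WORD.matcher(text).replaceAll("$1");
    }

    /** A backslash in a Java replacement only escapes the next character, so \1 yields a literal 1. */
    static String withBackslash(String text) {
        return DOUBLED_WORD.matcher(text).replaceAll("\\1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("the the cat"));      // prints "the cat"
        System.out.println(withBackslash("the the cat")); // prints "1 cat"
    }
}
```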

       
      • Jean-Christophe Helary

        Thomas,

        Thank you for the detailed reply.

        If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected.

        JC

        ** [feature-requests:#953] Replacement with regex capture groups and case manipulation**

        Status: open-fixed
        Group: 4.1
        Labels: Regular expressions replace search and replace
        Created: Mon Feb 03, 2014 04:38 PM UTC by Kos Ivantsov
        Last Updated: Sun Mar 04, 2018 10:01 AM UTC
        Owner: Thomas CORDONNIER

        • Thomas CORDONNIER

          "If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected."

          Notepad++ uses $1 (just tested), and UltraEdit seems to use ^1 (according to its documentation). I have found other text editors using \1 or $1. This argument is difficult to sustain: all we can do is make a choice and document it.

           
  • Aaron Madlon-Kay

    I would definitely prefer not to support multiple syntaxes for the same thing. Java already uses $ for regular backreferences, and replacement references are conceptually similar, so I would rather stick with $.

    Correction: Java uses \ for regular backreferences and $ for replacements.

     

    Last edit: Aaron Madlon-Kay 2018-03-04
    • Thomas CORDONNIER

      Yes, I used Java as reference in my implementation (even if, strictly speaking, I don't use Java's replacement engine).
      However the list of existing syntaxes is complex:
      https://www.regular-expressions.info/refreplacebackref.html
      Some of them may be useful, for example \g<1> or ${1} when digits follow immediately (for example, ${1}3 is $1 followed by 3, which differs from $13)
      And I do not even speak about named references...
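The `$13` ambiguity can be observed directly in `java.util.regex`. The sketch below only shows Java's documented behaviour, where extra digits join the group number only if such a group exists and `${name}` is the form used for named groups. It says nothing about how OmegaT's own replacement code resolves the case, and the class `AmbiguousGroupNumber` is hypothetical.

```java
import java.util.regex.Pattern;

class AmbiguousGroupNumber {
    private static final Pattern ONE_GROUP = Pattern.compile("(a)");
    private static final Pattern THIRTEEN_GROUPS =
            Pattern.compile("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)");

    /** With a single group, $13 can only be group 1 followed by a literal 3. */
    static String oneGroup() {
        return ONE_GROUP.matcher("a").replaceAll("$13");
    }

    /** With thirteen groups, the same $13 refers to group 13. */
    static String thirteenGroups() {
        return THIRTEEN_GROUPS.matcher("abcdefghijklm").replaceAll("$13");
    }

    /** Escaping the digit forces "group 1, then a literal 3" even when group 13 exists. */
    static String escapedDigit() {
        return THIRTEEN_GROUPS.matcher("abcdefghijklm").replaceAll("$1\\3");
    }

    public static void main(String[] args) {
        System.out.println(oneGroup());       // prints "a3"
        System.out.println(thirteenGroups()); // prints "m"
        System.out.println(escapedDigit());   // prints "a3"
    }
}
```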

       
    • Jean-Christophe Helary

      On 2018/03/04 at 21:51, Aaron Madlon-Kay amake@users.sourceforge.net wrote:

      I would definitely prefer not to support multiple syntaxes for the same thing. Java already uses $n for regular backreferences, and replacement references are conceptually similar, so I would rather stick with $n.

      I'm fine with either, but the Search field does not support $n, only \n.

      For ex, I can search for サ(ー)バ\1 but not for サ(ー)バ$1

      Which means that in the Replace window I have to use:

      Search for: サ(ー)バ\1
      Replace with: サ(ー)バ$1

      Since \n has been in use for as long as we have had regexp in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.

       
  • Aaron Madlon-Kay

    Before jumping into the deep end with every possible bell and whistle, I think we should wait and see if there's actually demand for any of that.

    (Also my objection is to supporting orthogonal syntaxes like $ and \; I think extending $ to also support ${} would be OK.)

     

    Last edit: Aaron Madlon-Kay 2018-03-04
  • Jean-Christophe Helary

    (it looks like my mail answer has not made it here, so I repost)

    I'm fine with either, but the Search field does not support $n, only \n.

    For ex, I can search for サ(ー)バ\1 but not for サ(ー)バ$1

    Which means that in the Replace window I have to use:

    Search for: サ(ー)バ\1
    Replace with: サ(ー)バ$1

    Since \n has been in use for as long as we have had regexp in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.

     
    • Aaron Madlon-Kay

      It is what Java is using.

       