OmegaT - multiplatform CAT tool / Feature Requests / #953 Replacement with regex capture groups and case manipulation

Héctor Cartagena - 2014-08-27

+1

Vojta Drabek - 2014-11-15

I second this, at least for search groups. It would be useful for quickly translating short expressions where the word order differs between the source and target languages, which is currently impossible with simple search and replace.
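
To illustrate the word-order use case, here is a minimal sketch in plain Java (java.util.regex), not OmegaT code. The class name, method name and sample phrases are hypothetical; it only assumes that each word is captured in its own group and that the replacement refers to the groups in swapped order.

```java
import java.util.regex.Pattern;

class GroupSwapSketch {
    // Two words separated by a single space; each word is one capture group.
    private static final Pattern TWO_WORDS = Pattern.compile("(\\p{L}+) (\\p{L}+)");

    /** Swaps every adjacent word pair, e.g. noun-adjective to adjective-noun. */
    static String swapPairs(String text) {
        // $2 and $1 insert the second and first capture groups, in that order.
        return TWO_WORDS.matcher(text).replaceAll("$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("voiture rouge")); // prints "rouge voiture"
    }
}
```

A real pattern would of course be narrower than "any two words", otherwise every pair in the segment gets swapped.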

Maynard Hogg - 2015-09-01

Re: .toUpperCase(), etc.
Some regexp engines use \u, \l, \U, and \L in replacements to change the case of just one letter or of the whole "word."

One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc.

Last edit: Maynard Hogg 2015-09-01
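
For readers unfamiliar with these modifiers, the following is a rough Java sketch of how a replacement template with \u, \l, \U, \L and \E could be expanded. It is an assumption-laden illustration, not the patch that was later committed: the class and method names are hypothetical, only single-digit $n references are handled, and characters outside the Basic Multilingual Plane are ignored.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseReplaceSketch {
    /**
     * Expands a replacement template for the current match of m.
     * Assumed syntax: $n inserts group n (one digit); \U or \L converts
     * everything up to \E; backslash-u or \l converts only the next character
     * (written out because a raw backslash-u would start a Unicode escape
     * in Java source, even inside a comment).
     */
    static String expand(Matcher m, String template) {
        StringBuilder out = new StringBuilder();
        char span = 'E';    // 'U', 'L', or 'E' when no span conversion is active
        char pending = 0;   // 'u' or 'l' waiting for the next character
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            String piece;
            if (c == '\\' && i + 1 < template.length()) {
                char flag = template.charAt(++i);
                if (flag == 'U' || flag == 'L' || flag == 'E') { span = flag; continue; }
                if (flag == 'u' || flag == 'l') { pending = flag; continue; }
                piece = String.valueOf(flag);            // any other escaped char is literal
            } else if (c == '$' && i + 1 < template.length()
                    && template.charAt(i + 1) >= '0' && template.charAt(i + 1) <= '9') {
                String group = m.group(template.charAt(++i) - '0');
                piece = (group == null) ? "" : group;    // unmatched optional group
            } else {
                piece = String.valueOf(c);
            }
            if (span == 'U') piece = piece.toUpperCase(Locale.ROOT);
            if (span == 'L') piece = piece.toLowerCase(Locale.ROOT);
            if (pending != 0 && !piece.isEmpty()) {
                String head = piece.substring(0, 1);
                head = (pending == 'u') ? head.toUpperCase(Locale.ROOT)
                                        : head.toLowerCase(Locale.ROOT);
                piece = head + piece.substring(1);
                pending = 0;
            }
            out.append(piece);
        }
        return out.toString();
    }

    /** Replaces every match of regex in input with the expanded template. */
    static String replaceAll(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(input, last, m.start()).append(expand(m, template));
            last = m.end();
        }
        return sb.append(input.substring(last)).toString();
    }

    public static void main(String[] args) {
        // prints "Hello WORLD!"
        System.out.println(replaceAll("hello world", "(\\w+) (\\w+)", "\\u$1 \\U$2\\E!"));
    }
}
```

The sketch deliberately bypasses Matcher.appendReplacement, because Java's own replacement syntax treats a backslash as "take the next character literally" and would swallow the case modifiers.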

Thomas CORDONNIER - 2017-05-19

"One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc."

As we are speaking about strings typed by the user, not in the code, such a substitution is done only if the programmer explicitly asks for it. So nothing prevents you from using another syntax. I would suggest \x1234, as in Perl, so that \u can still be used for uppercase.
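
A possible way to decode such escapes in a user-typed replacement string is sketched below in Java. The four-digit \xHHHH form follows the suggestion above and is an assumption (Perl itself writes wide code points as \x{1234}); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexEscapeSketch {
    // Backslash, 'x', then exactly four hexadecimal digits (assumed syntax).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{4})");

    /** Replaces each \xHHHH escape in a user-typed string with the character it names. */
    static String decode(String typed) {
        Matcher m = HEX_ESCAPE.matcher(typed);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String ch = new String(Character.toChars(codePoint));
            // quoteReplacement keeps a decoded '$' or '\' from being reinterpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Inserts a no-break space (U+00A0) between the number and the unit.
        System.out.println(decode("10\\x00A0km"));
    }
}
```

Decoding would have to run before any case handling, so that \u remains free for uppercasing as proposed.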

Didier Briel - 2017-10-18

assigned_to: Thomas CORDONNIER

Aaron Madlon-Kay - 2018-02-05

Thomas was kind enough to submit a patch for this. To facilitate code review, I have pushed the changes to a git branch and opened a PR on GitHub:
https://github.com/omegat-org/omegat/pull/22

Thomas, if you don't mind we can take the conversation over to GitHub. I can give you access to the OmegaT repo if you let me know your username.

Aaron Madlon-Kay - 2018-02-15

summary: RegEx for Replacing --> Replacement with regex capture groups and case manipulation

Group: future --> 4.1

Aaron Madlon-Kay - 2018-02-17

status: open --> open-fixed

Aaron Madlon-Kay - 2018-02-17

This is now available in trunk, r10270.

Jean-Christophe Helary - 2018-02-28

I am finding weird CPU use spikes with this trunk. When a project is launched it can go up to 200%, and when I have a search window open it stays at 100%.

Jean-Christophe Helary - 2018-02-28

Ok, what I have more specifically is the following: when I open a project and just work with it I do have CPU use spikes but they are not much different from what I'd have with the build without this feature.
When I start a search I get a spike to over 100% and then CPU use does not revert to normal even if I close the search window or reload the project. I need to close OmegaT and reopen the project.
I am not seeing anything specific in the log.

Jean-Christophe Helary - 2018-02-28

Ok, got it: the issue is not with this commit in general but with 312778c, the one regarding "Display replace string in EntryListPane". When I revert only that commit, I can work normally with OmegaT; when I include it, Main stays around 100% CPU use after I start a search.

Aaron Madlon-Kay - 2018-02-28

The above issue was reported as [bugs:#897], now fixed.

Jean-Christophe Helary - 2018-02-28

I'm attaching the Bundle.properties diff that adds a reference to the group replace in the Replace window explanation text.

Bundle.properties.diff.zip

- Aaron Madlon-Kay - 2018-02-28
  
  Thank you. Committed to trunk, r10283.
  
Jean-Christophe Helary - 2018-03-04

Why is the group reference \n in search and $n in replace?

- Thomas CORDONNIER - 2018-03-04
  
  Hi Jean-Christophe
  
  This is the same in other languages.
  
  In Perl there is a reason:
  First of all, the search is a regular expression, but the replace string is not: it is an interpolated string with its own language, different from that of regular expressions. It could even be replaced by any scripting language (as for regex replacements in Groovy, for example).
  In Perl, however, the search regular expression is also interpolated: if you put $1 in the regular expression, it will be replaced... by the value from the previous search!
  For that reason, Perl makes the distinction between variables from a previous match, written $1 in the regular expression, and backreferences within the expression itself, which are written \1.
  In the replace string, backreferences do not exist (again: this is not a regular expression), so we may support both $n and \n, as Perl does. Look here for more info:
  https://www.regular-expressions.info/replacebackref.html
  It seems that in most languages $n and \n are equivalent in the replacement string (but not in the search regular expression!)
  
  In our case the difference in the search expression is not relevant (the regular expression is not interpolated, so $1 should not exist). The syntax for the replacement string is then a matter of choice: if all agree, I can make the modification to support both. Tell me what you think.
  
  Regards
  Thomas
  
  - Jean-Christophe Helary - 2018-03-04
    
    Thomas,
    
    Thank you for the detailed reply.
    
    If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected.
    
    JC
    
    ** [feature-requests:#953] Replacement with regex capture groups and case manipulation**
    
    Status: open-fixed
    Group: 4.1
    Labels: Regular expressions replace search and replace
    Created: Mon Feb 03, 2014 04:38 PM UTC by Kos Ivantsov
    Last Updated: Sun Mar 04, 2018 10:01 AM UTC
    Owner: Thomas CORDONNIER
    
    It should be possible to use regular expressions (at least references to search groups, i.e. $n) for replacing in Search and Replace.
    
    Among the uses of such functionality would be finding (and automatically populating/changing) segments where source and target should be identical, or segments where only a part of original is used in translation, like numbers with units etc., or working with similar languages where the stem of a certain word should remain intact, but inflexions should be changed.
    
    Ideally, the Replace should be capable not only of using RegEx, but of things like .toUpperCase() or arithmetic.
    
    - Thomas CORDONNIER - 2018-03-04
      
      "If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected."
      
      Notepad++ uses $1 (just tested), and UltraEdit seems to use ^1 (according to its documentation). I have found other text editors using \1 or $1. It is difficult to sustain this argument; all we can do is make a choice and document it.
      
Aaron Madlon-Kay - 2018-03-04

I would definitely prefer not to support multiple syntaxes for the same thing. Java already uses $ ~~for regular backreferences, and replacement references are conceptually similar~~, so I would rather stick with $.

Correction: Java uses \ for regular backreferences and $ for replacements.

Last edit: Aaron Madlon-Kay 2018-03-04
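
For reference, the distinction can be seen directly in java.util.regex. The sketch below is plain Java, not OmegaT code, and its class and method names are hypothetical: \1 inside the pattern matches the text of group 1 again, $1 inside the replacement inserts it, and a backslash in a Java replacement string merely makes the following character literal.

```java
import java.util.regex.Pattern;

class BackrefSyntaxSketch {
    // In the pattern, \1 is a backreference: it must match the same text as group 1.
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    /** Collapses an immediately repeated word, e.g. "the the" becomes "the". */
    static String collapseRepeats(String text) {
        // In the replacement, $1 inserts the text captured by group 1.
        return REPEATED_WORD.matcher(text).replaceAll("$1");
    }

    /** Uses \1 in the replacement: Java treats it as an escaped literal '1'. */
    static String backslashInReplacement(String text) {
        return REPEATED_WORD.matcher(text).replaceAll("\\1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("the the cat"));        // prints "the cat"
        System.out.println(backslashInReplacement("the the cat")); // prints "1 cat"
    }
}
```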

- Thomas CORDONNIER - 2018-03-04
  
  Yes, I used Java as the reference in my implementation (even if, strictly speaking, I don't use Java's replacement engine).
  However, the list of existing syntaxes is complex:
  https://www.regular-expressions.info/refreplacebackref.html
  Some of them may be useful, for example \g<1> or ${1} when digits follow immediately (for example, ${1}3 is $1 followed by 3, which differs from $13).
  And that is before even mentioning named references...
  
- Jean-Christophe Helary - 2018-03-04
  
  On 2018/03/04 at 21:51, Aaron Madlon-Kay (amake@users.sourceforge.net) wrote:
  
  "I would definitely prefer not to support multiple syntaxes for the same thing. Java already uses $n for regular backreferences, and replacement references are conceptually similar, so I would rather stick with $n."
  
  I'm fine with either, but the Search field does not support $n, only \n.
  
  For example, I can search for サ(ー)バ\1 but not for サ(ー)バ$1
  
  Which means that in the Replace window I have to use:
  
  Search for: サ(ー)バ\1
  Replace with: サ(ー)バ$1
  
  Since \n has been in use for as long as we have had regexps in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.
  
Aaron Madlon-Kay - 2018-03-04

Before jumping into the deep end with every possible bell and whistle, I think we should wait and see if there's actually demand for any of that.

(Also my objection is to supporting orthogonal syntaxes like $ and \; I think extending $ to also support ${} would be OK.)

Last edit: Aaron Madlon-Kay 2018-03-04
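
To make the $13 ambiguity concrete, here is a small Java sketch of how java.util.regex itself resolves it. This describes Java's stock Matcher only, not the OmegaT implementation, and the class and helper names are hypothetical. As far as I know, Java accepts ${name} only for named groups and rejects ${1}, so the escape shown in the second call is the usual workaround there.

```java
import java.util.regex.Pattern;

class GroupNumberSketch {
    // Thirteen one-letter groups, so that both group 1 and group 13 exist.
    static final String THIRTEEN_GROUPS = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)";

    /** Hypothetical helper: applies a replacement to every match of regex in input. */
    static String apply(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Group 13 exists, so $13 means group 13: prints "m"
        System.out.println(apply(THIRTEEN_GROUPS, "abcdefghijklm", "$13"));
        // A backslash makes the 3 literal, so this is group 1 then "3": prints "a3"
        System.out.println(apply(THIRTEEN_GROUPS, "abcdefghijklm", "$1\\3"));
        // With a single group, $13 falls back to group 1 then "3": prints "a3"
        System.out.println(apply("(a)", "a", "$13"));
        // A named group avoids the numbering question entirely: prints "a3"
        System.out.println(apply("(?<first>a)", "a", "${first}3"));
    }
}
```

The meaning of $13 therefore depends on how many groups the search expression defines, which is the kind of surprise a ${} form would avoid.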

Jean-Christophe Helary - 2018-03-04

(It looks like my mail answer has not made it here, so I am reposting.)

I'm fine with either, but the Search field does not support $n, only \n.

For example, I can search for サ(ー)バ\1 but not for サ(ー)バ$1

Which means that in the Replace window I have to use:

Search for: サ(ー)バ\1
Replace with: サ(ー)バ$1

Since \n has been in use for as long as we have had regexps in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.

- Aaron Madlon-Kay - 2018-03-04
  
  It is what Java is using.
  