It should be possible to use regular expressions (at least references to search groups, i.e. $n) for replacing in Search and Replace.
Such functionality would be useful for finding (and automatically populating or changing) segments where source and target should be identical; segments where only part of the original is used in the translation, such as numbers with units; or work on similar languages, where the stem of a word should remain intact but its inflections should change.
Ideally, Replace should be capable not only of using regular expressions but also of things like .toUpperCase() or arithmetic.
+1
I second this, at least for search groups. It would be useful for quickly translating short expressions where the word order differs between the source and target languages, which is currently impossible with simple search and replace.
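To illustrate the word-order case, here is a minimal Java sketch of what a group reference in the replacement does. The pattern, the class name GroupSwapDemo and the sample text are made up for illustration; this is not OmegaT code, only the standard java.util.regex behaviour.

```java
import java.util.regex.Pattern;

final class GroupSwapDemo {
    // Hypothetical pattern: a number, a space, then a unit.
    private static final Pattern NUMBER_UNIT = Pattern.compile("(\\d+) (kg|cm)");

    // Swap the two captured groups: "$2 $1" puts the unit before the number.
    static String swap(String text) {
        return NUMBER_UNIT.matcher(text).replaceAll("$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(swap("10 kg and 25 cm"));
    }
}
```

Text that does not match the pattern is left untouched, so the same replace can be run over many segments.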
Re: .toUpperCase(), etc.
Some regexp engines use \u, \l, \U, and \L in replacements to change the case of just one letter or of the whole "word."
One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc.
Last edit: Maynard Hogg 2015-09-01
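For reference, Java's own replacement syntax has no case-changing operators like \u or \U, but since Java 9 a replacement can be computed by a function. A rough sketch of capitalising the first letter of each word that way follows; the class and method names are hypothetical and this is not how OmegaT does or would do it.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaseReplaceDemo {
    // First letter of a word in group 1, the rest of the word in group 2.
    private static final Pattern WORD = Pattern.compile("\\b(\\p{L})(\\p{L}*)");

    static String upperCaseFirstLetters(String text) {
        // Requires Java 9+: replaceAll(Function<MatchResult, String>).
        // quoteReplacement keeps any '$' or '\' in the result literal.
        return WORD.matcher(text).replaceAll(m ->
                Matcher.quoteReplacement(
                        m.group(1).toUpperCase(Locale.ROOT) + m.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(upperCaseFirstLetters("hello world"));
    }
}
```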
"One problem: I would also hope that \u would be used for specifying Unicode code points: non-break spaces, thin spaces, curly quotes, etc."
As we are speaking about strings typed by the user, not in the code, such a substitution is done only if the programmer explicitly requests it. So, nothing prevents you from using another syntax. I would suggest \x1234, as in Perl, so that \u can still be used for uppercase.
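As a sketch of what "the programmer explicitly requests it" could look like, the user-typed replacement string can be pre-processed before it is used. The four-hex-digit \xHHHH form below is only the syntax suggested in this comment, and the class HexEscapeDemo is hypothetical; nothing here reflects what OmegaT actually implements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexEscapeDemo {
    // Assumed syntax: a backslash, 'x', then exactly four hex digits.
    private static final Pattern HEX_ESCAPE =
            Pattern.compile("\\\\x([0-9A-Fa-f]{4})");

    // Replace each \xHHHH in the user's text with the corresponding character.
    static String expand(String userText) {
        return HEX_ESCAPE.matcher(userText).replaceAll(m -> {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String ch = new String(Character.toChars(codePoint));
            return Matcher.quoteReplacement(ch);
        });
    }

    public static void main(String[] args) {
        // The user typed: 10\x00A0kg (number, no-break space, unit)
        String expanded = expand("10\\x00A0kg");
        System.out.println(expanded.length()); // 5 characters after expansion
    }
}
```

With a separate escape like this, \u stays free for case conversion.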
Thomas was kind enough to submit a patch for this. To facilitate code review, I have pushed the changes to a git branch and opened a PR on GitHub:
https://github.com/omegat-org/omegat/pull/22
Thomas, if you don't mind we can take the conversation over to GitHub. I can give you access to the OmegaT repo if you let me know your username.
This is now available in trunk, r10270.
I am finding weird CPU use spikes with this trunk. When a project is launched it can go up to 200%; when I have a search window open it stays at 100%.
Ok, what I have more specifically is the following: when I open a project and just work with it I do have CPU use spikes but they are not much different from what I'd have with the build without this feature.
When I start a search I get a spike to over 100% and then CPU use does not revert to normal even if I close the search window or reload the project. I need to close OmegaT and reopen the project.
I am not seeing anything specific in the log.
Ok, got it: the issue is not with this general commit but with 312778c, the one regarding "Display replace string in EntryListPane". When I revert only that commit, I can work normally with OmegaT; when I include it, Main stays around 100% CPU use after I start a search.
The above issue was reported as [bugs:#897], now fixed.
Related bugs: #897
I'm attaching the Bundle.properties diff that adds a reference to the group replace in the Replace window explanation text.
Thank you. Committed to trunk, r10283.
Why is the group reference \n in search and $n in replace?
Hi Jean-Christophe
This is the same in other languages.
In Perl there is a reason:
First of all, the search is a regular expression, but the replace string is not: it is an interpolated string with its own language, different from regular expressions. It could even be replaced by any scripting language (as for regex replacements in Groovy, for example).
In Perl, however, the search regular expression is also interpolated: if you put $1 in the regular expression, it will be replaced... by the value from the previous search!
For that reason, Perl distinguishes between variables from a previous match, written $1 in the regular expression, and backreferences within the expression itself, written \1.
In the replace string, backreferences do not exist (again, this is not a regular expression), so we could support both $n and \n, as Perl does. Look here for more info:
https://www.regular-expressions.info/replacebackref.html
It seems that in most languages $n and \n are equivalent in the replacement string (but not in the search regular expression!).
In our case the difference in the search expression is not relevant (the regular expression is not interpolated, so $1 should not exist). The syntax for the replacement string is then a matter of choice: if everyone agrees, I can make the modification to support both. Tell me what you think.
Regards
Thomas
Thomas,
Thank you for the detailed reply.
If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected.
JC
Jean-Christophe Helary
http://mac4translators.blogspot.com @brandelune
Related feature requests: #953
"If we put ourselves in the context of text editors, then using the same syntax for both fields is what is generally expected."
Notepad++ uses $1 (just tested), and UltraEdit seems to use ^1 (according to its documentation). I have found other text editors using \1 or $1. It is difficult to maintain this argument; all we can do is make a choice and document it.
I would definitely prefer not to support multiple syntaxes for the same thing. Java already uses $ for regular backreferences, and replacement references are conceptually similar, so I would rather stick with $.
Correction: Java uses \ for regular backreferences and $ for replacements.
Last edit: Aaron Madlon-Kay 2018-03-04
Yes, I used Java as reference in my implementation (even if, strictly speaking, I don't use Java's replacement engine).
However the list of existing syntaxes is complex:
https://www.regular-expressions.info/refreplacebackref.html
Some of them may be useful, for example \g<1> or ${1} when digits follow immediately (for example, ${1}3 is $1 followed by 3, which differs from $13).
And that is not even counting named references...
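For comparison, this is how plain java.util.regex handles the $13 ambiguity. The sketch only shows the stock Java engine; as noted above, the OmegaT implementation does not use Java's replacement engine, so its behaviour may differ. The 13-group pattern and the group name "first" are made up for the demonstration.

```java
final class GroupNumberDemo {
    // Hypothetical pattern with 13 groups; group 1 is also named "first".
    static final String THIRTEEN_GROUPS =
            "(?<first>a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)";

    static String replace(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // 13 groups exist, so $13 means group 13.
        System.out.println(replace("abcdefghijklm", THIRTEEN_GROUPS, "$13"));
        // A named reference is one way to get "group 1 followed by 3" in Java.
        System.out.println(replace("abcdefghijklm", THIRTEEN_GROUPS, "${first}3"));
        // Only one group exists, so Java reads $13 as group 1 followed by "3".
        System.out.println(replace("a", "(a)", "$13"));
    }
}
```

Note that stock Java accepts ${name} only for named groups; ${1} is rejected with an IllegalArgumentException, so supporting it would be an extension.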
I'm fine with either, but the Search field does not support $n, only \n.
For example, I can search for サ(ー)バ\1 but not for サ(ー)バ$1
Which means that in the Replace window I have to use:
Search for: サ(ー)バ\1
Replace with: サ(ー)バ$1
Since \n has been in use ever since we have had regexp in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.
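The asymmetry described here is the standard java.util.regex behaviour, which a short sketch can show. It uses an ASCII pattern with the same shape as the example above (one group, then a backreference to it); the class name and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

final class BackrefSyntaxDemo {
    // In the search pattern, \1 refers back to group 1 (a doubled character).
    static final Pattern WITH_BACKSLASH = Pattern.compile("(\\w)\\1");
    // Here $ is the end-of-input anchor followed by a literal "1",
    // not a group reference.
    static final Pattern WITH_DOLLAR = Pattern.compile("(\\w)$1");

    // In the replacement string, the group is referenced as $1.
    static String collapseDoubles(String text) {
        return WITH_BACKSLASH.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbc"));
        System.out.println(WITH_DOLLAR.matcher("aabbc").find());
    }
}
```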
Before jumping into the deep end with every possible bell and whistle, I think we should wait and see if there's actually demand for any of that.
(Also my objection is to supporting orthogonal syntaxes like $ and \; I think extending $ to also support ${} would be OK.)
Last edit: Aaron Madlon-Kay 2018-03-04
(It looks like my mail answer has not made it here, so I am reposting.)
I'm fine with either, but the Search field does not support $n, only \n.
For example, I can search for サ(ー)バ\1 but not for サ(ー)バ$1
Which means that in the Replace window I have to use:
Search for: サ(ー)バ\1
Replace with: サ(ー)バ$1
Since \n has been in use ever since we have had regexp in OmegaT search, it would be weird to stop using it just to adopt a new construct that nobody is using yet.
It is what Java is using.