Hi folks,
I'm having issues with C::B 16.01 when using regular expressions (with the advanced regexp feature enabled) to find / replace text in "normal" text files containing German umlauts (ä, ö, ü).
The search itself works (in file src/src/find_replace.cpp), but the match is sometimes not selected at the correct start position and (if the matched text contains special characters) is selected with the wrong length.
The problem is that wxRegEx.GetMatch delivers the character position (not the byte position!) and the number of characters matched by the regexp.
Setting the correct selection (start position, selection length) requires the byte position and the byte length.
Thus we need the byte position and length, which can be calculated using the function UTF8Length
from
#include "../src/scintilla/src/Uniconversion.h"
size_t utf8start = UTF8Length( text.c_str(), start );
size_t utf8len = UTF8Length( text.Mid( start, len ).c_str(), len );
To see whether this works, I've already made a small patch to the ::Find function at line 1172 (don't forget to add the include mentioned above!):
    if (re.Matches(text))
    {
        size_t start, len;
        re.GetMatch(&start, &len, 0);
        // GetMatch returns character offsets; convert them to UTF-8 byte offsets
        size_t utf8start = UTF8Length( text.c_str(), start );
        pos = utf8start + data->start; // before: pos = start + data->start;
        size_t utf8len = UTF8Length( text.Mid( start, len ).c_str(), len );
        len = utf8len;
        lengthFound = len;
        // For searches for "^" or "$" (and null returning variants on this) we need to make sure
        // we have forward progress and are not simply matching on a previous BOL/EOL find
        if ((start==0) && (len==0))
        {
            text = text.Mid(1);
            if (re.Matches(text))
            {
                re.GetMatch(&start, &len, 0);
                utf8start = UTF8Length( text.c_str(), start );
                pos = utf8start + data->start + 1; // before: pos = start + data->start + 1;
                utf8len = UTF8Length( text.Mid( start, len ).c_str(), len );
                lengthFound = utf8len; // use the byte length here as well, not the character count
            }
            else
                pos=-1;
        }
    }
To test the faulty behaviour I've made a very small file containing:
ävoid Test::Functionä0( void )
ävoid Test::Functionä1( void )
If you now start a regexp search using the pattern Test.+( you'll see that (without the patch) C::B selects " Test::Functionä" as the result instead of "Test::Functionä0(".
Here we see both effects in one:
1. the invalid start position
2. the invalid selection length
The behaviour can be explained very easily: each ä character requires 2 bytes in the UTF-8 representation (which is used internally by scintilla). The first "ä" leads to the wrong start position, while the second one leads to the wrong length.
I use the find-and-replace feature with regular expressions very often, but with the advanced regexp option it is really unusable. For now I've deactivated the advanced regexp.
Could someone please take a look at that problem? I've only looked at the find feature so far...
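To make the arithmetic concrete, here is a minimal standalone sketch of the character-to-byte conversion for the first test line. It is not the C::B or scintilla code: the names Utf8ByteOffset, CharRangeToByteRange and ByteRange are made up for illustration, and it assumes the text is available as a valid UTF-8 std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of C::B or scintilla): returns the byte
// offset of the code point with index charIndex in a valid UTF-8 string.
// charIndex may equal the number of code points, which yields utf8.size().
std::size_t Utf8ByteOffset(const std::string& utf8, std::size_t charIndex)
{
    std::size_t bytePos = 0;
    std::size_t chars = 0;
    while (bytePos < utf8.size() && chars < charIndex)
    {
        ++bytePos; // step over the lead byte of the current character
        // skip continuation bytes, which have the bit pattern 10xxxxxx
        while (bytePos < utf8.size() &&
               (static_cast<unsigned char>(utf8[bytePos]) & 0xC0) == 0x80)
            ++bytePos;
        ++chars;
    }
    assert(chars == charIndex); // charIndex must not exceed the character count
    return bytePos;
}

// Hypothetical result type: a selection expressed in bytes.
struct ByteRange
{
    std::size_t start;
    std::size_t length;
};

// Converts a (character start, character count) pair, such as the one a
// regex match reports, into a (byte start, byte count) pair.
ByteRange CharRangeToByteRange(const std::string& utf8,
                               std::size_t charStart, std::size_t charLen)
{
    const std::size_t byteStart = Utf8ByteOffset(utf8, charStart);
    const std::size_t byteEnd = Utf8ByteOffset(utf8, charStart + charLen);
    return { byteStart, byteEnd - byteStart };
}
```

For the first test line the match starts at character 6 and is 17 characters long. Converted, that is byte 7 with a length of 18 bytes, because one "ä" sits before the match and one inside it. Using 6 and 17 directly as byte values selects " Test::Functionä", which is exactly the faulty selection described above.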
Kind regards
ChristophMS
Can you please provide a patch against latest trunk version?
See how to do it: http://wiki.codeblocks.org/index.php/Creating_a_patch_to_submit_(Patch_Tracker)
Sorry, I could not find the "Submit A Patch" link, so I attached a diff file.
Hopefully you're able to use it.
I've also had a look at the other find/replace functions and extended the former "patch".
There are now three patched sections. It's tested in a small environment and seems to work.
Please check whether I've found all relevant code lines before applying that patch :-)
Kind regards
ChristophMS
Do you know that this is an internal scintilla header and that it is not supposed to be used by applications?
Note that you could probably use these scintilla calls http://www.scintilla.org/ScintillaDoc.html#SCI_POSITIONBEFORE to get the correct positions.
Sorry, I'm not (yet) familiar with the internals of scintilla and codeblocks. I found that header while debugging through the code. The patch I made was a "fast" one to get it working for me.
I don't know how to use that scintilla stuff at the moment. If that's the better way, it should be used.
Can you make a patch using the scintilla commands?
I could, but I don't have time to spend on this at the moment...
It's the same for me at the moment, but that's ok for now since the patch is working for me.
The more important thing is that this bug is kept open so that someone else can fix it for one of the next releases...
Hi,
I have the same issue with French characters with version svn 13394.
I am attaching a test file to show the problem, and I will try to update the patch according to Petrov's suggestion.
Regards,
Robert