Re: [icu-support] question on performance of character searching in UnicodeString

 > Will using regular expression help the performance?

Probably not.  For something like this, you can't beat the speed of just 
scanning the data yourself, looking for the two characters of interest.

For good performance, avoid anything that will copy the string data, or 
that will make multiple scans of the string.  Using indexOf() would 
require two scans of the data, one for each of the characters of 
interest.  A regular expression would do the check in a single pass, but 
would perform a somewhat more expensive set membership test on each 
character.
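
A minimal sketch of the single-pass scan described above.  The function 
name is made up for illustration, and it takes a raw UTF-16 buffer rather 
than a UnicodeString so that it stands alone; with ICU you would 
presumably pass the string's getBuffer() and length(), which avoids 
copying the data.

```cpp
#include <cstddef>

// Hypothetical helper, not part of ICU.
// Returns true if the UTF-16 buffer contains U+FFFD (Unicode replacement
// character) or U+001A (the substitution character produced by
// windows-1252 conversion). One pass, no copy, two compares per code unit.
bool containsSubstitutionChar(const char16_t* text, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        const char16_t c = text[i];
        if (c == 0xFFFD || c == 0x001A) {
            return true;  // stop at the first hit; a warning only needs one
        }
    }
    return false;
}
```

Both characters are in the BMP, so comparing individual UTF-16 code units 
is enough; no surrogate handling is needed for this check.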

If there is some point in your code that is already walking through the 
text a character at a time, adding the check at that point would be the 
fastest, since it avoids an extra scan altogether.
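
As a rough illustration of folding the check into an existing pass: the 
sketch below assumes, purely for the sake of example, that the code 
already loops over the text to count newlines.  The struct and function 
names are hypothetical, and the newline counting just stands in for 
whatever per-character work your loop already does.

```cpp
#include <cstddef>

// Hypothetical result type for the combined pass.
struct ScanResult {
    std::size_t newlineCount;  // stand-in for the loop's existing work
    bool sawSubstitution;      // true if U+FFFD or U+001A was seen
};

// Hypothetical existing loop, extended with the substitution check.
ScanResult countNewlinesAndCheck(const char16_t* text, std::size_t length) {
    ScanResult result{0, false};
    for (std::size_t i = 0; i < length; ++i) {
        const char16_t c = text[i];
        if (c == u'\n') {
            ++result.newlineCount;  // the work the loop was already doing
        }
        if (c == 0xFFFD || c == 0x001A) {
            result.sawSubstitution = true;  // the added check
        }
    }
    return result;
}
```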

Wang, June wrote:
> We are trying to log a warning message whenever a Unicode substitution 
> character 0xFFFD or windows-1252 substitution character 0x1A is found in 
> data. Because we would like to check all textual data in/out of DB, we 
> have concern with the performance of UnicodeString.indexOf()? Will using 
> regular expression help the performance?
> 

-- 
   -- Andy Heninger
      hen...@us...
