From: Kazutoshi S. <k_s...@f2...> - 2011-02-17 16:23:56
Matthieu Casanova wrote:
> 2011/2/17 Kazutoshi Satoda<k_s...@f2...>:
>> Matthieu Casanova wrote:
>>> To find a word character it is finally simple :
>>> In the Pattern documentation there is a character class for word
>>> characters :
>>>
>>> \w A word character: [a-zA-Z_0-9]
>>>
>>> So a word character is a letter or digit or _
>>> Do you agree with that ?
>>
>> I already noted in previous post that "\w" doesn't work for non-ASCII
>> character, while "\b" recognize some non-ASCII word boundaries.
>
> I don't understand how it is possible. If a word char is letter or
> digit or underscore, any other char should be refused isn't it ?

No. There are many non-ASCII letters and digits, and possibly other
non-ASCII word characters similar to the underscore. Unfortunately,
Java regexp has two different definitions of "word character": one for
"\w", shown above, and another used for "\b" and "\B". The former is
ASCII only; the latter includes non-ASCII characters.

> Do you have examples that I could try in jEdit to make the feature better ?

Try searching in ":âîûêô:". "\w" doesn't match anything, while "\b"
matches after the first ":" and before the last ":".

--
k_satoda
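
For anyone who wants to reproduce the ":âîûêô:" case outside jEdit, here is a minimal sketch using java.util.regex directly. The class name WordCharDemo and the helper matchOffsets are made up for illustration; they are not part of jEdit. The last pattern, [\p{L}\p{N}_], is only an assumption about what a Unicode-aware replacement for "\w" could look like, not something proposed in this thread. The offsets reported for "\b" may depend on the Java version in use, so the sketch prints them instead of assuming a result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCharDemo {
    // ":âîûêô:" written with Unicode escapes so the source file encoding does not matter.
    static final String SAMPLE = ":\u00E2\u00EE\u00FB\u00EA\u00F4:";

    // Hypothetical helper: returns the start offset of every match of regex in text.
    static List<Integer> matchOffsets(String regex, String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // "\w" is documented as [a-zA-Z_0-9], so none of the accented letters match.
        System.out.println("\\w            : " + matchOffsets("\\w", SAMPLE));

        // "\b" uses its own notion of a word character; the result may vary by Java version.
        System.out.println("\\b            : " + matchOffsets("\\b", SAMPLE));

        // Assumed alternative: Unicode letters, Unicode digits, or underscore.
        System.out.println("[\\p{L}\\p{N}_]: " + matchOffsets("[\\p{L}\\p{N}_]", SAMPLE));
    }
}
```

With the default flags, the "\w" line should print an empty list, and the [\p{L}\p{N}_] line should list the offsets of the five accented letters. If "\b" reports offsets 1 and 6, that is the mismatch described above: a boundary is found next to characters that "\w" itself refuses.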