Ted wrote:
> I read somewhere Gobo Regexp doesn't support Unicode yet.
> After some investigation, I found some minor changes would make it
> support Unicode (See following patch).
>
> Can someone confirm? And merge it into the library if possible.
I just committed your modifications in SVN.
However, in addition to Colin's remark about case-insensitivity, I also
noticed that character classes (e.g. "[a-z]") will not work if they
contain characters with code greater than 255.
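To illustrate the kind of limitation I mean, here is a minimal sketch in Python (not the actual Eiffel code in Gobo). It assumes a character class is stored as a 256-bit bitmap indexed by character code, which is how the classic PCRE C library represents classes; whether Gobo's translation uses exactly this layout is an assumption. The names ByteClass, add_range and matches are hypothetical.

```python
# Hypothetical sketch: a character class stored as a 256-bit bitmap,
# one bit per character code 0..255 (assumed layout, not Gobo's code).

class ByteClass:
    """Character class backed by a fixed 256-bit bitmap."""

    SIZE = 256

    def __init__(self):
        self.bits = bytearray(self.SIZE // 8)  # 32 bytes

    def add_range(self, low, high):
        # Codes beyond the bitmap have no bit to set, so they are dropped.
        for code in range(low, min(high, self.SIZE - 1) + 1):
            self.bits[code // 8] |= 1 << (code % 8)

    def matches(self, ch):
        code = ord(ch)
        if code >= self.SIZE:
            return False  # no bit exists for this code
        return bool(self.bits[code // 8] & (1 << (code % 8)))


latin = ByteClass()
latin.add_range(ord("a"), ord("z"))

greek = ByteClass()
greek.add_range(0x03B1, 0x03C9)  # intended: Greek alpha..omega
```

With such a layout, latin.matches("m") succeeds, but greek.matches("\u03bb") can never succeed because no bit exists for codes above 255.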
For your information, this regexp library in Gobo was born out of
a translation to Eiffel of the code of the PCRE C library. The version
of the C library was 3.9 if I remember correctly, and its Unicode
support was at a very early stage. The version number today
is 7.7 and they claim to support Unicode now, supposedly with a regexp
pattern syntax and behavior compatible with Perl's regexp. It might
be worth looking at the PCRE C library:
http://www.pcre.org/
in order to implement Unicode support in a way that does not depart
too much from the original PCRE library and at the same time saves
us from having to reinvent the wheel.
I suggest that we move this discussion to the gobo-eiffel-develop
mailing list.
--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com