From: Colin P. A. <co...@co...> - 2007-02-12 07:22:54
>>>>> "Franck" == Franck Arnaud <fr...@ne...> writes:

    >> For XML 1.1, the equivalent to \c is 3830417 bytes long.
    >>
    >> This is definitely too big, so something is wrong with the test
    >> program. Can anyone see where the fault is (I know it's not the
    >> most efficient way of doing it):

    Franck> it should get (much?) more compact by collapsing adjacent
    Franck> ranges into forms "\x\y[\z1-\z2]" or "\x\y." (example for
    Franck> 3 byte utf8 sequences.)

I don't understand what you mean.

But you have drawn my attention to ranges. E.g. [a-z] is fine, because
both ends are single-byte UTF-8 sequences. But if you put latin-1
accented characters (for instance) at the end of the ranges, then
nothing works.
-- 
Colin Adams
Preston Lancashire
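
To illustrate the range problem described above, here is a minimal sketch. It assumes the regular expression engine matches bytes rather than characters, and it uses Python's `re` module with bytes patterns as a stand-in for the engine under discussion. The helper names `byte_escape` and `utf8_range_to_byte_regex` are hypothetical, and the expansion is only one possible reading of the "\x\y[\z1-\z2]" suggestion, limited here to code points up to U+07FF (one- and two-byte sequences).

```python
import re


def byte_escape(b: int) -> bytes:
    """Render one byte value as a regex escape such as \\xc3."""
    return b"\\x%02x" % b


def utf8_range_to_byte_regex(lo: int, hi: int) -> bytes:
    """Expand the code point range lo..hi into a byte-level regex.

    Illustrative only: handles code points up to U+07FF, i.e. UTF-8
    sequences of one or two bytes.
    """
    if not (0 <= lo <= hi <= 0x7FF):
        raise ValueError("this sketch only handles U+0000..U+07FF")
    parts = []
    # One-byte sequences: U+0000..U+007F encode as the same byte value.
    if lo <= 0x7F:
        parts.append(
            b"[" + byte_escape(lo) + b"-" + byte_escape(min(hi, 0x7F)) + b"]"
        )
    # Two-byte sequences: lead byte 110xxxxx, continuation byte 10xxxxxx.
    if hi >= 0x80:
        lo2 = max(lo, 0x80)
        for lead in range(lo2 >> 6, (hi >> 6) + 1):
            first = lo2 & 0x3F if lead == lo2 >> 6 else 0x00
            last = hi & 0x3F if lead == hi >> 6 else 0x3F
            parts.append(
                byte_escape(0xC0 | lead)
                + b"["
                + byte_escape(0x80 | first)
                + b"-"
                + byte_escape(0x80 | last)
                + b"]"
            )
    return b"(?:" + b"|".join(parts) + b")"


# Naive approach: encode the character class as UTF-8 and hand the bytes
# to a byte-oriented engine. The class becomes [a-\xc3\xa9], which means
# "one byte in 0x61..0xC3, or the byte 0xA9", not "a code point in a..é".
naive = re.compile("[a-é]".encode("utf-8"))

# Expanded approach: one alternative per lead byte.
fixed = re.compile(utf8_range_to_byte_regex(ord("a"), ord("é")))
```

With the naive pattern, `naive.fullmatch("é".encode("utf-8"))` finds nothing, because the class only ever consumes a single byte. The expanded pattern accepts the two-byte encodings of "à" and "é" and rejects "ñ", which lies outside the range.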