From: Eric B. <er...@go...> - 2007-02-12 07:48:03
Colin Paul Adams wrote:
>>>>>> "Franck" == Franck Arnaud <fr...@ne...> writes:
>
>     >> For XML 1.1, the equivalent to \c is 3830417 bytes long.
>     >>
>     >> This is definitely too big, so something is wrong with the test
>     >> program. Can anyone see where the fault is (I know it's not the
>     >> most efficient way of doing it):
>
>     Franck> it should get (much?) more compact by collapsing adjacent
>     Franck> ranges into forms "\x\y[\z1-\z2]" or "\x\y." (example for
>     Franck> 3 byte utf8 sequences.)
>
> I don't understand what you mean.

Instead of having:

    (...|\x\y\z1|\x\y\z2|...|\x\y\zn|...)

where \z1, \z2, ..., \zn are consecutive bytes, you can
generate:

    (...|\x\y[\z1-\zn]|...)

--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com
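
To make the collapsing step concrete, here is a rough sketch in Python. It is not the test program discussed in this thread, just an illustration. The names `esc` and `collapse` are made up for the example, and it assumes the alternatives are available as non-empty byte strings. Alternatives that share every byte but the last are grouped, and runs of consecutive last bytes become a single `[lo-hi]` class.

```python
from itertools import groupby


def esc(byte: int) -> str:
    """Render one byte as a \\xHH escape (hypothetical helper)."""
    return f"\\x{byte:02X}"


def collapse(sequences) -> str:
    """Build an alternation, merging runs of consecutive final bytes.

    `sequences` is an iterable of non-empty `bytes` objects, one per
    alternative. Alternatives with the same prefix (all bytes but the
    last) and consecutive last bytes are emitted as prefix + [lo-hi].
    """
    # Sort by (prefix, last byte) so equal prefixes are adjacent for groupby.
    ordered = sorted(set(sequences), key=lambda s: (s[:-1], s[-1]))
    alternatives = []
    for prefix, group in groupby(ordered, key=lambda s: s[:-1]):
        last_bytes = [s[-1] for s in group]
        # Split the sorted last bytes into runs of consecutive values.
        runs = []
        start = prev = last_bytes[0]
        for b in last_bytes[1:]:
            if b == prev + 1:
                prev = b
            else:
                runs.append((start, prev))
                start = prev = b
        runs.append((start, prev))
        head = "".join(esc(b) for b in prefix)
        for lo, hi in runs:
            tail = esc(lo) if lo == hi else f"[{esc(lo)}-{esc(hi)}]"
            alternatives.append(head + tail)
    return "(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    # Three consecutive 3-byte sequences plus one isolated sequence.
    print(collapse([b"\xe0\xa0\x80", b"\xe0\xa0\x81",
                    b"\xe0\xa0\x82", b"\xe0\xa0\x90"]))
```

With the four sequences above, the three consecutive ones become one alternative with a class for the last byte and the isolated one stays as it is. The sketch only merges on the last byte. Merging ranges in the earlier bytes as well (the "\x\y." form Franck mentions) would shrink the result further but is not attempted here.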