From: Mark D. <mar...@us...> - 2001-02-22 02:24:29
That text is messy, I agree. However, it says (somewhere) that the regular expression version is the reference; the pair implementation is only an approximation to it.

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5891], mar...@us..., pre...@un...
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

"Edward J. Batutis" <ejb...@ya...>@dwoss.lotus.com on 02-15-2001 19:40:23

Sent by: own...@dw...

To: Eric Mader/Cupertino/IBM@IBMUS
cc: Alan Liu <al...@fi...>, ic...@dw..., icu...@dw...
Subject: Re: Line breaking aaa(aaa: ICU 1.7

--- Eric Mader <er...@us...> wrote:
>
> Ed,
>
> In general, the pair table approach doesn't work quite as well as
> regular expressions because there are cases where you need more
> context than the two surrounding characters. (cf. the last paragraph
> on page 125 - section 5.15 right before "Example Specifications.")
>

I've re-implemented the line breaking rules based on the line breaking properties file on unicode.org and based partially on UTR 14. My new line??.brk files implement line breaking that is closer to UTR 14. After some additional testing I hope to contribute it to ICU/ICU4J.

In any case, after struggling with it for several days I'm not too happy with UTR 14. UTR 14 attempts to describe proper line breaking using both regular expressions and pairs, but it is clear that the author had a pair implementation in mind. He tries to break some of the regular expressions down into pairs, but admits that this is only approximate. On the other hand although the regular expressions can be implemented using a regular expression engine, the pairs cannot (at least not with the ICU engine). The result is a description of line breaking that isn't entirely satisfactory for either a pair table implementation or a regular expression implementation. I would rather see a spec that aims clearly at one target - or both targets separately - and hits it directly.

Line breaking is inherently a bit messy. Ideally it would vary based on the content of the text it was operating on and the like, but it seems that a clearer and more implementable description of line breaking for general text should be possible. I've attempted to contact the author and will try to forward my comments on to him directly.

=Ed
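
The point about pair tables needing "more context than the two surrounding characters" can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the three toy classes (AL, OP, SP), the two table entries, and the function names are hypothetical, and they are neither the UTR 14 tables nor the ICU .brk rule syntax. The context rule shown, "no break after an opening bracket, even across intervening spaces", is just one example of a rule that two adjacent characters cannot express.

```python
# Toy line-break classes. Hypothetical simplification, not the
# line breaking property data published on unicode.org.
def char_class(ch):
    if ch == " ":
        return "SP"   # space
    if ch == "(":
        return "OP"   # opening punctuation
    return "AL"       # everything else is treated as alphabetic


# Pair table: True means a break is allowed between (before, after).
# Pairs not listed default to "no break". Entries are illustrative only.
PAIR_TABLE = {
    ("SP", "AL"): True,
    ("SP", "OP"): True,
}


def pair_breaks(text):
    """Break offsets decided from the two adjacent characters only."""
    classes = [char_class(c) for c in text]
    return [
        i
        for i in range(1, len(text))
        if PAIR_TABLE.get((classes[i - 1], classes[i]), False)
    ]


def context_breaks(text):
    """Pair-table breaks, minus those that follow 'OP SP*'.

    Deciding this requires looking back past any run of spaces,
    which is more context than a single pair lookup provides.
    """
    classes = [char_class(c) for c in text]
    result = []
    for i in pair_breaks(text):
        j = i - 1
        while j >= 0 and classes[j] == "SP":
            j -= 1  # skip back over the run of spaces
        if j >= 0 and classes[j] == "OP":
            continue  # spaces follow an opening bracket: suppress break
        result.append(i)
    return result
```

For the input "aaa ( aaa bbb", pair_breaks returns [4, 6, 10], while context_breaks returns [4, 10]: the pair lookup at offset 6 sees only (SP, AL) and cannot tell that the space follows an opening parenthesis.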