From: Larry M. <lm...@bi...> - 2011-07-31 00:25:43
On Sun, Jul 31, 2011 at 09:18:50AM +1000, Steve Bennett wrote:
> A recent post to c.l.t encouraged me to add support for Jim Tcl for unicode
> literals (\uNNNN) past the BMP. Non-literals are easy because Jim uses
> pure utf-8 internally, but with literals we have the problem with
> situations like this:
>
>   set x "\u2702b"
>
> Currently \u can accept up to 4 hex digits, but extending past the BMP
> requires that we accept more. In this case do we have one character or two?
>
> Two options are:
>
> 1. Add \U which allows up to 6 hex digits. e.g.
>
>   set x "\U2702b"
>
> 2. Allow braced \u escapes. e.g.
>
>   set x "\u{2702b}"
>
> One day Tcl may face the same issue, so I would be interested in thoughts
> from anyone here. I am leaning towards option 2.

Option 1, if you do it, should be a fixed number of digits so you can parse it.
I like option 2 myself.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
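
For illustration only, here is a rough Python sketch of the parsing trade-off discussed in this thread. It is not Jim Tcl's or Tcl's actual implementation. The names `read_hex`, `decode_escapes` and the `fixed_width_U` flag are hypothetical, and the 6-digit cap on the braced form is an assumption borrowed from option 1.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def read_hex(text, start, limit):
    """Return the run of at most `limit` hex digits found at text[start:]."""
    end = start
    while end < len(text) and end - start < limit and text[end] in HEX_DIGITS:
        end += 1
    return text[start:end]


def decode_escapes(text, fixed_width_U=False):
    """Expand \\uNNNN, \\UNNNNNN and \\u{N...} escapes in `text`.

    Hypothetical rules, chosen only to mirror the options in the thread:
      \\u     takes up to 4 hex digits (the existing behaviour described)
      \\U     takes up to 6 hex digits, or exactly 6 if fixed_width_U is set
      \\u{..} takes 1 to 6 hex digits between braces (assumed limit)
    Anything that does not form a valid escape is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        digits = ""
        consumed = 0
        if text.startswith("\\u{", i):
            close = text.find("}", i + 3)
            if close != -1:
                digits = text[i + 3:close]
                consumed = close + 1 - i
                # Reject empty, overlong or non-hex brace contents.
                if not (1 <= len(digits) <= 6 and set(digits) <= HEX_DIGITS):
                    digits = ""
        elif text.startswith("\\u", i):
            digits = read_hex(text, i + 2, 4)
            consumed = 2 + len(digits)
        elif text.startswith("\\U", i):
            digits = read_hex(text, i + 2, 6)
            consumed = 2 + len(digits)
            if fixed_width_U and len(digits) != 6:
                digits = ""
        if digits and int(digits, 16) <= MAX_CODE_POINT:
            out.append(chr(int(digits, 16)))
            i += consumed
        else:
            # Not a usable escape: emit the current character literally.
            out.append(text[i])
            i += 1
    return "".join(out)


# "\u2702b": 4-digit \u stops after 2702, so this is U+2702 followed by "b".
print(len(decode_escapes("\\u2702b")))        # 2
# Variable-width \U swallows the trailing "b": one character, U+2702B.
print(len(decode_escapes("\\U2702b")))        # 1
# Fixed-width \U needs the zero padding, but then following text is safe.
print(len(decode_escapes("\\U02702bad", fixed_width_U=True)))  # 3
# The braced form marks its own end, so no padding is needed.
print(len(decode_escapes("\\u{2702b}ad")))    # 3
```

With the variable-width `\U`, any hex-looking text after the escape is absorbed into it, which is why a fixed digit count (or the self-delimiting braces of option 2) makes the literal unambiguous.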