From: Peter M. <pet...@gm...> - 2011-08-09 15:55:59
Hi,

I'm working on patching flex to add true Unicode support. So far, I have UTF-8 encoding working, the \u[0-9a-fA-F]{4} and \U[0-9a-fA-F]{8} escapes, '.' matching all code points from 0 to 0x10ffff, and switches to enable/disable UTF-8 via the command line (--utf8), a directive (%option utf8), and a pattern modifier (?u:).

Still on my to-do list: proper case-insensitive matching, UTF-16 support (*not* wide chars, but true support, including surrogate handling and possibly some mechanism to handle BOMs/endianness), property handling, and \x and \X escapes for grapheme matches.

My current work is here: http://github.com/PeterMartini/flex

I know various patches for Unicode support have been suggested before, so if anyone who has worked on those would care to comment, please drop me a line, respond to this, or fork my GitHub repo! I'd love to get this support into flex, if it's still being maintained.

Regards,
Peter Martini
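
For readers following along, here is a rough sketch of the two conversions the message touches on: turning a code point (as written with \u or \U) into UTF-8 bytes, and combining a UTF-16 surrogate pair into a code point. This is not code from the patch; the function names utf8_encode and utf16_combine are made up for illustration, and the patched flex may well handle this differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the flex patch.
 * Encodes one code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * or lies above 0x10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx then 3 x 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}

/* Hypothetical helper, not taken from the flex patch.
 * Combines a UTF-16 high/low surrogate pair into a code point.
 * Returns 1 and stores the result in *cp on success, 0 if the
 * two units do not form a valid pair. */
int utf16_combine(uint16_t hi, uint16_t lo, uint32_t *cp)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    *cp = 0x10000
        + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 1;
}
```

As a usage example, utf8_encode(0x20AC, buf) writes the three bytes E2 82 AC, and utf16_combine(0xD83D, 0xDE00, &cp) stores 0x1F600 in cp.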