Re: [Flex-devel] UTF-8 integration
From: Will E. <wes...@gm...> - 2012-07-19 23:40:02
Paul,

This is great. I'm not opposed in principle to the change you suggest, but I want to think about some other issues first.

It looks like importing the cvs tree into git is going to be a breeze, and it looks like it'll come out quite pretty. I've got time on my books tomorrow to put out a new release of flex to catch up the released code to what's in cvs. Then we can get flex into git and make all of the UTF work that much easier.

Fair warning that once flex is in git, the source code will get moved into an src/ directory and I want to look at redoing the way the test suite is handled, especially since you've added so many tests. You're right that flex does need a lot more test coverage.

--Will

On Thursday, 19 July 2012, 6:55 pm -0400, Paul <pa...@pr...> wrote:
> Have the barest bones of an integration of flex-cvs, flex utf-16, &
> flex utf-8.
> Passes 106 of 107 tests. The last is a problem with ccl lists in utf16-mode.
> Besides this fix it needs many more tests.
>
> I'm wondering if the change from the ccl bit map to lists is ok in
> general in all modes.
> I'm using it for the tests and there is no perceptible speed difference.
> Having a bit map for non utf-8 and a list for it would be a pain.
>
> Paul
>
> On 12-07-19 05:25 AM, Peter Martini wrote:
> >I'd be thrilled, actually. I started trying to integrate the
> >changes that flex had gone through and the patches you were
> >sending into my branch directly, and ended up deferring the work
> >each time when I got intimidated by the scale of the diffs :-)
> >
> >Hopefully the commits in my branch are discrete enough to be useful.
> >
> >On Thu, Jul 19, 2012 at 5:21 AM, Paul <pa...@pr...> wrote:
> >
> >    Would you mind if I took your changes, made them conditional on
> >    the utf8 flag and integrated them into the unicode16 version.
> >    I would also like to add more tests.
> >
> >    Paul
> >
> >    On 12-07-18 11:29 AM, Peter Martini wrote:
> >>    Not nearly as thoroughly as you did. I confirmed that with or
> >>    without UTF-8 parsing enabled, if it's only ASCII text, the same
> >>    exact tables are generated, and I added tests to make sure that
> >>    the unicode escapes and the '.' work as designed. As designed is
> >>    a very broad definition though.
> >>
> >>    Take a look at
> >>    https://github.com/PeterMartini/flex/commits/master - I made a
> >>    lot of small commits, so you can see the changes I was
> >>    making. The biggest change structurally was I changed the
> >>    CCL bitmap to be a linked list of ranges, to speed up
> >>    processing the 0x10FFFF allowed values, and then added a
> >>    secondary step to convert those unicode characters to an
> >>    equivalent 8-bit pattern.
> >>
> >>    But my commits also track some housekeeping I did to get rid of
> >>    warnings and get those C++ tests to pass, and I think some of
> >>    that has been merged into the main branch now.
> >>
> >>    On Mon, Jul 16, 2012 at 5:03 PM, Paul <pa...@pr...> wrote:
> >>
> >>        How completely have you tested your utf8 version of flex?
> >>
> >>        Paul

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/
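
For readers following the CCL discussion in the quoted thread: the idea of replacing a per-character bitmap with a list of ranges, then turning code points into 8-bit patterns, can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from Peter's branch or from flex itself. The names ccl_range, ccl_add, ccl_contains, ccl_length, ccl_free and utf8_encode are all hypothetical, and the "8-bit pattern" step is assumed here to mean plain UTF-8 encoding of a single code point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure: one node per inclusive code point range.
 * The list is kept sorted by lo, with no overlapping or touching ranges. */
struct ccl_range {
    uint32_t lo, hi;
    struct ccl_range *next;
};

/* Add [lo, hi] to the class, merging with any range it overlaps or touches.
 * Returns the (possibly new) head of the list. Aborts if malloc fails. */
static struct ccl_range *ccl_add(struct ccl_range *head, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi && hi <= 0x10FFFF);
    struct ccl_range **link = &head;

    /* Skip ranges that end before lo and are not adjacent to it. */
    while (*link && (*link)->hi + 1 < lo)
        link = &(*link)->next;

    /* Absorb every range that overlaps or touches the new one. */
    while (*link && (*link)->lo <= hi + 1) {
        struct ccl_range *old = *link;
        if (old->lo < lo) lo = old->lo;
        if (old->hi > hi) hi = old->hi;
        *link = old->next;
        free(old);
    }

    struct ccl_range *node = malloc(sizeof *node);
    if (!node)
        abort();
    node->lo = lo;
    node->hi = hi;
    node->next = *link;
    *link = node;
    return head;
}

/* Membership test: walk the sorted list until ranges start past cp. */
static bool ccl_contains(const struct ccl_range *head, uint32_t cp)
{
    for (; head && head->lo <= cp; head = head->next)
        if (cp <= head->hi)
            return true;
    return false;
}

/* Number of ranges currently stored. */
static size_t ccl_length(const struct ccl_range *head)
{
    size_t n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}

static void ccl_free(struct ccl_range *head)
{
    while (head) {
        struct ccl_range *next = head->next;
        free(head);
        head = next;
    }
}

/* Encode one code point as UTF-8. Returns the byte count (1 to 4),
 * or 0 for surrogates and values above 0x10FFFF. */
static int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The appeal of a range list in this sketch is that a class such as "everything from U+0400 to U+04FF" costs one node rather than 256 bits set inside a 0x110000-bit map, which is presumably the kind of saving the thread has in mind. Whether a single representation is also acceptable for the plain 8-bit modes is the open question Paul raises.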