flex-devel Mailing List for flex: the fast lexical analyser (Page 8)
flex is a tool for generating scanners
Brought to you by:
wlestes
From: Paul <pa...@pr...> - 2012-07-06 15:29:36
|
The unicode16 version of flex uses unsigned short. Without opening a large can of worms re types, to pass the tests utest-posix and utest-posixly-correct (nothing to do with posix compatibility) with the -U flag on, I have altered the line:

    char * tests[NUM_TESTS] = { "ababab"}; /* non unicode */

to (simplified):

    unsigned short testu[7]={'a','b','a','b','a','b',0}; /* altered for unicode pn */

The test then passes with testu OK.

This makes initializing a pain since:
a. wchar_t is 4 bytes wide in gcc
b. L"ababab" is also 4 bytes wide,
whereas unsigned short & unicode 16 is 2 bytes wide.

I will note that char16_t is not defined in C, only C++.

Thoughts please.

Paul |
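Editorial sketch, not part of the original message: one possible way around the initialization pain described above, assuming a C11 compiler and a C library that ships <uchar.h> (as discussed elsewhere in this thread, C11 adds char16_t there). The names test_input and u16_strlen are hypothetical and do not come from the flex test suite.

```c
#include <stddef.h>
#include <uchar.h>   /* C11: char16_t (assumes the C library provides this header) */

/* A C11 u"..." literal is an array of char16_t, so a 16-bit test string
 * can be written directly instead of element by element. */
static const char16_t test_input[] = u"ababab";

/* Length in characters (not bytes) of a 0-terminated 16-bit string.
 * Hypothetical helper; strlen() cannot be used because it counts bytes. */
static size_t u16_strlen(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

Whether char16_t and the scanner's unsigned short are the same type is platform-dependent, so a cast may still be needed when passing such a literal to a scanner built with -U.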
From: Will E. <wes...@gm...> - 2012-07-06 12:27:07
|
No, you should catch that particular combination of options and ensure that that particular output is given. (Because it'd be a bug if it weren't.) Good catch though.

--Will

On Friday, 6 July 2012, 7:53 am -0400, Paul <pa...@pr...> wrote:
> Test utest-table-opts fails the %option unicode tests with the message:
> flex: Can't use -C with -U
> Since table compression and unicode are mutually exclusive should this test
> be inverted to pass when the check for -C & -U fails?
>
> Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/ |
From: Paul <pa...@pr...> - 2012-07-06 11:54:08
|
Test utest-table-opts fails the %option unicode tests with the message:

    flex: Can't use -C with -U

Since table compression and unicode are mutually exclusive, should this test be inverted to pass when the check for -C & -U fails?

Paul Neelands |
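Editorial sketch, not part of the original message: the option check under discussion can be isolated into a small function so that a test can assert on the exact diagnostic, as Will suggests. The conditions and message strings follow the main.c hunk of the -U patch quoted later on this page; the function name unicode_option_conflict and its signature are hypothetical.

```c
#include <stddef.h>

/* Returns the diagnostic a -U enabled flex would report for an incompatible
 * table option combination, or NULL when the combination is accepted.
 * Hypothetical helper mirroring the check in the quoted patch. */
static const char *unicode_option_conflict(int csize, int fulltbl, int fullspd,
                                           int use_read, int useecs, int usemecs)
{
    if (csize != 65536)          /* not a 16-bit scanner: nothing to check */
        return NULL;
    if (fulltbl)
        return use_read ? "Can't use -f with -U" : "Can't use -Cf with -U";
    if (fullspd)
        return use_read ? "Can't use -F with -U" : "Can't use -CF with -U";
    if (!useecs && !usemecs)     /* plain -C: no equivalence classes at all */
        return "Can't use -C with -U";
    return NULL;
}
```

A test for the -C plus -U combination would then pass only when this exact message is produced, rather than being inverted.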
From: Will E. <wes...@gm...> - 2012-07-05 14:17:21
|
Yes. Please forward it.

--Will

On Thursday, 5 July 2012, 8:29 am -0400, Paul <pa...@pr...> wrote:
> I now have a unicode version of the 1 July 2012 flex cvs.
> This version passes all 46 non-unicode tests.
> It has an additional 46 tests with the %unicode option turned on.
> Of these presently 10 fail. Am investigating the failures.
> Some are required failures with unicode. Some are unknown.
> Perhaps it is time for more eyes on this work.
> Should I forward the diff?
>
> Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/ |
From: Paul <pa...@pr...> - 2012-07-05 12:30:03
|
I now have a unicode version of the 1 July 2012 flex cvs. This version passes all 46 non-unicode tests. It has an additional 46 tests with the %unicode option turned on. Of these, 10 presently fail; I am investigating the failures. Some are required failures with unicode. Some are unknown. Perhaps it is time for more eyes on this work. Should I forward the diff?

Paul Neelands |
From: Will E. <wes...@gm...> - 2012-07-03 17:38:28
|
Paul,

See the file tests/README in the flex distribution for details on what to do. There's a section called "HOW TO ADD A NEW TEST TO THE TEST SUITE". That documentation should work; if it doesn't, please report that as a bug.

On Tuesday, 3 July 2012, 1:22 pm -0400, Paul <pa...@pr...> wrote:
> I find the config/make arrangement for flex a bit confusing.
> Would some kind soul please explain how I can have:
> make check
> run tests on 2 directories (tests & testsu).
> I wish to run all 46 tests converted to %option unicode
> as well as the present 46 tests.
> Clearly adding testsu to the indentfiles list in makefile.am
> &
> adding testsu/Makefile ... etc to configure.in
> is not enough.
> For testing I made an exact copy of tests to testu
> with the above changes, but no go.
>
> Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/ |
From: Paul <pa...@pr...> - 2012-07-03 17:22:38
|
I find the config/make arrangement for flex a bit confusing. Would some kind soul please explain how I can have:

    make check

run tests on 2 directories (tests & testsu). I wish to run all 46 tests converted to %option unicode as well as the present 46 tests.

Clearly adding testsu to the indentfiles list in makefile.am & adding testsu/Makefile ... etc to configure.in is not enough. For testing I made an exact copy of tests to testu with the above changes, but no go.

Paul Neelands |
From: Will E. <wes...@gm...> - 2012-07-01 17:48:13
|
Paul,

Can you see how your changes look against the current codebase of flex? There has been a good amount of change in the test suite that is in cvs but that has not been released yet. (I expect I'll have a release of flex done in the next couple months.)

--Will

On Sunday, 1 July 2012, 11:12 am -0400, Paul <pa...@pr...> wrote:
> All the unicode patches from the original flex-2.5.4.U through 2.5.35.U2
> do not write the YY_CHAR typedef to the header file when the header file
> is requested. Thus failing 15 or so tests. After more understanding of
> flex internals, I have yet another version of flex which correctly
> writes the header file typedef and passes 43 of the 46 tests. The
> remaining 3 failed tests also fail with version 2.5.35 original on my
> machine Kubuntu 12.04. Two failures are EOF not defined in C++ tests and
> the pthreads test.
>
> Um .. I could post it.
>
> Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/ |
From: Paul <pa...@pr...> - 2012-07-01 15:12:21
|
All the unicode patches from the original flex-2.5.4.U through 2.5.35.U2 do not write the YY_CHAR typedef to the header file when the header file is requested, and so fail 15 or so tests. After more understanding of flex internals, I have yet another version of flex which correctly writes the header file typedef and passes 43 of the 46 tests. The remaining 3 failed tests also fail with the original version 2.5.35 on my machine (Kubuntu 12.04). Two of the failures are "EOF not defined" in C++ tests; the third is the pthreads test.

Um .. I could post it.

Paul Neelands |
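Editorial sketch, not part of the original message: the fix described above amounts to emitting the same YY_CHAR typedef into the generated header as into the scanner. The typedef strings and the csize values (65536 for -U, 256 for -8) are taken from the main.c hunk of the patch quoted later on this page; the functions yy_char_typedef and emit_yy_char are hypothetical and are not flex internals.

```c
#include <stdio.h>

/* Pick the YY_CHAR typedef line for a given character-set size,
 * following the switch in the quoted -U patch. */
static const char *yy_char_typedef(int csize)
{
    switch (csize) {
    case 65536: return "typedef unsigned short YY_CHAR;";  /* -U, 16-bit */
    case 256:   return "typedef unsigned char YY_CHAR;";   /* -8 */
    default:    return "typedef char YY_CHAR;";            /* 7-bit */
    }
}

/* Write the typedef to the scanner and, when a header was requested,
 * to the header as well, so both agree on the type of yytext. */
static void emit_yy_char(FILE *scanner, FILE *header, int csize)
{
    fprintf(scanner, "%s\n", yy_char_typedef(csize));
    if (header != NULL)
        fprintf(header, "%s\n", yy_char_typedef(csize));
}
```

If the header omits the typedef, any file that includes it and refers to a YY_CHAR based declaration such as yytext fails to compile, which is consistent with the test failures reported above.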
From: Paul <pa...@pr...> - 2012-06-30 13:18:38
|
Attached is the diff of flex 2.5.35 to flex 2.5.35.U2. This version fixes type bugs with unput() and %option array when using %option unicode or -U.

Paul Neelands |
From: Aaron S. <aa...@se...> - 2012-06-29 19:33:34
|
Please do post, thank you!

On Fri, Jun 29, 2012 at 11:41 AM, Paul <pa...@pr...> wrote:
> Hi,
>
> Using the patch I previously distributed for 2.5.35, called 2.5.35.u
> there are bugs in the handling of %option array.
> In particular the typing in unput is char instead of YY_CHAR. I have a
> fixed version, but I am still working on separating the patch into two
> pieces as requested. I could post the diff if requested.
>
> Paul Neelands |
From: Paul <pa...@pr...> - 2012-06-29 18:59:49
|
Hi,

Using the patch I previously distributed for 2.5.35, called 2.5.35.u, there are bugs in the handling of %option array. In particular, the typing in unput is char instead of YY_CHAR. I have a fixed version, but I am still working on separating the patch into two pieces as requested. I could post the diff if requested.

Paul Neelands |
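Editorial sketch, not part of the original message: why a char-typed unput matters for a 16-bit scanner. The typedef matches the one the patch emits for -U; push_back and narrow_to_byte are hypothetical stand-ins, not the generated yyunput code.

```c
/* As emitted by the -U patch for 16-bit scanners. */
typedef unsigned short YY_CHAR;

/* Store c immediately before pos, the way unput pushes a character back,
 * and return the new position. The cast keeps all 16 bits. */
static YY_CHAR *push_back(YY_CHAR *pos, int c)
{
    *--pos = (YY_CHAR) c;
    return pos;
}

/* What survives if the value is squeezed through an 8-bit type first:
 * only the low byte of the character. */
static unsigned int narrow_to_byte(int c)
{
    return (unsigned char) c;
}
```

For example, pushing back U+0391 (Greek capital alpha) through an 8-bit type leaves 0x91, so the scanner would later re-read a different character than the one that was unput.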
From: Aaron S. <aa...@se...> - 2012-03-28 18:50:48
|
Adding to the thread, I just looked at the C11 standard, and saw these: char16_t and char32_t for 16-bit and 32-bit Unicode, in UTF-16 and UTF-32, respectively.

Source document: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1040.pdf

We should use those, rather than wchar_t, and provide local definitions in our headers as backwards compatibility for (most) systems that don't (yet) provide them.

On Mon, Mar 12, 2012 at 2:31 PM, Peter Martini <pet...@gm...> wrote:
> On Mon, Mar 12, 2012 at 3:45 PM, Aaron Stone <aa...@se...> wrote:
>> Hi Paul,
>>
>> I think for clarity, could you break your patch into two parts? First,
>> changing all 'char *' declarations to YY_CHAR *, and in a second patch
>> add code that handles different sizes of YY_CHAR.
>>
>> Am I reading correctly that the gist of this Unicode mode is handling text
>> in UTF-16 -- 16-bits per character? Searching the web for a few
>> minutes, that's going to be the same representation used by Windows,
>> Java, .NET, and OS X, and also Python and Perl. Linux appears to use
>> UTF-32 internally(?). Ruby uses UTF-8 internally(?).
>>
>
> I can speak with experience that perl uses UTF-8 internally, and I
> believe linux and Python do as well. In general, UTF-8 is preferred
> for systems that have a lot of legacy, C library concerns, largely
> because of the property that ASCII (characters 0 - 127) are completely
> unchanged, and the fact that there is never any confusion about NUL -
> if you see a NUL byte anywhere in a UTF-8 encoded stream, it's
> guaranteed to be NUL, whereas in UTF-16 or UTF-32, it could just be
> padding for a character less than 256.
>
>> 16-bit seems reasonable to me as a first step. We should probably also
>> provide validation tables so that user code does not need to provide
>> its own validation, which may lead to buggy implementations and
>> needlessly duplicated code.
>>
>> Thought experiment: let's say that every package on a Linux
>> distribution has a config file, and every package has a flex-bison
>> parser to read that config file. If every package wanted to support
>> UTF-8 configuration tomorrow, what's the best way to get there?
>>
>
> Simple - I'd started this on https://github.com/PeterMartini/flex, and
> using that flex right now would produce identical code without the
> UTF-8 flag in the parser turned on, since it requires no changes to
> the byte sizes of the tables.
>
>> I'd love it if we could also support UTF-8 scanner definitions, rather
>> than \x1234 all over the place. It'd be really nice for an engineer
>> who speaks [furrin lanwidge] to be able to write code in [furrin
>> letters] and have it Just Work (TM).
>>
>
> Agreed.
>
>> Cheers,
>> Aaron
>>
>>
>> On Sun, Mar 11, 2012 at 7:52 AM, Paul <pa...@pr...> wrote:
>>> Well actually I found it relatively straight forward once the type
>>> casting was sorted out.
>>> Because of the fuss about 16bit vs 32 bit unicode, I would be quite
>>> happy to see this work using the partially implemented -16bit flag
>>> if that would be more politically acceptable.
>>
>> What's the 32-bit fuss?
>>
>> Maybe we set up the -U flag to take an argument, e.g. -U utf-8 or -U
>> 8/16LE/16BE/32LE/32BE ?
>>
>>> It was tested with the three styles of flex, normal, re-entrant and
>>> class, although class is a bit useless. Also works with bison-bridge.
>>> Have been using it heavily for the last three months. It will also
>>> work using the %unicode option.
>>> An example rule using flex & bison with an encapsulating class would be
>>> ([\xe00a]) {printf("flex Unicode @\n"); return x_yy::x_yyparse::token::SUM;}
>>> Where the \xe00a is in the user defined part of unicode.
>>>
>>> Paul Neelands
>>>
>>> On 03/11/2012 10:39 AM, Will Estes wrote:
>>>> Paul,
>>>>
>>>> Thanks for your posting of this patch.
>>>> >>>> As you know, unicode support is not a trivial change, so we'll be >>>> evaluating this to make sure it's what we want for flex. >>>> >>>> Any and all, your ideas, suggestions and comments on this patch. >>>> >>>> --Will >>>> >>>> On Sunday, 11 March 2012, 10:24 am -0400, Paul<pa...@pr...> wrote: >>>> >>>>> Attached is the diff of flex-2.5.35 to flex-2.5.35.U >>>>> The flag -U has been added to enable Unicode 16, otherwise it >>>>> behaves as flex 2.5.35 >>>>> To enter a Unicode character in a rule use \x0000. i.e. \x and >>>>> exactly 4 hex digits. >>>>> An example of a rule is: >>>>> ID ([a-zA-Z\x391-\x3a9\x3b1-\x3c9][a-zA-Z0-9\x391-\x3a9\x3b1-\x3c9]*) >>>>> Which is a-Z, 0-9, and the Greek upper& lower case Unicode letters. >>>>> For licenses, whatever covers flex2.5.35, covers this as well. >>>>> I have only tested this with Kubuntu 11.10. >>>>> Much thanks to the Unicode patch for flex-2.5.4a which was the basis >>>>> for this work. >>>>> >>>>> Cheers, >>>>> >>>>> Paul Neelands >>>>> diff flex-2.5.35/ccl.c flex-2.5.35.U/ccl.c >>>>> 83c83 >>>>> < ccltbl = reallocate_Character_array (ccltbl, >>>>> --- >>>>>> ccltbl = reallocate_wchar_array( ccltbl, >>>>> Only in flex-2.5.35.U: ccl.c~ >>>>> Only in flex-2.5.35: config.h >>>>> Only in flex-2.5.35: config.log >>>>> Only in flex-2.5.35: config.status >>>>> Only in flex-2.5.35: .deps >>>>> Common subdirectories: flex-2.5.35/doc and flex-2.5.35.U/doc >>>>> diff flex-2.5.35/ecs.c flex-2.5.35.U/ecs.c >>>>> 116c116 >>>>> < Char ccls[]; >>>>> --- >>>>>> wchar_t ccls[]; >>>>> Only in flex-2.5.35.U: ecs.c~ >>>>> Common subdirectories: flex-2.5.35/examples and flex-2.5.35.U/examples >>>>> diff flex-2.5.35/flexdef.h flex-2.5.35.U/flexdef.h >>>>> 108,109c108,109 >>>>> < /* Always be prepared to generate an 8-bit scanner. */ >>>>> < #define CSIZE 256 >>>>> --- >>>>>> /* Always be prepared to generate a 16-bit scanner. 
*/ >>>>>> #define CSIZE 65536 >>>>> 648c648 >>>>> < extern Char *ccltbl; >>>>> --- >>>>>> extern wchar_t *ccltbl; >>>>> 678a679,684 >>>>>> #define allocate_wchar_array(size) \ >>>>>> (wchar_t *) allocate_array( size, sizeof( wchar_t ) ) >>>>>> >>>>>> #define reallocate_wchar_array(array,size) \ >>>>>> (wchar_t *) reallocate_array( (void *) array, size, sizeof( wchar_t ) ) >>>>>> >>>>> 778c784 >>>>> < extern void mkeccl PROTO ((Char[], int, int[], int[], int, int)); >>>>> --- >>>>>> extern void mkeccl PROTO ((wchar_t[], int, int[], int[], int, int)); >>>>> 866c872 >>>>> < extern void cshell PROTO ((Char[], int, int)); >>>>> --- >>>>>> extern void cshell PROTO ((wchar_t[], int, int)); >>>>> 930c936 >>>>> < extern Char myesc PROTO ((Char[])); >>>>> --- >>>>>> extern int myesc PROTO ((Char[])); >>>>> Only in flex-2.5.35.U: flexdef.h~ >>>>> diff flex-2.5.35/FlexLexer.h flex-2.5.35.U/FlexLexer.h >>>>> 36a37,38 >>>>>> // Since this header is generic for all sizes of flex scanners, you must >>>>>> // define the type YY_CHAR before including it: >>>>> 39a42 >>>>>> // typedef xxx YY_CHAR; >>>>> 43a47 >>>>>> // typedef xxx YY_CHAR; >>>>> 65c69 >>>>> < const char* YYText() const { return yytext; } >>>>> --- >>>>>> const YY_CHAR* YYText() const { return yytext; } >>>>> 95c99 >>>>> < char* yytext; >>>>> --- >>>>>> YY_CHAR* yytext; >>>>> 133,134c137,138 >>>>> < virtual int LexerInput( char* buf, int max_size ); >>>>> < virtual void LexerOutput( const char* buf, int size ); >>>>> --- >>>>>> virtual int LexerInput( YY_CHAR* buf, int max_size ); >>>>>> virtual void LexerOutput( const YY_CHAR* buf, int size ); >>>>> 137c141 >>>>> < void yyunput( int c, char* buf_ptr ); >>>>> --- >>>>>> void yyunput( int c, YY_CHAR* buf_ptr ); >>>>> 160c164 >>>>> < char yy_hold_char; >>>>> --- >>>>>> YY_CHAR yy_hold_char; >>>>> 166c170 >>>>> < char* yy_c_buf_p; >>>>> --- >>>>>> YY_CHAR* yy_c_buf_p; >>>>> 185c189 >>>>> < char* yy_last_accepting_cpos; >>>>> --- >>>>>> YY_CHAR* yy_last_accepting_cpos; 
>>>>> 190c194 >>>>> < char* yy_full_match; >>>>> --- >>>>>> YY_CHAR* yy_full_match; >>>>> Only in flex-2.5.35.U: FlexLexer.h~ >>>>> diff flex-2.5.35/flex.skl flex-2.5.35.U/flex.skl >>>>> 126c126 >>>>> < M4_GEN_PREFIX(`_scan_bytes') >>>>> --- >>>>>> M4_GEN_PREFIX(`_scan_chars') >>>>> 274a275 >>>>>> *out for U pn >>>>> 276c277 >>>>> < #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) >>>>> --- >>>>>> /* #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) pn*/ >>>>> 543,544c544,545 >>>>> < char *yy_ch_buf; /* input buffer */ >>>>> < char *yy_buf_pos; /* current position in input buffer */ >>>>> --- >>>>>> YY_CHAR *yy_ch_buf; /* input buffer */ >>>>>> YY_CHAR *yy_buf_pos; /* current position in input buffer */ >>>>> 546c547 >>>>> < /* Size of input buffer in bytes, not including room for EOB >>>>> --- >>>>>> /* Size of input buffer in chars, not including room for EOB >>>>> 642c643 >>>>> < static char yy_hold_char; >>>>> --- >>>>>> static YY_CHAR yy_hold_char; >>>>> 647c648 >>>>> < static char *yy_c_buf_p = (char *) 0; >>>>> --- >>>>>> static YY_CHAR *yy_c_buf_p = (char *) 0; >>>>> 678,680c679,684 >>>>> < YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( char *base, yy_size_t size M4_YY_PROTO_LAST_ARG ); >>>>> < YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst char *yy_str M4_YY_PROTO_LAST_ARG ); >>>>> < YY_BUFFER_STATE yy_scan_bytes M4_YY_PARAMS( yyconst char *bytes, int len M4_YY_PROTO_LAST_ARG ); >>>>> --- >>>>>> YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( YY_CHAR *base, yy_size_t size M4_YY_PROTO_LAST_ARG ); >>>>>> YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst YY_CHAR *yy_str M4_YY_PROTO_LAST_ARG ); >>>>>> /* This is the old yy_scan_bytes function - renamed to avoid >>>>>> * confusion since a character may now be 1 or 2 bytes. 
>>>>>> */ >>>>>> YY_BUFFER_STATE yy_scan_chars M4_YY_PARAMS( yyconst YY_CHAR *chars, int len M4_YY_PROTO_LAST_ARG ); >>>>> 747c751 >>>>> < *yy_cp = '\0'; \ >>>>> --- >>>>>> *yy_cp = (YY_CHAR) '\0'; \ >>>>> 805c809 >>>>> < char yy_hold_char; >>>>> --- >>>>>> YY_CHAR yy_hold_char; >>>>> 808c812 >>>>> < char *yy_c_buf_p; >>>>> --- >>>>>> YY_CHAR *yy_c_buf_p; >>>>> 816c820 >>>>> < char* yy_last_accepting_cpos; >>>>> --- >>>>>> YY_CHAR* yy_last_accepting_cpos; >>>>> 825c829 >>>>> < char *yy_full_match; >>>>> --- >>>>>> YY_CHAR *yy_full_match; >>>>> 837,838c841,842 >>>>> < char yytext_r[YYLMAX]; >>>>> < char *yytext_ptr; >>>>> --- >>>>>> YY_CHAR yytext_r[YYLMAX]; >>>>>> YY_CHAR *yytext_ptr; >>>>> 843c847 >>>>> < char *yytext_r; >>>>> --- >>>>>> YY_CHAR *yytext_r; >>>>> 999c1003 >>>>> < static void yyunput M4_YY_PARAMS( int c, char *buf_ptr M4_YY_PROTO_LAST_ARG); >>>>> --- >>>>>> static void yyunput M4_YY_PARAMS( int c, (YY_CHAR) *buf_ptr M4_YY_PROTO_LAST_ARG); >>>>> 1005c1009 >>>>> < static void yy_flex_strncpy M4_YY_PARAMS( char *, yyconst char *, int M4_YY_PROTO_LAST_ARG); >>>>> --- >>>>>> static void yy_flex_strncpy M4_YY_PARAMS( (YY_CHAR) *, yyconst char *, int M4_YY_PROTO_LAST_ARG); >>>>> 1009c1013 >>>>> < static int yy_flex_strlen M4_YY_PARAMS( yyconst char * M4_YY_PROTO_LAST_ARG); >>>>> --- >>>>>> static int yy_flex_strlen M4_YY_PARAMS( yyconst (YY_CHAR) * M4_YY_PROTO_LAST_ARG); >>>>> 1077c1081 >>>>> < #define ECHO fwrite( yytext, yyleng, 1, yyout ) >>>>> --- >>>>>> #define ECHO (void) fwrite( yytext, sizeof( YY_CHAR ), yyleng, yyout ) >>>>> 1095c1099 >>>>> < if ( (result = LexerInput( (char *) buf, max_size ))< 0 ) \ >>>>> --- >>>>>> if ( (result = LexerInput( buf, max_size ))< 0 ) \ >>>>> 1239c1243 >>>>> < register char *yy_cp, *yy_bp; >>>>> --- >>>>>> register YY_CHAR *yy_cp, *yy_bp; >>>>> 1535c1539 >>>>> < int yyFlexLexer::LexerInput( char* buf, int /* max_size */ ) >>>>> --- >>>>>> int yyFlexLexer::LexerInput( YY_CHAR* buf, int /* max_size */ ) >>>>> 
1537c1541 >>>>> < int yyFlexLexer::LexerInput( char* buf, int max_size ) >>>>> --- >>>>>> int yyFlexLexer::LexerInput( YY_CHAR* buf, int max_size ) >>>>> 1544c1548 >>>>> < yyin->get( buf[0] ); >>>>> --- >>>>>> (void) yyin->read((unsigned char *) buf, sizeof( YY_CHAR ) ); >>>>> 1555c1559 >>>>> < (void) yyin->read( buf, max_size ); >>>>> --- >>>>>> (void) yyin->read((unsigned char *) buf, max_size * sizeof( YY_CHAR ) ); >>>>> 1560c1564 >>>>> < return yyin->gcount(); >>>>> --- >>>>>> return ( yyin->gcount() / sizeof( YY_CHAR ) ); >>>>> 1564c1568 >>>>> < void yyFlexLexer::LexerOutput( const char* buf, int size ) >>>>> --- >>>>>> void yyFlexLexer::LexerOutput( const YY_CHAR* buf, int size ) >>>>> 1566c1570 >>>>> < (void) yyout->write( buf, size ); >>>>> --- >>>>>> (void) yyout->write((unsigned char *) buf, size * sizeof( YY_CHAR ) ); >>>>> 1588,1589c1592,1593 >>>>> < register char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf; >>>>> < register char *source = YY_G(yytext_ptr); >>>>> --- >>>>>> register YY_CHAR *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf; >>>>>> register YY_CHAR *source = YY_G(yytext_ptr); >>>>> 1658c1662 >>>>> < b->yy_ch_buf = (char *) >>>>> --- >>>>>> b->yy_ch_buf = (YY_CHAR *) >>>>> 1661c1665 >>>>> < b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG ); >>>>> --- >>>>>> (b->yy_buf_size + 2)*sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>>> 1737c1741 >>>>> < register char *yy_cp; >>>>> --- >>>>>> register YY_CHAR *yy_cp; >>>>> 1774c1778 >>>>> < static void yyunput YYFARGS2( int,c, register char *,yy_bp) >>>>> --- >>>>>> static void yyunput YYFARGS2( int,c, register YY_CHAR *,yy_bp) >>>>> 1777c1781 >>>>> < void yyFlexLexer::yyunput( int c, register char* yy_bp) >>>>> --- >>>>>> void yyFlexLexer::yyunput( int c, register YY_CHAR* yy_bp) >>>>> 1780c1784 >>>>> < register char *yy_cp; >>>>> --- >>>>>> register YY_CHAR *yy_cp; >>>>> 1792c1796 >>>>> < register char *dest =&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[ >>>>> --- >>>>>> register YY_CHAR *dest 
=&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[ >>>>> 1794c1798 >>>>> < register char *source = >>>>> --- >>>>>> register YY_CHAR *source = >>>>> 1809c1813 >>>>> < *--yy_cp = (char) c; >>>>> --- >>>>>> *--yy_cp = (YY_CHAR) c; >>>>> 1853c1857 >>>>> < *YY_G(yy_c_buf_p) = '\0'; >>>>> --- >>>>>> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; >>>>> 1900c1904 >>>>> < *YY_G(yy_c_buf_p) = '\0'; /* preserve yytext */ >>>>> --- >>>>>> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; /* preserve yytext */ >>>>> 2016c2020 >>>>> < b->yy_ch_buf = (char *) yyalloc( b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG ); >>>>> --- >>>>>> b->yy_ch_buf = (YY_CHAR *) yyalloc( (b->yy_buf_size + 2)* sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>>> 2292c2296 >>>>> < YY_BUFFER_STATE yy_scan_buffer YYFARGS2( char *,base, yy_size_t ,size) >>>>> --- >>>>>> YY_BUFFER_STATE yy_scan_buffer YYFARGS2( YY_CHAR *,base, yy_size_t ,size) >>>>> 2336c2340 >>>>> < YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst char *, yystr) >>>>> --- >>>>>> YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst YY_CHAR *, yystr) >>>>> 2338a2343,2345 >>>>>> int len; >>>>>> for ( len = 0; yy_str[len]; ++len ) >>>>>> ; >>>>> 2340c2347 >>>>> < return yy_scan_bytes( yystr, strlen(yystr) M4_YY_CALL_LAST_ARG); >>>>> --- >>>>>> return yy_scan_chars( yystr, len M4_YY_CALL_LAST_ARG); >>>>> 2356c2363 >>>>> < YY_BUFFER_STATE yy_scan_bytes YYFARGS2( yyconst char *,yybytes, int ,_yybytes_len) >>>>> --- >>>>>> YY_BUFFER_STATE yy_scan_chars YYFARGS2( yyconst YY_CHAR *,yychars, int ,_yybytes_len) >>>>> 2359c2366 >>>>> < char *buf; >>>>> --- >>>>>> YY_CHAR *buf; >>>>> 2365,2366c2372,2373 >>>>> < n = _yybytes_len + 2; >>>>> < buf = (char *) yyalloc( n M4_YY_CALL_LAST_ARG ); >>>>> --- >>>>>> n = _yychars_len + 2; >>>>>> buf = (YY_CHAR *) yyalloc( n sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>>> 2370,2371c2377,2378 >>>>> < for ( i = 0; i< _yybytes_len; ++i ) >>>>> < buf[i] = yybytes[i]; >>>>> --- >>>>>> for ( i = 0; i< _yychars_len; ++i ) >>>>>> buf[i] = yychars[i]; >>>>> 2373c2380 >>>>> < 
buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR; >>>>> --- >>>>>> buf[_yychars_len] = buf[_yychars_len+1] = YY_END_OF_BUFFER_CHAR; >>>>> 2377c2384 >>>>> < YY_FATAL_ERROR( "bad buffer in yy_scan_bytes()" ); >>>>> --- >>>>>> YY_FATAL_ERROR( "bad buffer in yy_scan_chars()" ); >>>>> 2462c2469 >>>>> < static void yy_fatal_error YYFARGS1(yyconst char*, msg) >>>>> --- >>>>>> static void yy_fatal_error YYFARGS1(yyconst YY_CHAR*, msg) >>>>> 2490c2497 >>>>> < *YY_G(yy_c_buf_p) = '\0'; \ >>>>> --- >>>>>> *YY_G(yy_c_buf_p) = (YY_CHAR) '\0'; \ >>>>> 2945c2952 >>>>> < static void yy_flex_strncpy YYFARGS3( char*,s1, yyconst char *,s2, int,n) >>>>> --- >>>>>> static void yy_flex_strncpy YYFARGS3( YY_CHAR*,s1, yyconst YY_CHAR *,s2, int,n) >>>>> 2957c2964 >>>>> < static int yy_flex_strlen YYFARGS1( yyconst char *,s) >>>>> --- >>>>>> static int yy_flex_strlen YYFARGS1( yyconst YY_CHAR *,s) >>>>> Only in flex-2.5.35.U: flex.skl~ >>>>> diff flex-2.5.35/gen.c flex-2.5.35.U/gen.c >>>>> 941c941 >>>>> < indent_puts ("register char *yy_cp = YY_G(yy_c_buf_p);"); >>>>> --- >>>>>> indent_puts ("register YY_CHAR *yy_cp = YY_G(yy_c_buf_p);"); >>>>> 1690c1690 >>>>> < ("static char *yy_last_accepting_cpos;\n"); >>>>> --- >>>>>> ("static YY_CHAR *yy_last_accepting_cpos;\n"); >>>>> 1762c1762 >>>>> < outn ("static char *yy_full_match;"); >>>>> --- >>>>>> outn ("static YY_CHAR *yy_full_match;"); >>>>> 1857,1858c1857,1858 >>>>> < outn ("char yytext[YYLMAX];"); >>>>> < outn ("char *yytext_ptr;"); >>>>> --- >>>>>> outn ("YY_CHAR yytext[YYLMAX];"); >>>>>> outn ("YY_CHAR *yytext_ptr;"); >>>>> 1864c1864 >>>>> < outn ("char *yytext;"); >>>>> --- >>>>>> outn ("YY_CHAR *yytext;"); >>>>> 1877c1877 >>>>> < outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size ))< 0 ) \\"); >>>>> --- >>>>>> outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size* sizeof( YY_CHAR ) ))< 0 ) \\"); >>>>> 1895,1896c1895,1905 >>>>> < outn ("\t\t\t (c = getc( yyin )) != EOF&& c != '\\n'; ++n 
) \\"); >>>>> < outn ("\t\t\tbuf[n] = (char) c; \\"); >>>>> --- >>>>>> >>>>>> if ( csize == 65536 ) >>>>>> outn( >>>>>> "\t\t\t (c = getwc( yyin )) != WEOF&& c != '\\n'; ++n ) \\" ); >>>>>> else >>>>>> outn( >>>>>> "\t\t\t (c = getc( yyin )) != EOF&& c != '\\n'; ++n ) \\" ); >>>>>> >>>>>> outn( "\t\t\tbuf[n] = (YY_CHAR) c; \\" ); >>>>>> >>>>> 1898,1899c1907,1918 >>>>> < outn ("\t\t\tbuf[n++] = (char) c; \\"); >>>>> < outn ("\t\tif ( c == EOF&& ferror( yyin ) ) \\"); >>>>> --- >>>>>> outn( "\t\t\tbuf[n++] = (YY_CHAR) c; \\" ); >>>>>> >>>>>> if ( csize == 65536 ) >>>>>> outn( >>>>>> "\t\tif ( c == WEOF&& ferror( yyin ) ) \\" ); >>>>>> else >>>>>> outn( >>>>>> "\t\tif ( c == EOF&& ferror( yyin ) ) \\" ); >>>>>> >>>>>> >>>>>> >>>>> 1902a1922 >>>>> 1906c1926 >>>>> < outn ("\t\twhile ( (result = fread(buf, 1, max_size, yyin))==0&& ferror(yyin)) \\"); >>>>> --- >>>>>> outn ("\t\twhile ( (result = fread(buf, sizeof( YY_CHAR ), max_size, yyin))==0&& ferror(yyin)) \\"); >>>>> Only in flex-2.5.35.U: gen.c~ >>>>> Common subdirectories: flex-2.5.35/m4 and flex-2.5.35.U/m4 >>>>> diff flex-2.5.35/main.c flex-2.5.35.U/main.c >>>>> 96c96 >>>>> < Char *ccltbl; >>>>> --- >>>>>> wchar_t *ccltbl; >>>>> 265c265 >>>>> < csize = CSIZE; >>>>> --- >>>>>> csize = 256; >>>>> 306a307,326 >>>>>> if ( csize == 65536 ) >>>>>> { >>>>>> if ( fulltbl ) >>>>>> { >>>>>> if ( use_read ) >>>>>> flexerror( _( "Can't use -f with -U" ) ); >>>>>> else >>>>>> flexerror( _( "Can't use -Cf with -U" ) ); >>>>>> } >>>>>> else if ( fullspd ) >>>>>> { >>>>>> if ( use_read ) >>>>>> flexerror( _( "Can't use -F with -U" ) ); >>>>>> else >>>>>> flexerror( _( "Can't use -CF with -U" ) ); >>>>>> } >>>>>> else if ( ! useecs&& ! usemecs ) >>>>>> flexerror( _( "Can't use -C with -U" ) ); >>>>>> } >>>>>> >>>>> 483a504,532 >>>>>> outn( "/* Define the YY_CHAR type. 
*/" ); >>>>>> >>>>>> switch (csize) { >>>>>> case 65536: >>>>>> outn( "#include<wchar.h>" ); >>>>>> outn( "typedef unsigned short YY_CHAR;" ); >>>>>> break; >>>>>> case 256: >>>>>> outn( "typedef unsigned char YY_CHAR;" ); >>>>>> break; >>>>>> default: >>>>>> outn( "typedef char YY_CHAR;" ); >>>>>> break; >>>>>> } >>>>>> >>>>>> outn( "\n/* Promotes a YY_CHAR to an unsigned integer for use as an array index. */"); >>>>>> >>>>>> switch (csize) { >>>>>> case 65536: >>>>>> case 256: >>>>>> outn( "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned short) c)" ); >>>>>> break; >>>>>> default: >>>>>> outn( >>>>>> "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)" ); >>>>>> break; >>>>>> } >>>>>> >>>>>> skelout(); >>>>> 789a839,840 >>>>>> else if ( csize == 256 ) >>>>>> putc( '8', stderr ); >>>>> 791c842 >>>>> < putc ('8', stderr); >>>>> --- >>>>>> putc( 'U', stderr ); >>>>> 1208c1259,1263 >>>>> < csize = CSIZE; >>>>> --- >>>>>> csize = 256; >>>>>> break; >>>>>> >>>>>> case OPT_UNICODE: >>>>>> csize = 65536; >>>>> 1589,1592c1644,1647 >>>>> < if (csize == 256) >>>>> < outn ("typedef unsigned char YY_CHAR;"); >>>>> < else >>>>> < outn ("typedef char YY_CHAR;"); >>>>> --- >>>>>> //if (csize == 256) >>>>>> // outn ("typedef unsigned char YY_CHAR;"); >>>>>> //else >>>>>> // outn ("typedef char YY_CHAR;"); >>>>> 1677c1732 >>>>> < outn ("extern char yytext[];\n"); >>>>> --- >>>>>> outn ("extern YY_CHAR yytext[];\n"); >>>>> 1684c1739 >>>>> < outn ("extern char *yytext;"); >>>>> --- >>>>>> outn ("extern YY_CHAR *yytext;"); >>>>> 1744c1799 >>>>> < ccltbl = allocate_Character_array (current_max_ccl_tbl_size); >>>>> --- >>>>>> ccltbl = allocate_wchar_array (current_max_ccl_tbl_size); >>>>> 1830c1885,1886 >>>>> < " -B, --batch generate batch scanner (opposite of -I)\n" >>>>> --- >>>>>> " -U, generate 16-bit scanner\n" >>>>>> " -B, --batch generate batch scanner (opposite of -I)\n" >>>>> Only in flex-2.5.35.U: main.c~ >>>>> Only in flex-2.5.35: Makefile >>>>> diff 
flex-2.5.35/misc.c flex-2.5.35.U/misc.c >>>>> 254,256c254,264 >>>>> < lerrsf (_ >>>>> < ("scanner requires -8 flag to use the character %s"), >>>>> < readable_form (c)); >>>>> --- >>>>>> { >>>>>> if ( c< 256 ) >>>>>> lerrsf( >>>>>> _( "scanner requires -8 flag to use the character %s" ), >>>>>> readable_form( c ) ); >>>>>> else >>>>>> lerrsf( >>>>>> _( "scanner requires -U flag to use the character %s" ), >>>>>> readable_form( c ) ); >>>>>> >>>>>> } >>>>> 336c344 >>>>> < Char v[]; >>>>> --- >>>>>> wchar_t v[]; >>>>> 340c348 >>>>> < Char k; >>>>> --- >>>>>> wchar_t k; >>>>> 615c623 >>>>> < Char myesc (array) >>>>> --- >>>>>> int myesc (array) >>>>> 618c626,627 >>>>> < Char c, esc_char; >>>>> --- >>>>>> Char c; >>>>>> unsigned int esc_char; >>>>> Only in flex-2.5.35.U: misc.c~ >>>>> diff flex-2.5.35/options.c flex-2.5.35.U/options.c >>>>> 200,201c200,201 >>>>> < {"-U", OPT_8BIT, 0} >>>>> < , /* Do not include unistd.h */ >>>>> --- >>>>>> {"-U", OPT_UNICODE, 0} >>>>>> , >>>>> Only in flex-2.5.35: options.c~ >>>>> diff flex-2.5.35/options.h flex-2.5.35.U/options.h >>>>> 44a45 >>>>>> OPT_UNICODE, >>>>> Common subdirectories: flex-2.5.35/po and flex-2.5.35.U/po >>>>> Only in flex-2.5.35: stamp-h1 >>>>> diff flex-2.5.35/tblcmp.c flex-2.5.35.U/tblcmp.c >>>>> 687c687 >>>>> < Char transset[CSIZE + 1]; >>>>> --- >>>>>> wchar_t transset[CSIZE + 1]; >>>>> Only in flex-2.5.35.U: tblcmp.c~ >>>>> Common subdirectories: flex-2.5.35/tests and flex-2.5.35.U/tests >>>>> Common subdirectories: flex-2.5.35/tools and flex-2.5.35.U/tools >>>>> ------------------------------------------------------------------------------ >>>>> Virtualization& Cloud Management Using Capacity Planning >>>>> Cloud computing makes use of virtualization - but cloud computing >>>>> also focuses on allowing computing to be delivered as a service. >>>>> http://www.accelacomm.com/jaw/sfnl/114/51521223/ >>>>> _______________________________________________ >>>>> Flex-devel mailing list >>>>> Fle...@li... 
|
From: Aaron S. <aa...@se...> - 2012-03-13 00:16:23
|
On Mon, Mar 12, 2012 at 4:46 PM, Peter Martini <pet...@gm...> wrote:
> A few things to keep in mind for flex to support Unicode.
>
> Broadly, Unicode support can be split into three parts - parsing bytes
> into code points, assigning one (or more!) of those code points to
> characters, and assigning properties to those characters.
>
> The first part is much simpler: Unicode is generally encoded in either
> UTF-8 (a variable width encoding scheme, optimized for backwards
> compatibility with ASCII, and to a lesser extent, Latin-1) or
> UTF-16/UTF-16LE/UTF-16BE (most notably Windows and Mac). UTF-16
> includes a BOM (0xFEFF) at the start of the text to allow the parser
> to infer whether the text was written with little-endian or big-endian
> tools; the UTF-16BE and UTF-16LE variants, as their names imply, are
> specifically not supposed to have the BOM since the name of the
> variant identifies which encoding to use. SPARC and PowerPC are very
> common big-endian server architectures. It's worth noting that Mac OS
> X made the transition from a big-endian to a little-endian platform,
> and does quite a lot to hide those details from the programmer, but
> flex would be operating at a level where that could be significant. I
> don't recall their file encoding, and don't have my Mac OS X / PowerPC
> machine handy to test.
>
> Supporting any one encoding isn't too difficult; we've just seen the
> work to change from a single byte to a double byte encoding on this
> list, and I've done some work separately
> (https://github.com/PeterMartini/flex) to support UTF-8. Even
> supporting a compiler flag is pretty straightforward. Supporting an
> option in the lexer, though, could get a little hairy; do we want to
> support transitioning from one encoding to another?
>
> There's also the issue of what to do about the BOM, which I was able
> to side-step in my UTF-8 work, since as far as UTF-8 is concerned, it's
> a noncharacter.

I'd like to get this part handled for sure.
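A minimal sketch of the BOM check under discussion, not taken from flex or from either patch: it assumes the scanner can peek at the first bytes of its input before any decoding starts. The names bom_kind and detect_bom are hypothetical. UTF-32 BOMs are deliberately ignored here, so a UTF-32LE file (FF FE 00 00) would be misreported as UTF-16LE.

```c
#include <stddef.h>

/* Hypothetical names for illustration; nothing here exists in flex. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Inspect the first bytes of a buffer and report which byte order mark,
 * if any, it starts with. *skip receives the BOM length in bytes so the
 * caller can step over it. UTF-32 is not handled (assumption). */
static enum bom_kind detect_bom(const unsigned char *buf, size_t len,
                                size_t *skip)
{
    *skip = 0;

    /* U+FEFF encoded as UTF-8 is the three bytes EF BB BF. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *skip = 3;
        return BOM_UTF8;
    }
    /* U+FEFF stored little-endian: low byte first. */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *skip = 2;
        return BOM_UTF16LE;
    }
    /* U+FEFF stored big-endian: high byte first. */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *skip = 2;
        return BOM_UTF16BE;
    }
    return BOM_NONE;
}
```

A caller would skip the reported number of bytes and pick a decoder accordingly; for input without a BOM the encoding would still have to come from a flag or option.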
At this point, I think the state of Unicode has settled down a bit, with the major winners being UTF-8 and UTF-16. The UCS encodings are mercifully dead. (If anybody on the list knows of other encodings with an important constituency, please speak up.)

The wchar_t patch posted earlier is probably not an ideal approach. Rather than hoping for a system-provided 16-bit wchar_t, I think flex should define its own 16-bit type. That way, you know you're working with 16 bits.

http://icu-project.org/docs/papers/unicode_wchar_t.html

> So, that's part 1, parsing text into codepoints (with the additional
> complication that in UTF-16, a single codepoint above U+FFFF must be
> encoded as a pair of surrogates). What I'm calling part 2 is combining
> character sequences into graphemes. A grapheme is one or more codepoints
> visually represented as one unit on your screen / page. The canonical
> example of this, and one that shows where it can get complex, is á - it
> can be stored as either U+00E1 (a-acute) or the two codepoints
> U+0061,U+0301 (a followed by combining acute). It's up to the
> application to determine whether the two are considered equivalent;
> something which flex could legitimately leave to the application
> developer, but would be a useful thing to have.

IIRC, the rule tables are fairly sizable, and subject to change. I'd prefer to punt on this. Recommending ICU seems to be the way to go: http://icu-project.org/

> Finally, part 3, applying Unicode properties. This is the moving
> target that makes it relevant which version of the Unicode standard an
> application supports. The simplest properties are character names - you
> could reference WHITE SMILING FACE instead of U+263A and mean the same
> thing. Case sensitivity is actually a fairly complicated property;
> one of the canonical examples here is the German Eszett, ß, which is
> equivalent to ss when matched case insensitively.
> While flex could get away with not supporting many properties, handling
> case insensitivity in some manner should be addressed.

This almost certainly has to be left to external libraries; however, flex does have the "-i" flag for case-insensitive scanning. I think it'd be reasonable to say that it applies only to ASCII, and to recommend that applications perform their own case matching with a Unicode library in the future.

> Anyway, this is just a brain dump, please feel free to pick at the
> details or ask questions; I'm hardly an expert.
>
> Regards,
> Peter Martini
|
From: Peter M. <pet...@gm...> - 2012-03-12 23:46:18
|
A few things to keep in mind for flex to support Unicode.

Broadly, Unicode support can be split into three parts - parsing bytes into code points, assigning one (or more!) of those code points to characters, and assigning properties to those characters.

The first part is much simpler: Unicode is generally encoded in either UTF-8 (a variable width encoding scheme, optimized for backwards compatibility with ASCII, and to a lesser extent, Latin-1) or UTF-16/UTF-16LE/UTF-16BE (most notably Windows and Mac). UTF-16 includes a BOM (0xFEFF) at the start of the text to allow the parser to infer whether the text was written with little-endian or big-endian tools; the UTF-16BE and UTF-16LE variants, as their names imply, are specifically not supposed to have the BOM since the name of the variant identifies which encoding to use. SPARC and PowerPC are very common big-endian server architectures. It's worth noting that Mac OS X made the transition from a big-endian to a little-endian platform, and does quite a lot to hide those details from the programmer, but flex would be operating at a level where that could be significant. I don't recall their file encoding, and don't have my Mac OS X / PowerPC machine handy to test.

Supporting any one encoding isn't too difficult; we've just seen the work to change from a single byte to a double byte encoding on this list, and I've done some work separately (https://github.com/PeterMartini/flex) to support UTF-8. Even supporting a compiler flag is pretty straightforward. Supporting an option in the lexer, though, could get a little hairy; do we want to support transitioning from one encoding to another?

There's also the issue of what to do about the BOM, which I was able to side-step in my UTF-8 work, since as far as UTF-8 is concerned, it's a noncharacter.

So, that's part 1, parsing text into codepoints (with the additional complication that in UTF-16, a single codepoint above U+FFFF must be encoded as a pair of surrogates).
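To make the surrogate rule concrete, here is a rough sketch of decoding one code point from 16-bit code units. It assumes the units are already in host byte order and substitutes U+FFFD for unpaired surrogates; utf16_decode and REPLACEMENT_CHAR are made-up names, not part of flex or of the -U patch.

```c
#include <stddef.h>
#include <stdint.h>

/* U+FFFD REPLACEMENT CHARACTER, used here for malformed input. */
#define REPLACEMENT_CHAR 0xFFFDu

/* Decode one code point from UTF-16 code units in host byte order.
 * Returns the number of units consumed (1 or 2), or 0 if len == 0.
 * Hypothetical helper for illustration only. */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;

    uint16_t first = s[0];

    if (first >= 0xD800 && first <= 0xDBFF) {
        /* High surrogate: it must be followed by a low surrogate. */
        if (len >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            uint32_t high = (uint32_t)(first - 0xD800);
            uint32_t low = (uint32_t)(s[1] - 0xDC00);
            *cp = 0x10000u + ((high << 10) | low);
            return 2;
        }
        *cp = REPLACEMENT_CHAR; /* unpaired high surrogate */
        return 1;
    }

    if (first >= 0xDC00 && first <= 0xDFFF) {
        *cp = REPLACEMENT_CHAR; /* low surrogate with no preceding high */
        return 1;
    }

    *cp = first; /* ordinary BMP code point, one unit */
    return 1;
}
```

A scanner whose tables index on single 16-bit units, as in the -U patch, would see each half of a pair as a separate input symbol, so anything outside the BMP would need a step like this before or after matching.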
What I'm calling part 2 is combining character sequences into graphemes. A grapheme is one or more codepoints visually represented as one unit on your screen / page. The canonical example of this, and one that shows where it can get complex, is á - it can be stored as either U+00E1 (a-acute) or the two codepoints U+0061,U+0301 (a followed by combining acute). It's up to the application to determine whether the two are considered equivalent; something which flex could legitimately leave to the application developer, but would be a useful thing to have.

Finally, part 3, applying Unicode properties. This is the moving target that makes it relevant which version of the Unicode standard an application supports. The simplest properties are character names - you could reference WHITE SMILING FACE instead of U+263A and mean the same thing. Case sensitivity is actually a fairly complicated property; one of the canonical examples here is the German Eszett, ß, which is equivalent to ss when matched case insensitively. While flex could get away with not supporting many properties, handling case insensitivity in some manner should be addressed.

Anyway, this is just a brain dump, please feel free to pick at the details or ask questions; I'm hardly an expert.

Regards,
Peter Martini
|
From: Peter M. <pet...@gm...> - 2012-03-12 21:31:52
|
On Mon, Mar 12, 2012 at 3:45 PM, Aaron Stone <aa...@se...> wrote:
> Hi Paul,
>
> I think for clarity, could you break your patch into two parts? First,
> changing all 'char *' declarations to YY_CHAR *, and in a second patch
> add code that handles different sizes of YY_CHAR.
>
> Am I reading correctly that the gist of this Unicode mode is handling
> text in UTF-16 -- 16-bits per character? Searching the web for a few
> minutes, that's going to be the same representation used by Windows,
> Java, .NET, and OS X, and also Python and Perl. Linux appears to use
> UTF-32 internally(?). Ruby uses UTF-8 internally(?).
>

I can speak with experience that perl uses UTF-8 internally, and I believe linux and Python do as well. In general, UTF-8 is preferred for systems with a lot of legacy C library concerns, largely because ASCII (characters 0 - 127) is completely unchanged, and because there is never any confusion about NUL - if you see a NUL byte anywhere in a UTF-8 encoded stream, it's guaranteed to be NUL, whereas in UTF-16 or UTF-32 it could just be padding for a character less than 256.

> 16-bit seems reasonable to me as a first step. We should probably also
> provide validation tables so that user code does not need to provide
> its own validation, which may lead to buggy implementations and
> needlessly duplicated code.
>
> Thought experiment: let's say that every package on a Linux
> distribution has a config file, and every package has a flex-bison
> parser to read that config file. If every package wanted to support
> UTF-8 configuration tomorrow, what's the best way to get there?
>

Simple - I'd started this on https://github.com/PeterMartini/flex, and using that flex right now would produce identical code without the UTF-8 flag in the parser turned on, since it requires no changes to the byte sizes of the tables.

> I'd love it if we could also support UTF-8 scanner definitions, rather
> than \x1234 all over the place.
> It'd be really nice for an engineer who speaks [furrin lanwidge] to be
> able to write code in [furrin letters] and have it Just Work (TM).
>

Agreed.

> Cheers,
> Aaron
>
>
> On Sun, Mar 11, 2012 at 7:52 AM, Paul <pa...@pr...> wrote:
>> Well actually I found it relatively straightforward once the type
>> casting was sorted out.
>> Because of the fuss about 16-bit vs 32-bit unicode, I would be quite
>> happy to see this work using the partially implemented -16bit flag
>> if that would be more politically acceptable.
>
> What's the 32-bit fuss?
>
> Maybe we set up the -U flag to take an argument, e.g. -U utf-8 or -U
> 8/16LE/16BE/32LE/32BE ?
>
>> It was tested with the three styles of flex, normal, re-entrant and
>> class, although class is a bit useless. Also works with bison-bridge.
>> Have been using it heavily for the last three months. It will also
>> work using the %unicode option.
>> An example rule using flex & bison with an encapsulating class would be
>> ([\xe00a]) {printf("flex Unicode @\n"); return x_yy::x_yyparse::token::SUM;}
>> Where the \xe00a is in the user defined part of unicode.
>>
>> Paul Neelands
>>
>> On 03/11/2012 10:39 AM, Will Estes wrote:
>>> Paul,
>>>
>>> Thanks for your posting of this patch.
>>>
>>> As you know, unicode support is not a trivial change, so we'll be
>>> evaluating this to make sure it's what we want for flex.
>>>
>>> Any and all: your ideas, suggestions and comments on this patch are
>>> welcome.
>>>
>>> --Will
>>>
>>> On Sunday, 11 March 2012, 10:24 am -0400, Paul <pa...@pr...> wrote:
>>>
>>>> Attached is the diff of flex-2.5.35 to flex-2.5.35.U
>>>> The flag -U has been added to enable Unicode 16, otherwise it
>>>> behaves as flex 2.5.35
>>>> To enter a Unicode character in a rule use \x0000. i.e. \x and
>>>> exactly 4 hex digits.
>>>> An example of a rule is:
>>>> ID ([a-zA-Z\x391-\x3a9\x3b1-\x3c9][a-zA-Z0-9\x391-\x3a9\x3b1-\x3c9]*)
>>>> Which is a-Z, 0-9, and the Greek upper & lower case Unicode letters.
>>>> For licenses, whatever covers flex2.5.35, covers this as well. >>>> I have only tested this with Kubuntu 11.10. >>>> Much thanks to the Unicode patch for flex-2.5.4a which was the basis >>>> for this work. >>>> >>>> Cheers, >>>> >>>> Paul Neelands >>>> diff flex-2.5.35/ccl.c flex-2.5.35.U/ccl.c >>>> 83c83 >>>> < ccltbl = reallocate_Character_array (ccltbl, >>>> --- >>>>> ccltbl = reallocate_wchar_array( ccltbl, >>>> Only in flex-2.5.35.U: ccl.c~ >>>> Only in flex-2.5.35: config.h >>>> Only in flex-2.5.35: config.log >>>> Only in flex-2.5.35: config.status >>>> Only in flex-2.5.35: .deps >>>> Common subdirectories: flex-2.5.35/doc and flex-2.5.35.U/doc >>>> diff flex-2.5.35/ecs.c flex-2.5.35.U/ecs.c >>>> 116c116 >>>> < Char ccls[]; >>>> --- >>>>> wchar_t ccls[]; >>>> Only in flex-2.5.35.U: ecs.c~ >>>> Common subdirectories: flex-2.5.35/examples and flex-2.5.35.U/examples >>>> diff flex-2.5.35/flexdef.h flex-2.5.35.U/flexdef.h >>>> 108,109c108,109 >>>> < /* Always be prepared to generate an 8-bit scanner. */ >>>> < #define CSIZE 256 >>>> --- >>>>> /* Always be prepared to generate a 16-bit scanner. 
*/ >>>>> #define CSIZE 65536 >>>> 648c648 >>>> < extern Char *ccltbl; >>>> --- >>>>> extern wchar_t *ccltbl; >>>> 678a679,684 >>>>> #define allocate_wchar_array(size) \ >>>>> (wchar_t *) allocate_array( size, sizeof( wchar_t ) ) >>>>> >>>>> #define reallocate_wchar_array(array,size) \ >>>>> (wchar_t *) reallocate_array( (void *) array, size, sizeof( wchar_t ) ) >>>>> >>>> 778c784 >>>> < extern void mkeccl PROTO ((Char[], int, int[], int[], int, int)); >>>> --- >>>>> extern void mkeccl PROTO ((wchar_t[], int, int[], int[], int, int)); >>>> 866c872 >>>> < extern void cshell PROTO ((Char[], int, int)); >>>> --- >>>>> extern void cshell PROTO ((wchar_t[], int, int)); >>>> 930c936 >>>> < extern Char myesc PROTO ((Char[])); >>>> --- >>>>> extern int myesc PROTO ((Char[])); >>>> Only in flex-2.5.35.U: flexdef.h~ >>>> diff flex-2.5.35/FlexLexer.h flex-2.5.35.U/FlexLexer.h >>>> 36a37,38 >>>>> // Since this header is generic for all sizes of flex scanners, you must >>>>> // define the type YY_CHAR before including it: >>>> 39a42 >>>>> // typedef xxx YY_CHAR; >>>> 43a47 >>>>> // typedef xxx YY_CHAR; >>>> 65c69 >>>> < const char* YYText() const { return yytext; } >>>> --- >>>>> const YY_CHAR* YYText() const { return yytext; } >>>> 95c99 >>>> < char* yytext; >>>> --- >>>>> YY_CHAR* yytext; >>>> 133,134c137,138 >>>> < virtual int LexerInput( char* buf, int max_size ); >>>> < virtual void LexerOutput( const char* buf, int size ); >>>> --- >>>>> virtual int LexerInput( YY_CHAR* buf, int max_size ); >>>>> virtual void LexerOutput( const YY_CHAR* buf, int size ); >>>> 137c141 >>>> < void yyunput( int c, char* buf_ptr ); >>>> --- >>>>> void yyunput( int c, YY_CHAR* buf_ptr ); >>>> 160c164 >>>> < char yy_hold_char; >>>> --- >>>>> YY_CHAR yy_hold_char; >>>> 166c170 >>>> < char* yy_c_buf_p; >>>> --- >>>>> YY_CHAR* yy_c_buf_p; >>>> 185c189 >>>> < char* yy_last_accepting_cpos; >>>> --- >>>>> YY_CHAR* yy_last_accepting_cpos; >>>> 190c194 >>>> < char* yy_full_match; >>>> --- >>>>> YY_CHAR* 
yy_full_match; >>>> Only in flex-2.5.35.U: FlexLexer.h~ >>>> diff flex-2.5.35/flex.skl flex-2.5.35.U/flex.skl >>>> 126c126 >>>> < M4_GEN_PREFIX(`_scan_bytes') >>>> --- >>>>> M4_GEN_PREFIX(`_scan_chars') >>>> 274a275 >>>>> *out for U pn >>>> 276c277 >>>> < #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) >>>> --- >>>>> /* #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) pn*/ >>>> 543,544c544,545 >>>> < char *yy_ch_buf; /* input buffer */ >>>> < char *yy_buf_pos; /* current position in input buffer */ >>>> --- >>>>> YY_CHAR *yy_ch_buf; /* input buffer */ >>>>> YY_CHAR *yy_buf_pos; /* current position in input buffer */ >>>> 546c547 >>>> < /* Size of input buffer in bytes, not including room for EOB >>>> --- >>>>> /* Size of input buffer in chars, not including room for EOB >>>> 642c643 >>>> < static char yy_hold_char; >>>> --- >>>>> static YY_CHAR yy_hold_char; >>>> 647c648 >>>> < static char *yy_c_buf_p = (char *) 0; >>>> --- >>>>> static YY_CHAR *yy_c_buf_p = (char *) 0; >>>> 678,680c679,684 >>>> < YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( char *base, yy_size_t size M4_YY_PROTO_LAST_ARG ); >>>> < YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst char *yy_str M4_YY_PROTO_LAST_ARG ); >>>> < YY_BUFFER_STATE yy_scan_bytes M4_YY_PARAMS( yyconst char *bytes, int len M4_YY_PROTO_LAST_ARG ); >>>> --- >>>>> YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( YY_CHAR *base, yy_size_t size M4_YY_PROTO_LAST_ARG ); >>>>> YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst YY_CHAR *yy_str M4_YY_PROTO_LAST_ARG ); >>>>> /* This is the old yy_scan_bytes function - renamed to avoid >>>>> * confusion since a character may now be 1 or 2 bytes. 
>>>>> */ >>>>> YY_BUFFER_STATE yy_scan_chars M4_YY_PARAMS( yyconst YY_CHAR *chars, int len M4_YY_PROTO_LAST_ARG ); >>>> 747c751 >>>> < *yy_cp = '\0'; \ >>>> --- >>>>> *yy_cp = (YY_CHAR) '\0'; \ >>>> 805c809 >>>> < char yy_hold_char; >>>> --- >>>>> YY_CHAR yy_hold_char; >>>> 808c812 >>>> < char *yy_c_buf_p; >>>> --- >>>>> YY_CHAR *yy_c_buf_p; >>>> 816c820 >>>> < char* yy_last_accepting_cpos; >>>> --- >>>>> YY_CHAR* yy_last_accepting_cpos; >>>> 825c829 >>>> < char *yy_full_match; >>>> --- >>>>> YY_CHAR *yy_full_match; >>>> 837,838c841,842 >>>> < char yytext_r[YYLMAX]; >>>> < char *yytext_ptr; >>>> --- >>>>> YY_CHAR yytext_r[YYLMAX]; >>>>> YY_CHAR *yytext_ptr; >>>> 843c847 >>>> < char *yytext_r; >>>> --- >>>>> YY_CHAR *yytext_r; >>>> 999c1003 >>>> < static void yyunput M4_YY_PARAMS( int c, char *buf_ptr M4_YY_PROTO_LAST_ARG); >>>> --- >>>>> static void yyunput M4_YY_PARAMS( int c, (YY_CHAR) *buf_ptr M4_YY_PROTO_LAST_ARG); >>>> 1005c1009 >>>> < static void yy_flex_strncpy M4_YY_PARAMS( char *, yyconst char *, int M4_YY_PROTO_LAST_ARG); >>>> --- >>>>> static void yy_flex_strncpy M4_YY_PARAMS( (YY_CHAR) *, yyconst char *, int M4_YY_PROTO_LAST_ARG); >>>> 1009c1013 >>>> < static int yy_flex_strlen M4_YY_PARAMS( yyconst char * M4_YY_PROTO_LAST_ARG); >>>> --- >>>>> static int yy_flex_strlen M4_YY_PARAMS( yyconst (YY_CHAR) * M4_YY_PROTO_LAST_ARG); >>>> 1077c1081 >>>> < #define ECHO fwrite( yytext, yyleng, 1, yyout ) >>>> --- >>>>> #define ECHO (void) fwrite( yytext, sizeof( YY_CHAR ), yyleng, yyout ) >>>> 1095c1099 >>>> < if ( (result = LexerInput( (char *) buf, max_size ))< 0 ) \ >>>> --- >>>>> if ( (result = LexerInput( buf, max_size ))< 0 ) \ >>>> 1239c1243 >>>> < register char *yy_cp, *yy_bp; >>>> --- >>>>> register YY_CHAR *yy_cp, *yy_bp; >>>> 1535c1539 >>>> < int yyFlexLexer::LexerInput( char* buf, int /* max_size */ ) >>>> --- >>>>> int yyFlexLexer::LexerInput( YY_CHAR* buf, int /* max_size */ ) >>>> 1537c1541 >>>> < int yyFlexLexer::LexerInput( char* buf, int max_size 
) >>>> --- >>>>> int yyFlexLexer::LexerInput( YY_CHAR* buf, int max_size ) >>>> 1544c1548 >>>> < yyin->get( buf[0] ); >>>> --- >>>>> (void) yyin->read((unsigned char *) buf, sizeof( YY_CHAR ) ); >>>> 1555c1559 >>>> < (void) yyin->read( buf, max_size ); >>>> --- >>>>> (void) yyin->read((unsigned char *) buf, max_size * sizeof( YY_CHAR ) ); >>>> 1560c1564 >>>> < return yyin->gcount(); >>>> --- >>>>> return ( yyin->gcount() / sizeof( YY_CHAR ) ); >>>> 1564c1568 >>>> < void yyFlexLexer::LexerOutput( const char* buf, int size ) >>>> --- >>>>> void yyFlexLexer::LexerOutput( const YY_CHAR* buf, int size ) >>>> 1566c1570 >>>> < (void) yyout->write( buf, size ); >>>> --- >>>>> (void) yyout->write((unsigned char *) buf, size * sizeof( YY_CHAR ) ); >>>> 1588,1589c1592,1593 >>>> < register char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf; >>>> < register char *source = YY_G(yytext_ptr); >>>> --- >>>>> register YY_CHAR *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf; >>>>> register YY_CHAR *source = YY_G(yytext_ptr); >>>> 1658c1662 >>>> < b->yy_ch_buf = (char *) >>>> --- >>>>> b->yy_ch_buf = (YY_CHAR *) >>>> 1661c1665 >>>> < b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG ); >>>> --- >>>>> (b->yy_buf_size + 2)*sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>> 1737c1741 >>>> < register char *yy_cp; >>>> --- >>>>> register YY_CHAR *yy_cp; >>>> 1774c1778 >>>> < static void yyunput YYFARGS2( int,c, register char *,yy_bp) >>>> --- >>>>> static void yyunput YYFARGS2( int,c, register YY_CHAR *,yy_bp) >>>> 1777c1781 >>>> < void yyFlexLexer::yyunput( int c, register char* yy_bp) >>>> --- >>>>> void yyFlexLexer::yyunput( int c, register YY_CHAR* yy_bp) >>>> 1780c1784 >>>> < register char *yy_cp; >>>> --- >>>>> register YY_CHAR *yy_cp; >>>> 1792c1796 >>>> < register char *dest =&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[ >>>> --- >>>>> register YY_CHAR *dest =&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[ >>>> 1794c1798 >>>> < register char *source = >>>> --- >>>>> register YY_CHAR *source = >>>> 1809c1813 >>>> < 
*--yy_cp = (char) c; >>>> --- >>>>> *--yy_cp = (YY_CHAR) c; >>>> 1853c1857 >>>> < *YY_G(yy_c_buf_p) = '\0'; >>>> --- >>>>> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; >>>> 1900c1904 >>>> < *YY_G(yy_c_buf_p) = '\0'; /* preserve yytext */ >>>> --- >>>>> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; /* preserve yytext */ >>>> 2016c2020 >>>> < b->yy_ch_buf = (char *) yyalloc( b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG ); >>>> --- >>>>> b->yy_ch_buf = (YY_CHAR *) yyalloc( (b->yy_buf_size + 2)* sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>> 2292c2296 >>>> < YY_BUFFER_STATE yy_scan_buffer YYFARGS2( char *,base, yy_size_t ,size) >>>> --- >>>>> YY_BUFFER_STATE yy_scan_buffer YYFARGS2( YY_CHAR *,base, yy_size_t ,size) >>>> 2336c2340 >>>> < YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst char *, yystr) >>>> --- >>>>> YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst YY_CHAR *, yystr) >>>> 2338a2343,2345 >>>>> int len; >>>>> for ( len = 0; yy_str[len]; ++len ) >>>>> ; >>>> 2340c2347 >>>> < return yy_scan_bytes( yystr, strlen(yystr) M4_YY_CALL_LAST_ARG); >>>> --- >>>>> return yy_scan_chars( yystr, len M4_YY_CALL_LAST_ARG); >>>> 2356c2363 >>>> < YY_BUFFER_STATE yy_scan_bytes YYFARGS2( yyconst char *,yybytes, int ,_yybytes_len) >>>> --- >>>>> YY_BUFFER_STATE yy_scan_chars YYFARGS2( yyconst YY_CHAR *,yychars, int ,_yybytes_len) >>>> 2359c2366 >>>> < char *buf; >>>> --- >>>>> YY_CHAR *buf; >>>> 2365,2366c2372,2373 >>>> < n = _yybytes_len + 2; >>>> < buf = (char *) yyalloc( n M4_YY_CALL_LAST_ARG ); >>>> --- >>>>> n = _yychars_len + 2; >>>>> buf = (YY_CHAR *) yyalloc( n sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG ); >>>> 2370,2371c2377,2378 >>>> < for ( i = 0; i< _yybytes_len; ++i ) >>>> < buf[i] = yybytes[i]; >>>> --- >>>>> for ( i = 0; i< _yychars_len; ++i ) >>>>> buf[i] = yychars[i]; >>>> 2373c2380 >>>> < buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR; >>>> --- >>>>> buf[_yychars_len] = buf[_yychars_len+1] = YY_END_OF_BUFFER_CHAR; >>>> 2377c2384 >>>> < YY_FATAL_ERROR( "bad buffer in 
yy_scan_bytes()" ); >>>> --- >>>>> YY_FATAL_ERROR( "bad buffer in yy_scan_chars()" ); >>>> 2462c2469 >>>> < static void yy_fatal_error YYFARGS1(yyconst char*, msg) >>>> --- >>>>> static void yy_fatal_error YYFARGS1(yyconst YY_CHAR*, msg) >>>> 2490c2497 >>>> < *YY_G(yy_c_buf_p) = '\0'; \ >>>> --- >>>>> *YY_G(yy_c_buf_p) = (YY_CHAR) '\0'; \ >>>> 2945c2952 >>>> < static void yy_flex_strncpy YYFARGS3( char*,s1, yyconst char *,s2, int,n) >>>> --- >>>>> static void yy_flex_strncpy YYFARGS3( YY_CHAR*,s1, yyconst YY_CHAR *,s2, int,n) >>>> 2957c2964 >>>> < static int yy_flex_strlen YYFARGS1( yyconst char *,s) >>>> --- >>>>> static int yy_flex_strlen YYFARGS1( yyconst YY_CHAR *,s) >>>> Only in flex-2.5.35.U: flex.skl~ >>>> diff flex-2.5.35/gen.c flex-2.5.35.U/gen.c >>>> 941c941 >>>> < indent_puts ("register char *yy_cp = YY_G(yy_c_buf_p);"); >>>> --- >>>>> indent_puts ("register YY_CHAR *yy_cp = YY_G(yy_c_buf_p);"); >>>> 1690c1690 >>>> < ("static char *yy_last_accepting_cpos;\n"); >>>> --- >>>>> ("static YY_CHAR *yy_last_accepting_cpos;\n"); >>>> 1762c1762 >>>> < outn ("static char *yy_full_match;"); >>>> --- >>>>> outn ("static YY_CHAR *yy_full_match;"); >>>> 1857,1858c1857,1858 >>>> < outn ("char yytext[YYLMAX];"); >>>> < outn ("char *yytext_ptr;"); >>>> --- >>>>> outn ("YY_CHAR yytext[YYLMAX];"); >>>>> outn ("YY_CHAR *yytext_ptr;"); >>>> 1864c1864 >>>> < outn ("char *yytext;"); >>>> --- >>>>> outn ("YY_CHAR *yytext;"); >>>> 1877c1877 >>>> < outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size ))< 0 ) \\"); >>>> --- >>>>> outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size* sizeof( YY_CHAR ) ))< 0 ) \\"); >>>> 1895,1896c1895,1905 >>>> < outn ("\t\t\t (c = getc( yyin )) != EOF&& c != '\\n'; ++n ) \\"); >>>> < outn ("\t\t\tbuf[n] = (char) c; \\"); >>>> --- >>>>> >>>>> if ( csize == 65536 ) >>>>> outn( >>>>> "\t\t\t (c = getwc( yyin )) != WEOF&& c != '\\n'; ++n ) \\" ); >>>>> else >>>>> outn( >>>>> "\t\t\t (c = getc( yyin )) != EOF&& c != 
'\\n'; ++n ) \\" ); >>>>> >>>>> outn( "\t\t\tbuf[n] = (YY_CHAR) c; \\" ); >>>>> >>>> 1898,1899c1907,1918 >>>> < outn ("\t\t\tbuf[n++] = (char) c; \\"); >>>> < outn ("\t\tif ( c == EOF&& ferror( yyin ) ) \\"); >>>> --- >>>>> outn( "\t\t\tbuf[n++] = (YY_CHAR) c; \\" ); >>>>> >>>>> if ( csize == 65536 ) >>>>> outn( >>>>> "\t\tif ( c == WEOF&& ferror( yyin ) ) \\" ); >>>>> else >>>>> outn( >>>>> "\t\tif ( c == EOF&& ferror( yyin ) ) \\" ); >>>>> >>>>> >>>>> >>>> 1902a1922 >>>> 1906c1926 >>>> < outn ("\t\twhile ( (result = fread(buf, 1, max_size, yyin))==0&& ferror(yyin)) \\"); >>>> --- >>>>> outn ("\t\twhile ( (result = fread(buf, sizeof( YY_CHAR ), max_size, yyin))==0&& ferror(yyin)) \\"); >>>> Only in flex-2.5.35.U: gen.c~ >>>> Common subdirectories: flex-2.5.35/m4 and flex-2.5.35.U/m4 >>>> diff flex-2.5.35/main.c flex-2.5.35.U/main.c >>>> 96c96 >>>> < Char *ccltbl; >>>> --- >>>>> wchar_t *ccltbl; >>>> 265c265 >>>> < csize = CSIZE; >>>> --- >>>>> csize = 256; >>>> 306a307,326 >>>>> if ( csize == 65536 ) >>>>> { >>>>> if ( fulltbl ) >>>>> { >>>>> if ( use_read ) >>>>> flexerror( _( "Can't use -f with -U" ) ); >>>>> else >>>>> flexerror( _( "Can't use -Cf with -U" ) ); >>>>> } >>>>> else if ( fullspd ) >>>>> { >>>>> if ( use_read ) >>>>> flexerror( _( "Can't use -F with -U" ) ); >>>>> else >>>>> flexerror( _( "Can't use -CF with -U" ) ); >>>>> } >>>>> else if ( ! useecs&& ! usemecs ) >>>>> flexerror( _( "Can't use -C with -U" ) ); >>>>> } >>>>> >>>> 483a504,532 >>>>> outn( "/* Define the YY_CHAR type. */" ); >>>>> >>>>> switch (csize) { >>>>> case 65536: >>>>> outn( "#include<wchar.h>" ); >>>>> outn( "typedef unsigned short YY_CHAR;" ); >>>>> break; >>>>> case 256: >>>>> outn( "typedef unsigned char YY_CHAR;" ); >>>>> break; >>>>> default: >>>>> outn( "typedef char YY_CHAR;" ); >>>>> break; >>>>> } >>>>> >>>>> outn( "\n/* Promotes a YY_CHAR to an unsigned integer for use as an array index. 
*/"); >>>>> >>>>> switch (csize) { >>>>> case 65536: >>>>> case 256: >>>>> outn( "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned short) c)" ); >>>>> break; >>>>> default: >>>>> outn( >>>>> "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)" ); >>>>> break; >>>>> } >>>>> >>>>> skelout(); >>>> 789a839,840 >>>>> else if ( csize == 256 ) >>>>> putc( '8', stderr ); >>>> 791c842 >>>> < putc ('8', stderr); >>>> --- >>>>> putc( 'U', stderr ); >>>> 1208c1259,1263 >>>> < csize = CSIZE; >>>> --- >>>>> csize = 256; >>>>> break; >>>>> >>>>> case OPT_UNICODE: >>>>> csize = 65536; >>>> 1589,1592c1644,1647 >>>> < if (csize == 256) >>>> < outn ("typedef unsigned char YY_CHAR;"); >>>> < else >>>> < outn ("typedef char YY_CHAR;"); >>>> --- >>>>> //if (csize == 256) >>>>> // outn ("typedef unsigned char YY_CHAR;"); >>>>> //else >>>>> // outn ("typedef char YY_CHAR;"); >>>> 1677c1732 >>>> < outn ("extern char yytext[];\n"); >>>> --- >>>>> outn ("extern YY_CHAR yytext[];\n"); >>>> 1684c1739 >>>> < outn ("extern char *yytext;"); >>>> --- >>>>> outn ("extern YY_CHAR *yytext;"); >>>> 1744c1799 >>>> < ccltbl = allocate_Character_array (current_max_ccl_tbl_size); >>>> --- >>>>> ccltbl = allocate_wchar_array (current_max_ccl_tbl_size); >>>> 1830c1885,1886 >>>> < " -B, --batch generate batch scanner (opposite of -I)\n" >>>> --- >>>>> " -U, generate 16-bit scanner\n" >>>>> " -B, --batch generate batch scanner (opposite of -I)\n" >>>> Only in flex-2.5.35.U: main.c~ >>>> Only in flex-2.5.35: Makefile >>>> diff flex-2.5.35/misc.c flex-2.5.35.U/misc.c >>>> 254,256c254,264 >>>> < lerrsf (_ >>>> < ("scanner requires -8 flag to use the character %s"), >>>> < readable_form (c)); >>>> --- >>>>> { >>>>> if ( c< 256 ) >>>>> lerrsf( >>>>> _( "scanner requires -8 flag to use the character %s" ), >>>>> readable_form( c ) ); >>>>> else >>>>> lerrsf( >>>>> _( "scanner requires -U flag to use the character %s" ), >>>>> readable_form( c ) ); >>>>> >>>>> } >>>> 336c344 >>>> < Char v[]; >>>> --- >>>>> 
wchar_t v[]; >>>> 340c348 >>>> < Char k; >>>> --- >>>>> wchar_t k; >>>> 615c623 >>>> < Char myesc (array) >>>> --- >>>>> int myesc (array) >>>> 618c626,627 >>>> < Char c, esc_char; >>>> --- >>>>> Char c; >>>>> unsigned int esc_char; >>>> Only in flex-2.5.35.U: misc.c~ >>>> diff flex-2.5.35/options.c flex-2.5.35.U/options.c >>>> 200,201c200,201 >>>> < {"-U", OPT_8BIT, 0} >>>> < , /* Do not include unistd.h */ >>>> --- >>>>> {"-U", OPT_UNICODE, 0} >>>>> , >>>> Only in flex-2.5.35: options.c~ >>>> diff flex-2.5.35/options.h flex-2.5.35.U/options.h >>>> 44a45 >>>>> OPT_UNICODE, >>>> Common subdirectories: flex-2.5.35/po and flex-2.5.35.U/po >>>> Only in flex-2.5.35: stamp-h1 >>>> diff flex-2.5.35/tblcmp.c flex-2.5.35.U/tblcmp.c >>>> 687c687 >>>> < Char transset[CSIZE + 1]; >>>> --- >>>>> wchar_t transset[CSIZE + 1]; >>>> Only in flex-2.5.35.U: tblcmp.c~ >>>> Common subdirectories: flex-2.5.35/tests and flex-2.5.35.U/tests >>>> Common subdirectories: flex-2.5.35/tools and flex-2.5.35.U/tools >>>> ------------------------------------------------------------------------------ >>>> Virtualization& Cloud Management Using Capacity Planning >>>> Cloud computing makes use of virtualization - but cloud computing >>>> also focuses on allowing computing to be delivered as a service. >>>> http://www.accelacomm.com/jaw/sfnl/114/51521223/ >>>> _______________________________________________ >>>> Flex-devel mailing list >>>> Fle...@li... >>>> https://lists.sourceforge.net/lists/listinfo/flex-devel >>> >> >> ------------------------------------------------------------------------------ >> Virtualization & Cloud Management Using Capacity Planning >> Cloud computing makes use of virtualization - but cloud computing >> also focuses on allowing computing to be delivered as a service. >> http://www.accelacomm.com/jaw/sfnl/114/51521223/ >> _______________________________________________ >> Flex-devel mailing list >> Fle...@li... 
|
From: Aaron S. <aa...@se...> - 2012-03-12 19:45:58
|
Hi Paul,

For clarity, could you break your patch into two parts? The first would change all 'char *' declarations to 'YY_CHAR *'; the second would add the code that handles different sizes of YY_CHAR.

Am I reading correctly that the gist of this Unicode mode is handling text in UTF-16, i.e. 16 bits per character? Searching the web for a few minutes, that's going to be the same representation used by Windows, Java, .NET, and OS X, and also Python and Perl. Linux appears to use UTF-32 internally(?). Ruby uses UTF-8 internally(?). 16-bit seems reasonable to me as a first step.

We should probably also provide validation tables so that user code does not need to provide its own validation, which may lead to buggy implementations and needlessly duplicated code.

Thought experiment: let's say that every package on a Linux distribution has a config file, and every package has a flex-bison parser to read that config file. If every package wanted to support UTF-8 configuration tomorrow, what's the best way to get there?

I'd love it if we could also support UTF-8 scanner definitions, rather than \x1234 all over the place. It'd be really nice for an engineer who speaks [furrin lanwidge] to be able to write code in [furrin letters] and have it Just Work (TM).

Cheers,
Aaron

On Sun, Mar 11, 2012 at 7:52 AM, Paul <pa...@pr...> wrote:
> Well actually I found it relatively straightforward once the type
> casting was sorted out.
> Because of the fuss about 16bit vs 32 bit unicode, I would be quite
> happy to see this work using the partially implemented -16bit flag
> if that would be more politically acceptable.

What's the 32-bit fuss? Maybe we set up the -U flag to take an argument, e.g. -U utf-8 or -U 8/16LE/16BE/32LE/32BE?

> It was tested with the three styles of flex, normal re-entrant and class
> although class is a bit useless. Also works with bison-bridge. Have been
> using it heavily for the last three months. I will also work using the
> %unicode option.
> An example rule using flex & bison with an encapsulating class would be
> ([\xe00a]) {printf("flex Unicode @\n"); return x_yy::x_yyparse::token::SUM;}
> Where the \xe00a is in the user defined part of unicode.
>
> Paul Neelands
>
> On 03/11/2012 10:39 AM, Will Estes wrote:
>> Paul,
>>
>> Thanks for your posting of this patch.
>>
>> As you know, unicode support is not a trivial change, so we'll be
>> evaluating this to make sure it's what we want for flex.
>>
>> Any and all, your ideas, suggestions and comments on this patch.
>>
>> --Will
>>
>> On Sunday, 11 March 2012, 10:24 am -0400, Paul <pa...@pr...> wrote:
>>
>>> Attached is the diff of flex-2.5.35 to flex-2.5.35.U
>>> The flag -U has been added to enable Unicode 16, otherwise it
>>> behaves as flex 2.5.35
>>> To enter a Unicode character in a rule use \x0000. i.e. \x and
>>> exactly 4 hex digits.
>>> An example of a rule is:
>>> ID ([a-zA-Z\x391-\x3a9\x3b1-\x3c9][a-zA-Z0-9\x391-\x3a9\x3b1-\x3c9]*)
>>> Which is a-Z, 0-9, and the Greek upper & lower case Unicode letters.
>>> For licenses, whatever covers flex2.5.35, covers this as well.
>>> I have only tested this with Kubuntu 11.10.
>>> Much thanks to the Unicode patch for flex-2.5.4a which was the basis
>>> for this work.
|
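Aaron's point about shipping validation so that user code does not have to write its own can be illustrated with a small check for well-formed UTF-16. This is only a sketch under assumptions: the function name `utf16_is_well_formed` is hypothetical and not part of flex or of the patch, and `YY_CHAR` is taken to be the `unsigned short` typedef the patch emits for `-U` scanners. It uses the standard surrogate ranges (high 0xD800-0xDBFF, low 0xDC00-0xDFFF) and a loop rather than the lookup tables Aaron suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the 16-bit code-unit type the patch emits when csize == 65536. */
typedef unsigned short YY_CHAR;

/* Hypothetical helper, not part of flex: returns 1 if every surrogate code
 * unit in buf[0..len) belongs to a correctly ordered high/low pair, and 0
 * otherwise. */
static int utf16_is_well_formed(const YY_CHAR *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        /* Mask in case unsigned short is wider than 16 bits on this platform. */
        unsigned int unit = buf[i] & 0xFFFFu;

        if (unit >= 0xD800u && unit <= 0xDBFFu) {
            /* High surrogate: a low surrogate must follow immediately. */
            unsigned int next;

            if (i + 1 >= len)
                return 0;
            next = buf[i + 1] & 0xFFFFu;
            if (next < 0xDC00u || next > 0xDFFFu)
                return 0;
            i += 2;
        } else if (unit >= 0xDC00u && unit <= 0xDFFFu) {
            /* Low surrogate with no preceding high surrogate. */
            return 0;
        } else {
            /* Ordinary BMP code unit. */
            i += 1;
        }
    }
    return 1;
}
```

A caller might run such a check over an input buffer before handing it to the scanner. Whether the check belongs in generated code or in a separate helper is exactly the design question raised above.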
From: Paul <pa...@pr...> - 2012-03-11 15:13:16
|
Attached please find the diff of the Unicode version of flex against the CVS module flex as of this date.

Paul Neelands |
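The central idea of the patch in this thread is that buffer sizes become counts of characters rather than bytes, so every allocation has to be scaled by sizeof( YY_CHAR ) and strlen() can no longer be used on scanner text. The following is a minimal standalone sketch of that idea, not code from the patch: the names `yychar_strlen`, `yychar_dup_with_eob` and `EOB_CHAR` are hypothetical, `malloc` stands in for flex's `yyalloc`, and `EOB_CHAR` stands in for `YY_END_OF_BUFFER_CHAR`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: the 16-bit type the patch emits when csize == 65536. */
typedef unsigned short YY_CHAR;

/* Stand-in for flex's YY_END_OF_BUFFER_CHAR. */
#define EOB_CHAR 0

/* Hypothetical strlen() analogue: counts code units, not bytes, up to the
 * terminating 0. */
static size_t yychar_strlen(const YY_CHAR *s)
{
    size_t len = 0;

    assert(s != NULL);
    while (s[len] != 0)
        ++len;
    return len;
}

/* Hypothetical analogue of the renamed yy_scan_chars(): copies len code
 * units into a fresh buffer followed by two end-of-buffer characters.
 * Returns NULL if allocation fails; the caller frees the result. */
static YY_CHAR *yychar_dup_with_eob(const YY_CHAR *chars, size_t len)
{
    size_t n = len + 2;   /* room for the two EOB characters */
    size_t i;
    /* The size passed to the allocator is in bytes, hence the multiply. */
    YY_CHAR *buf = (YY_CHAR *) malloc(n * sizeof(YY_CHAR));

    if (buf == NULL)
        return NULL;

    for (i = 0; i < len; ++i)
        buf[i] = chars[i];
    buf[len] = buf[len + 1] = EOB_CHAR;
    return buf;
}
```

Keeping the length in characters and converting to bytes only at the allocator boundary means the rest of the buffer code can index the array without knowing the width of YY_CHAR.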
From: Paul <pa...@pr...> - 2012-03-11 14:52:31
|
Well, actually I found it relatively straightforward once the type casting was sorted out. Because of the fuss about 16-bit vs 32-bit Unicode, I would be quite happy to see this work using the partially implemented -16bit flag if that would be more politically acceptable.

It was tested with the three styles of flex (normal, re-entrant, and class), although class is a bit useless. It also works with bison-bridge. I have been using it heavily for the last three months. I will also work using the %unicode option.

An example rule using flex & bison with an encapsulating class would be

([\xe00a]) {printf("flex Unicode @\n"); return x_yy::x_yyparse::token::SUM;}

where \xe00a is in the user-defined (private use) part of Unicode.

Paul Neelands

On 03/11/2012 10:39 AM, Will Estes wrote:
> Paul,
>
> Thanks for your posting of this patch.
>
> As you know, unicode support is not a trivial change, so we'll be
> evaluating this to make sure it's what we want for flex.
>
> Any and all, your ideas, suggestions and comments on this patch.
>
> --Will
>
> On Sunday, 11 March 2012, 10:24 am -0400, Paul <pa...@pr...> wrote:
>
>> Attached is the diff of flex-2.5.35 to flex-2.5.35.U
>> The flag -U has been added to enable Unicode 16, otherwise it
>> behaves as flex 2.5.35
>> To enter a Unicode character in a rule use \x0000. i.e. \x and
>> exactly 4 hex digits.
>> An example of a rule is:
>> ID ([a-zA-Z\x391-\x3a9\x3b1-\x3c9][a-zA-Z0-9\x391-\x3a9\x3b1-\x3c9]*)
>> Which is a-Z, 0-9, and the Greek upper & lower case Unicode letters.
>> For licenses, whatever covers flex2.5.35, covers this as well.
>> I have only tested this with Kubuntu 11.10.
>> Much thanks to the Unicode patch for flex-2.5.4a which was the basis
>> for this work.
>>
>> Cheers,
>>
>> Paul Neelands
>> diff flex-2.5.35/ccl.c flex-2.5.35.U/ccl.c
>> 83c83
>> < ccltbl = reallocate_Character_array (ccltbl,
>> ---
>> > ccltbl = reallocate_wchar_array( ccltbl,
>> Only in flex-2.5.35.U: ccl.c~
>> Only in flex-2.5.35: config.h
>> Only in flex-2.5.35: config.log
>> Only in flex-2.5.35: config.status
>> Only in flex-2.5.35: .deps
>> Common subdirectories: flex-2.5.35/doc and flex-2.5.35.U/doc
>> diff flex-2.5.35/ecs.c flex-2.5.35.U/ecs.c
>> 116c116
>> < Char ccls[];
>> ---
>> > wchar_t ccls[];
>> Only in flex-2.5.35.U: ecs.c~
>> Common subdirectories: flex-2.5.35/examples and flex-2.5.35.U/examples
>> diff flex-2.5.35/flexdef.h flex-2.5.35.U/flexdef.h
>> 108,109c108,109
>> < /* Always be prepared to generate an 8-bit scanner. */
>> < #define CSIZE 256
>> ---
>> > /* Always be prepared to generate a 16-bit scanner. */
>> > #define CSIZE 65536
>> 648c648
>> < extern Char *ccltbl;
>> ---
>> > extern wchar_t *ccltbl;
>> 678a679,684
>> > #define allocate_wchar_array(size) \
>> > (wchar_t *) allocate_array( size, sizeof( wchar_t ) )
>> >
>> > #define reallocate_wchar_array(array,size) \
>> > (wchar_t *) reallocate_array( (void *) array, size, sizeof( wchar_t ) )
>> >
>> 778c784
>> < extern void mkeccl PROTO ((Char[], int, int[], int[], int, int));
>> ---
>> > extern void mkeccl PROTO ((wchar_t[], int, int[], int[], int, int));
>> 866c872
>> < extern void cshell PROTO ((Char[], int, int));
>> ---
>> > extern void cshell PROTO ((wchar_t[], int, int));
>> 930c936
>> < extern Char myesc PROTO ((Char[]));
>> ---
>> > extern int myesc PROTO ((Char[]));
>> Only in flex-2.5.35.U: flexdef.h~
>> diff flex-2.5.35/FlexLexer.h flex-2.5.35.U/FlexLexer.h
>> 36a37,38
>> > // Since this header is generic for all sizes of flex scanners, you must
>> > // define the type YY_CHAR before including it:
>> 39a42
>> > // typedef xxx YY_CHAR;
>> 43a47
>> > // typedef xxx YY_CHAR;
>> 65c69
>> < const char* YYText() const { return yytext; }
>> ---
>> > const YY_CHAR* YYText() const { return yytext; }
>> 95c99
>> < char* yytext;
>> ---
>> > YY_CHAR* yytext;
>> 133,134c137,138
>> < virtual int LexerInput( char* buf, int max_size );
>> < virtual void LexerOutput( const char* buf, int size );
>> ---
>> > virtual int LexerInput( YY_CHAR* buf, int max_size );
>> > virtual void LexerOutput( const YY_CHAR* buf, int size );
>> 137c141
>> < void yyunput( int c, char* buf_ptr );
>> ---
>> > void yyunput( int c, YY_CHAR* buf_ptr );
>> 160c164
>> < char yy_hold_char;
>> ---
>> > YY_CHAR yy_hold_char;
>> 166c170
>> < char* yy_c_buf_p;
>> ---
>> > YY_CHAR* yy_c_buf_p;
>> 185c189
>> < char* yy_last_accepting_cpos;
>> ---
>> > YY_CHAR* yy_last_accepting_cpos;
>> 190c194
>> < char* yy_full_match;
>> ---
>> > YY_CHAR* yy_full_match;
>> Only in flex-2.5.35.U: FlexLexer.h~
>> diff flex-2.5.35/flex.skl flex-2.5.35.U/flex.skl
>> 126c126
>> < M4_GEN_PREFIX(`_scan_bytes')
>> ---
>> > M4_GEN_PREFIX(`_scan_chars')
>> 274a275
>> > *out for U pn
>> 276c277
>> < #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)
>> ---
>> > /* #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) pn*/
>> 543,544c544,545
>> < char *yy_ch_buf; /* input buffer */
>> < char *yy_buf_pos; /* current position in input buffer */
>> ---
>> > YY_CHAR *yy_ch_buf; /* input buffer */
>> > YY_CHAR *yy_buf_pos; /* current position in input buffer */
>> 546c547
>> < /* Size of input buffer in bytes, not including room for EOB
>> ---
>> > /* Size of input buffer in chars, not including room for EOB
>> 642c643
>> < static char yy_hold_char;
>> ---
>> > static YY_CHAR yy_hold_char;
>> 647c648
>> < static char *yy_c_buf_p = (char *) 0;
>> ---
>> > static YY_CHAR *yy_c_buf_p = (char *) 0;
>> 678,680c679,684
>> < YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( char *base, yy_size_t size M4_YY_PROTO_LAST_ARG );
>> < YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst char *yy_str M4_YY_PROTO_LAST_ARG );
>> < YY_BUFFER_STATE yy_scan_bytes M4_YY_PARAMS( yyconst char *bytes, int len M4_YY_PROTO_LAST_ARG );
>> ---
>> > YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( YY_CHAR *base, yy_size_t size M4_YY_PROTO_LAST_ARG );
>> > YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst YY_CHAR *yy_str M4_YY_PROTO_LAST_ARG );
>> > /* This is the old yy_scan_bytes function - renamed to avoid
>> > * confusion since a character may now be 1 or 2 bytes.
>> > */
>> > YY_BUFFER_STATE yy_scan_chars M4_YY_PARAMS( yyconst YY_CHAR *chars, int len M4_YY_PROTO_LAST_ARG );
>> 747c751
>> < *yy_cp = '\0'; \
>> ---
>> > *yy_cp = (YY_CHAR) '\0'; \
>> 805c809
>> < char yy_hold_char;
>> ---
>> > YY_CHAR yy_hold_char;
>> 808c812
>> < char *yy_c_buf_p;
>> ---
>> > YY_CHAR *yy_c_buf_p;
>> 816c820
>> < char* yy_last_accepting_cpos;
>> ---
>> > YY_CHAR* yy_last_accepting_cpos;
>> 825c829
>> < char *yy_full_match;
>> ---
>> > YY_CHAR *yy_full_match;
>> 837,838c841,842
>> < char yytext_r[YYLMAX];
>> < char *yytext_ptr;
>> ---
>> > YY_CHAR yytext_r[YYLMAX];
>> > YY_CHAR *yytext_ptr;
>> 843c847
>> < char *yytext_r;
>> ---
>> > YY_CHAR *yytext_r;
>> 999c1003
>> < static void yyunput M4_YY_PARAMS( int c, char *buf_ptr M4_YY_PROTO_LAST_ARG);
>> ---
>> > static void yyunput M4_YY_PARAMS( int c, (YY_CHAR) *buf_ptr M4_YY_PROTO_LAST_ARG);
>> 1005c1009
>> < static void yy_flex_strncpy M4_YY_PARAMS( char *, yyconst char *, int M4_YY_PROTO_LAST_ARG);
>> ---
>> > static void yy_flex_strncpy M4_YY_PARAMS( (YY_CHAR) *, yyconst char *, int M4_YY_PROTO_LAST_ARG);
>> 1009c1013
>> < static int yy_flex_strlen M4_YY_PARAMS( yyconst char * M4_YY_PROTO_LAST_ARG);
>> ---
>> > static int yy_flex_strlen M4_YY_PARAMS( yyconst (YY_CHAR) * M4_YY_PROTO_LAST_ARG);
>> 1077c1081
>> < #define ECHO fwrite( yytext, yyleng, 1, yyout )
>> ---
>> > #define ECHO (void) fwrite( yytext, sizeof( YY_CHAR ), yyleng, yyout )
>> 1095c1099
>> < if ( (result = LexerInput( (char *) buf, max_size )) < 0 ) \
>> ---
>> > if ( (result = LexerInput( buf, max_size )) < 0 ) \
>> 1239c1243
>> < register char *yy_cp, *yy_bp;
>> ---
>> > register YY_CHAR *yy_cp, *yy_bp;
>> 1535c1539
>> < int yyFlexLexer::LexerInput( char* buf, int /* max_size */ )
>> ---
>> > int yyFlexLexer::LexerInput( YY_CHAR* buf, int /* max_size */ )
>> 1537c1541
>> < int yyFlexLexer::LexerInput( char* buf, int max_size )
>> ---
>> > int yyFlexLexer::LexerInput( YY_CHAR* buf, int max_size )
>> 1544c1548
>> < yyin->get( buf[0] );
>> ---
>> > (void) yyin->read((unsigned char *) buf, sizeof( YY_CHAR ) );
>> 1555c1559
>> < (void) yyin->read( buf, max_size );
>> ---
>> > (void) yyin->read((unsigned char *) buf, max_size * sizeof( YY_CHAR ) );
>> 1560c1564
>> < return yyin->gcount();
>> ---
>> > return ( yyin->gcount() / sizeof( YY_CHAR ) );
>> 1564c1568
>> < void yyFlexLexer::LexerOutput( const char* buf, int size )
>> ---
>> > void yyFlexLexer::LexerOutput( const YY_CHAR* buf, int size )
>> 1566c1570
>> < (void) yyout->write( buf, size );
>> ---
>> > (void) yyout->write((unsigned char *) buf, size * sizeof( YY_CHAR ) );
>> 1588,1589c1592,1593
>> < register char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
>> < register char *source = YY_G(yytext_ptr);
>> ---
>> > register YY_CHAR *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
>> > register YY_CHAR *source = YY_G(yytext_ptr);
>> 1658c1662
>> < b->yy_ch_buf = (char *)
>> ---
>> > b->yy_ch_buf = (YY_CHAR *)
>> 1661c1665
>> < b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG );
>> ---
>> > (b->yy_buf_size + 2)*sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
>> 1737c1741
>> < register char *yy_cp;
>> ---
>> > register YY_CHAR *yy_cp;
>> 1774c1778
>> < static void yyunput YYFARGS2( int,c, register char *,yy_bp)
>> ---
>> > static void yyunput YYFARGS2( int,c, register YY_CHAR *,yy_bp)
>> 1777c1781
>> < void yyFlexLexer::yyunput( int c, register char* yy_bp)
>> ---
>> > void yyFlexLexer::yyunput( int c, register YY_CHAR* yy_bp)
>> 1780c1784
>> < register char *yy_cp;
>> ---
>> > register YY_CHAR *yy_cp;
>> 1792c1796
>> < register char *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
>> ---
>> > register YY_CHAR *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
>> 1794c1798
>> < register char *source =
>> ---
>> > register YY_CHAR *source =
>> 1809c1813
>> < *--yy_cp = (char) c;
>> ---
>> > *--yy_cp = (YY_CHAR) c;
>> 1853c1857
>> < *YY_G(yy_c_buf_p) = '\0';
>> ---
>> > *YY_G(yy_c_buf_p) = (YY_CHAR)'\0';
>> 1900c1904
>> < *YY_G(yy_c_buf_p) = '\0'; /* preserve yytext */
>> ---
>> > *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; /* preserve yytext */
>> 2016c2020
>> < b->yy_ch_buf = (char *) yyalloc( b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG );
>> ---
>> > b->yy_ch_buf = (YY_CHAR *) yyalloc( (b->yy_buf_size + 2)* sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
>> 2292c2296
>> < YY_BUFFER_STATE yy_scan_buffer YYFARGS2( char *,base, yy_size_t ,size)
>> ---
>> > YY_BUFFER_STATE yy_scan_buffer YYFARGS2( YY_CHAR *,base, yy_size_t ,size)
>> 2336c2340
>> < YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst char *, yystr)
>> ---
>> > YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst YY_CHAR *, yystr)
>> 2338a2343,2345
>> > int len;
>> > for ( len = 0; yy_str[len]; ++len )
>> > ;
>> 2340c2347
>> < return yy_scan_bytes( yystr, strlen(yystr) M4_YY_CALL_LAST_ARG);
>> ---
>> > return yy_scan_chars( yystr, len M4_YY_CALL_LAST_ARG);
>> 2356c2363
>> < YY_BUFFER_STATE yy_scan_bytes YYFARGS2( yyconst char *,yybytes, int ,_yybytes_len)
>> ---
>> > YY_BUFFER_STATE yy_scan_chars YYFARGS2( yyconst YY_CHAR *,yychars, int ,_yybytes_len)
>> 2359c2366
>> < char *buf;
>> ---
>> > YY_CHAR *buf;
>> 2365,2366c2372,2373
>> < n = _yybytes_len + 2;
>> < buf = (char *) yyalloc( n M4_YY_CALL_LAST_ARG );
>> ---
>> > n = _yychars_len + 2;
>> > buf = (YY_CHAR *) yyalloc( n sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
>> 2370,2371c2377,2378
>> < for ( i = 0; i < _yybytes_len; ++i )
>> < buf[i] = yybytes[i];
>> ---
>> > for ( i = 0; i < _yychars_len; ++i )
>> > buf[i] = yychars[i];
>> 2373c2380
>> < buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR;
>> ---
>> > buf[_yychars_len] = buf[_yychars_len+1] = YY_END_OF_BUFFER_CHAR;
>> 2377c2384
>> < YY_FATAL_ERROR( "bad buffer in yy_scan_bytes()" );
>> ---
>> > YY_FATAL_ERROR( "bad buffer in yy_scan_chars()" );
>> 2462c2469
>> < static void yy_fatal_error YYFARGS1(yyconst char*, msg)
>> ---
>> > static void yy_fatal_error YYFARGS1(yyconst YY_CHAR*, msg)
>> 2490c2497
>> < *YY_G(yy_c_buf_p) = '\0'; \
>> ---
>> > *YY_G(yy_c_buf_p) = (YY_CHAR) '\0'; \
>> 2945c2952
>> < static void yy_flex_strncpy YYFARGS3( char*,s1, yyconst char *,s2, int,n)
>> ---
>> > static void yy_flex_strncpy YYFARGS3( YY_CHAR*,s1, yyconst YY_CHAR *,s2, int,n)
>> 2957c2964
>> < static int yy_flex_strlen YYFARGS1( yyconst char *,s)
>> ---
>> > static int yy_flex_strlen YYFARGS1( yyconst YY_CHAR *,s)
>> Only in flex-2.5.35.U: flex.skl~
>> diff flex-2.5.35/gen.c flex-2.5.35.U/gen.c
>> 941c941
>> < indent_puts ("register char *yy_cp = YY_G(yy_c_buf_p);");
>> ---
>> > indent_puts ("register YY_CHAR *yy_cp = YY_G(yy_c_buf_p);");
>> 1690c1690
>> < ("static char *yy_last_accepting_cpos;\n");
>> ---
>> > ("static YY_CHAR *yy_last_accepting_cpos;\n");
>> 1762c1762
>> < outn ("static char *yy_full_match;");
>> ---
>> > outn ("static YY_CHAR *yy_full_match;");
>> 1857,1858c1857,1858
>> < outn ("char yytext[YYLMAX];");
>> < outn ("char *yytext_ptr;");
>> ---
>> > outn ("YY_CHAR yytext[YYLMAX];");
>> > outn ("YY_CHAR *yytext_ptr;");
>> 1864c1864
>> < outn ("char *yytext;");
>> ---
>> > outn ("YY_CHAR *yytext;");
>> 1877c1877
>> < outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size )) < 0 ) \\");
>> ---
>> > outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size* sizeof( YY_CHAR ) )) < 0 ) \\");
>> 1895,1896c1895,1905
>> < outn ("\t\t\t (c = getc( yyin )) != EOF && c != '\\n'; ++n ) \\");
>> < outn ("\t\t\tbuf[n] = (char) c; \\");
>> ---
>> >
>> > if ( csize == 65536 )
>> > outn(
>> > "\t\t\t (c = getwc( yyin )) != WEOF && c != '\\n'; ++n ) \\" );
>> > else
>> > outn(
>> > "\t\t\t (c = getc( yyin )) != EOF && c != '\\n'; ++n ) \\" );
>> >
>> > outn( "\t\t\tbuf[n] = (YY_CHAR) c; \\" );
>> >
>> 1898,1899c1907,1918
>> < outn ("\t\t\tbuf[n++] = (char) c; \\");
>> <
outn ("\t\tif ( c == EOF&& ferror( yyin ) ) \\"); >> --- >>> outn( "\t\t\tbuf[n++] = (YY_CHAR) c; \\" ); >>> >>> if ( csize == 65536 ) >>> outn( >>> "\t\tif ( c == WEOF&& ferror( yyin ) ) \\" ); >>> else >>> outn( >>> "\t\tif ( c == EOF&& ferror( yyin ) ) \\" ); >>> >>> >>> >> 1902a1922 >> 1906c1926 >> < outn ("\t\twhile ( (result = fread(buf, 1, max_size, yyin))==0&& ferror(yyin)) \\"); >> --- >>> outn ("\t\twhile ( (result = fread(buf, sizeof( YY_CHAR ), max_size, yyin))==0&& ferror(yyin)) \\"); >> Only in flex-2.5.35.U: gen.c~ >> Common subdirectories: flex-2.5.35/m4 and flex-2.5.35.U/m4 >> diff flex-2.5.35/main.c flex-2.5.35.U/main.c >> 96c96 >> < Char *ccltbl; >> --- >>> wchar_t *ccltbl; >> 265c265 >> < csize = CSIZE; >> --- >>> csize = 256; >> 306a307,326 >>> if ( csize == 65536 ) >>> { >>> if ( fulltbl ) >>> { >>> if ( use_read ) >>> flexerror( _( "Can't use -f with -U" ) ); >>> else >>> flexerror( _( "Can't use -Cf with -U" ) ); >>> } >>> else if ( fullspd ) >>> { >>> if ( use_read ) >>> flexerror( _( "Can't use -F with -U" ) ); >>> else >>> flexerror( _( "Can't use -CF with -U" ) ); >>> } >>> else if ( ! useecs&& ! usemecs ) >>> flexerror( _( "Can't use -C with -U" ) ); >>> } >>> >> 483a504,532 >>> outn( "/* Define the YY_CHAR type. */" ); >>> >>> switch (csize) { >>> case 65536: >>> outn( "#include<wchar.h>" ); >>> outn( "typedef unsigned short YY_CHAR;" ); >>> break; >>> case 256: >>> outn( "typedef unsigned char YY_CHAR;" ); >>> break; >>> default: >>> outn( "typedef char YY_CHAR;" ); >>> break; >>> } >>> >>> outn( "\n/* Promotes a YY_CHAR to an unsigned integer for use as an array index. 
*/"); >>> >>> switch (csize) { >>> case 65536: >>> case 256: >>> outn( "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned short) c)" ); >>> break; >>> default: >>> outn( >>> "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)" ); >>> break; >>> } >>> >>> skelout(); >> 789a839,840 >>> else if ( csize == 256 ) >>> putc( '8', stderr ); >> 791c842 >> < putc ('8', stderr); >> --- >>> putc( 'U', stderr ); >> 1208c1259,1263 >> < csize = CSIZE; >> --- >>> csize = 256; >>> break; >>> >>> case OPT_UNICODE: >>> csize = 65536; >> 1589,1592c1644,1647 >> < if (csize == 256) >> < outn ("typedef unsigned char YY_CHAR;"); >> < else >> < outn ("typedef char YY_CHAR;"); >> --- >>> //if (csize == 256) >>> // outn ("typedef unsigned char YY_CHAR;"); >>> //else >>> // outn ("typedef char YY_CHAR;"); >> 1677c1732 >> < outn ("extern char yytext[];\n"); >> --- >>> outn ("extern YY_CHAR yytext[];\n"); >> 1684c1739 >> < outn ("extern char *yytext;"); >> --- >>> outn ("extern YY_CHAR *yytext;"); >> 1744c1799 >> < ccltbl = allocate_Character_array (current_max_ccl_tbl_size); >> --- >>> ccltbl = allocate_wchar_array (current_max_ccl_tbl_size); >> 1830c1885,1886 >> < " -B, --batch generate batch scanner (opposite of -I)\n" >> --- >>> " -U, generate 16-bit scanner\n" >>> " -B, --batch generate batch scanner (opposite of -I)\n" >> Only in flex-2.5.35.U: main.c~ >> Only in flex-2.5.35: Makefile >> diff flex-2.5.35/misc.c flex-2.5.35.U/misc.c >> 254,256c254,264 >> < lerrsf (_ >> < ("scanner requires -8 flag to use the character %s"), >> < readable_form (c)); >> --- >>> { >>> if ( c< 256 ) >>> lerrsf( >>> _( "scanner requires -8 flag to use the character %s" ), >>> readable_form( c ) ); >>> else >>> lerrsf( >>> _( "scanner requires -U flag to use the character %s" ), >>> readable_form( c ) ); >>> >>> } >> 336c344 >> < Char v[]; >> --- >>> wchar_t v[]; >> 340c348 >> < Char k; >> --- >>> wchar_t k; >> 615c623 >> < Char myesc (array) >> --- >>> int myesc (array) >> 618c626,627 >> < Char c, 
esc_char; >> --- >>> Char c; >>> unsigned int esc_char; >> Only in flex-2.5.35.U: misc.c~ >> diff flex-2.5.35/options.c flex-2.5.35.U/options.c >> 200,201c200,201 >> < {"-U", OPT_8BIT, 0} >> < , /* Do not include unistd.h */ >> --- >>> {"-U", OPT_UNICODE, 0} >>> , >> Only in flex-2.5.35: options.c~ >> diff flex-2.5.35/options.h flex-2.5.35.U/options.h >> 44a45 >>> OPT_UNICODE, >> Common subdirectories: flex-2.5.35/po and flex-2.5.35.U/po >> Only in flex-2.5.35: stamp-h1 >> diff flex-2.5.35/tblcmp.c flex-2.5.35.U/tblcmp.c >> 687c687 >> < Char transset[CSIZE + 1]; >> --- >>> wchar_t transset[CSIZE + 1]; >> Only in flex-2.5.35.U: tblcmp.c~ >> Common subdirectories: flex-2.5.35/tests and flex-2.5.35.U/tests >> Common subdirectories: flex-2.5.35/tools and flex-2.5.35.U/tools >> ------------------------------------------------------------------------------ >> Virtualization& Cloud Management Using Capacity Planning >> Cloud computing makes use of virtualization - but cloud computing >> also focuses on allowing computing to be delivered as a service. >> http://www.accelacomm.com/jaw/sfnl/114/51521223/ >> _______________________________________________ >> Flex-devel mailing list >> Fle...@li... >> https://lists.sourceforge.net/lists/listinfo/flex-devel > |
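The yy_scan_string and yy_scan_chars hunks quoted above stop using strlen and size the copy in characters rather than bytes. Below is a minimal sketch of that idea, assuming a 16-bit YY_CHAR. The names yychar_strlen, copy_with_sentinels and END_OF_BUFFER_CHAR are hypothetical and are not part of flex or of the patch, and malloc stands in for flex's yyalloc. Note that the allocation size is the product of the element count and sizeof(YY_CHAR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned short YY_CHAR;   /* 16-bit unit, as the -U patch defines it */

/* Hypothetical stand-in for flex's end-of-buffer marker, which is 0. */
#define END_OF_BUFFER_CHAR 0

/* Count YY_CHAR units up to the 0 terminator; strlen() would count bytes. */
static size_t yychar_strlen(const YY_CHAR *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        ++n;
    return n;
}

/* Copy len units and append the two end-of-buffer units a scan buffer needs.
 * Returns NULL on allocation failure; the caller frees the result. */
static YY_CHAR *copy_with_sentinels(const YY_CHAR *chars, size_t len)
{
    size_t i;
    size_t n = len + 2;           /* element count, not byte count */
    YY_CHAR *buf;

    assert(chars != NULL);
    buf = (YY_CHAR *) malloc(n * sizeof(YY_CHAR));
    if (buf == NULL)
        return NULL;

    for (i = 0; i < len; ++i)
        buf[i] = chars[i];
    buf[len] = buf[len + 1] = END_OF_BUFFER_CHAR;
    return buf;
}
```

Keeping every length in units of YY_CHAR and converting to bytes only at the allocation or I/O call is what lets the same code serve both the 8-bit and the 16-bit scanner.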
From: Will E. <wes...@gm...> - 2012-03-11 14:39:58
|
Paul,

Thanks for posting this patch. As you know, Unicode support is not a trivial change, so we'll be evaluating it to make sure it's what we want for flex.

Everyone: your ideas, suggestions and comments on this patch are welcome.

--Will

On Sunday, 11 March 2012, 10:24 am -0400, Paul <pa...@pr...> wrote:
> Attached is the diff of flex-2.5.35 to flex-2.5.35.U
> The flag -U has been added to enable Unicode 16, otherwise it
> behaves as flex 2.5.35
> To enter a Unicode character in a rule use \x0000. i.e. \x and
> exactly 4 hex digits.
> An example of a rule is:
> ID ([a-zA-Z\x391-\x3a9\x3b1-\x3c9][a-zA-Z0-9\x391-\x3a9\x3b1-\x3c9]*)
> Which is a-z, A-Z, 0-9, and the Greek upper & lower case Unicode letters.
> For licenses, whatever covers flex2.5.35, covers this as well.
> I have only tested this with Kubuntu 11.10.
> Much thanks to the Unicode patch for flex-2.5.4a which was the basis
> for this work.
>
> Cheers,
>
> Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/ |
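The ID rule in Paul's description can be read as two character classes over 16-bit code units: a start class (ASCII letters plus the ranges 0x391-0x3a9 and 0x3b1-0x3c9) and a continuation class that also admits digits. The sketch below spells that out by hand in C, only to illustrate what the rule accepts. It assumes YY_CHAR is unsigned short as in the patch; id_start, id_continue and match_id are hypothetical names, not anything flex generates.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short YY_CHAR;   /* 16-bit unit, as the -U patch defines it */

/* First character of ID: [a-zA-Z\x391-\x3a9\x3b1-\x3c9] */
static int id_start(YY_CHAR c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= 0x391 && c <= 0x3a9) ||   /* Greek capital range */
           (c >= 0x3b1 && c <= 0x3c9);     /* Greek small range */
}

/* Later characters of ID: the start class plus [0-9] */
static int id_continue(YY_CHAR c)
{
    return id_start(c) || (c >= '0' && c <= '9');
}

/* Length in YY_CHAR units of the longest ID at the front of a
 * 0-terminated string, or 0 if the string does not start with an ID. */
static size_t match_id(const YY_CHAR *s)
{
    size_t n;
    assert(s != NULL);
    if (!id_start(s[0]))
        return 0;
    for (n = 1; id_continue(s[n]); ++n)
        ;
    return n;
}
```

For example, the units {'x', 0x3b1, '9', ' '} give a match of length 3, while a string that starts with a digit gives 0.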
From: Paul <pa...@pr...> - 2012-03-11 14:24:18
|
diff flex-2.5.35/ccl.c flex-2.5.35.U/ccl.c
83c83
< ccltbl = reallocate_Character_array (ccltbl,
---
> ccltbl = reallocate_wchar_array( ccltbl,
Only in flex-2.5.35.U: ccl.c~
Only in flex-2.5.35: config.h
Only in flex-2.5.35: config.log
Only in flex-2.5.35: config.status
Only in flex-2.5.35: .deps
Common subdirectories: flex-2.5.35/doc and flex-2.5.35.U/doc
diff flex-2.5.35/ecs.c flex-2.5.35.U/ecs.c
116c116
< Char ccls[];
---
> wchar_t ccls[];
Only in flex-2.5.35.U: ecs.c~
Common subdirectories: flex-2.5.35/examples and flex-2.5.35.U/examples
diff flex-2.5.35/flexdef.h flex-2.5.35.U/flexdef.h
108,109c108,109
< /* Always be prepared to generate an 8-bit scanner. */
< #define CSIZE 256
---
> /* Always be prepared to generate a 16-bit scanner. */
> #define CSIZE 65536
648c648
< extern Char *ccltbl;
---
> extern wchar_t *ccltbl;
678a679,684
> #define allocate_wchar_array(size) \
> (wchar_t *) allocate_array( size, sizeof( wchar_t ) )
>
> #define reallocate_wchar_array(array,size) \
> (wchar_t *) reallocate_array( (void *) array, size, sizeof( wchar_t ) )
>
778c784
< extern void mkeccl PROTO ((Char[], int, int[], int[], int, int));
---
> extern void mkeccl PROTO ((wchar_t[], int, int[], int[], int, int));
866c872
< extern void cshell PROTO ((Char[], int, int));
---
> extern void cshell PROTO ((wchar_t[], int, int));
930c936
< extern Char myesc PROTO ((Char[]));
---
> extern int myesc PROTO ((Char[]));
Only in flex-2.5.35.U: flexdef.h~
diff flex-2.5.35/FlexLexer.h flex-2.5.35.U/FlexLexer.h
36a37,38
> // Since this header is generic for all sizes of flex scanners, you must
> // define the type YY_CHAR before including it:
39a42
> // typedef xxx YY_CHAR;
43a47
> // typedef xxx YY_CHAR;
65c69
< const char* YYText() const { return yytext; }
---
> const YY_CHAR* YYText() const { return yytext; }
95c99
< char* yytext;
---
> YY_CHAR* yytext;
133,134c137,138
< virtual int LexerInput( char* buf, int max_size );
< virtual void LexerOutput( const char* buf, int size );
---
> virtual int LexerInput( YY_CHAR* buf, int max_size );
> virtual void LexerOutput( const YY_CHAR* buf, int size );
137c141
< void yyunput( int c, char* buf_ptr );
---
> void yyunput( int c, YY_CHAR* buf_ptr );
160c164
< char yy_hold_char;
---
> YY_CHAR yy_hold_char;
166c170
< char* yy_c_buf_p;
---
> YY_CHAR* yy_c_buf_p;
185c189
< char* yy_last_accepting_cpos;
---
> YY_CHAR* yy_last_accepting_cpos;
190c194
< char* yy_full_match;
---
> YY_CHAR* yy_full_match;
Only in flex-2.5.35.U: FlexLexer.h~
diff flex-2.5.35/flex.skl flex-2.5.35.U/flex.skl
126c126
< M4_GEN_PREFIX(`_scan_bytes')
---
> M4_GEN_PREFIX(`_scan_chars')
274a275
> *out for U pn
276c277
< #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)
---
> /* #define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c) pn*/
543,544c544,545
< char *yy_ch_buf; /* input buffer */
< char *yy_buf_pos; /* current position in input buffer */
---
> YY_CHAR *yy_ch_buf; /* input buffer */
> YY_CHAR *yy_buf_pos; /* current position in input buffer */
546c547
< /* Size of input buffer in bytes, not including room for EOB
---
> /* Size of input buffer in chars, not including room for EOB
642c643
< static char yy_hold_char;
---
> static YY_CHAR yy_hold_char;
647c648
< static char *yy_c_buf_p = (char *) 0;
---
> static YY_CHAR *yy_c_buf_p = (char *) 0;
678,680c679,684
< YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( char *base, yy_size_t size M4_YY_PROTO_LAST_ARG );
< YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst char *yy_str M4_YY_PROTO_LAST_ARG );
< YY_BUFFER_STATE yy_scan_bytes M4_YY_PARAMS( yyconst char *bytes, int len M4_YY_PROTO_LAST_ARG );
---
> YY_BUFFER_STATE yy_scan_buffer M4_YY_PARAMS( YY_CHAR *base, yy_size_t size M4_YY_PROTO_LAST_ARG );
> YY_BUFFER_STATE yy_scan_string M4_YY_PARAMS( yyconst YY_CHAR *yy_str M4_YY_PROTO_LAST_ARG );
> /* This is the old yy_scan_bytes function - renamed to avoid
> * confusion since a character may now be 1 or 2 bytes.
> */
> YY_BUFFER_STATE yy_scan_chars M4_YY_PARAMS( yyconst YY_CHAR *chars, int len M4_YY_PROTO_LAST_ARG );
747c751
< *yy_cp = '\0'; \
---
> *yy_cp = (YY_CHAR) '\0'; \
805c809
< char yy_hold_char;
---
> YY_CHAR yy_hold_char;
808c812
< char *yy_c_buf_p;
---
> YY_CHAR *yy_c_buf_p;
816c820
< char* yy_last_accepting_cpos;
---
> YY_CHAR* yy_last_accepting_cpos;
825c829
< char *yy_full_match;
---
> YY_CHAR *yy_full_match;
837,838c841,842
< char yytext_r[YYLMAX];
< char *yytext_ptr;
---
> YY_CHAR yytext_r[YYLMAX];
> YY_CHAR *yytext_ptr;
843c847
< char *yytext_r;
---
> YY_CHAR *yytext_r;
999c1003
< static void yyunput M4_YY_PARAMS( int c, char *buf_ptr M4_YY_PROTO_LAST_ARG);
---
> static void yyunput M4_YY_PARAMS( int c, (YY_CHAR) *buf_ptr M4_YY_PROTO_LAST_ARG);
1005c1009
< static void yy_flex_strncpy M4_YY_PARAMS( char *, yyconst char *, int M4_YY_PROTO_LAST_ARG);
---
> static void yy_flex_strncpy M4_YY_PARAMS( (YY_CHAR) *, yyconst char *, int M4_YY_PROTO_LAST_ARG);
1009c1013
< static int yy_flex_strlen M4_YY_PARAMS( yyconst char * M4_YY_PROTO_LAST_ARG);
---
> static int yy_flex_strlen M4_YY_PARAMS( yyconst (YY_CHAR) * M4_YY_PROTO_LAST_ARG);
1077c1081
< #define ECHO fwrite( yytext, yyleng, 1, yyout )
---
> #define ECHO (void) fwrite( yytext, sizeof( YY_CHAR ), yyleng, yyout )
1095c1099
< if ( (result = LexerInput( (char *) buf, max_size )) < 0 ) \
---
> if ( (result = LexerInput( buf, max_size )) < 0 ) \
1239c1243
< register char *yy_cp, *yy_bp;
---
> register YY_CHAR *yy_cp, *yy_bp;
1535c1539
< int yyFlexLexer::LexerInput( char* buf, int /* max_size */ )
---
> int yyFlexLexer::LexerInput( YY_CHAR* buf, int /* max_size */ )
1537c1541
< int yyFlexLexer::LexerInput( char* buf, int max_size )
---
> int yyFlexLexer::LexerInput( YY_CHAR* buf, int max_size )
1544c1548
< yyin->get( buf[0] );
---
> (void) yyin->read((unsigned char *) buf, sizeof( YY_CHAR ) );
1555c1559
< (void) yyin->read( buf, max_size );
---
> (void) yyin->read((unsigned char *) buf, max_size * sizeof( YY_CHAR ) );
1560c1564
< return yyin->gcount();
---
> return ( yyin->gcount() / sizeof( YY_CHAR ) );
1564c1568
< void yyFlexLexer::LexerOutput( const char* buf, int size )
---
> void yyFlexLexer::LexerOutput( const YY_CHAR* buf, int size )
1566c1570
< (void) yyout->write( buf, size );
---
> (void) yyout->write((unsigned char *) buf, size * sizeof( YY_CHAR ) );
1588,1589c1592,1593
< register char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
< register char *source = YY_G(yytext_ptr);
---
> register YY_CHAR *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
> register YY_CHAR *source = YY_G(yytext_ptr);
1658c1662
< b->yy_ch_buf = (char *)
---
> b->yy_ch_buf = (YY_CHAR *)
1661c1665
< b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG );
---
> (b->yy_buf_size + 2)*sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
1737c1741
< register char *yy_cp;
---
> register YY_CHAR *yy_cp;
1774c1778
< static void yyunput YYFARGS2( int,c, register char *,yy_bp)
---
> static void yyunput YYFARGS2( int,c, register YY_CHAR *,yy_bp)
1777c1781
< void yyFlexLexer::yyunput( int c, register char* yy_bp)
---
> void yyFlexLexer::yyunput( int c, register YY_CHAR* yy_bp)
1780c1784
< register char *yy_cp;
---
> register YY_CHAR *yy_cp;
1792c1796
< register char *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
---
> register YY_CHAR *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
1794c1798
< register char *source =
---
> register YY_CHAR *source =
1809c1813
< *--yy_cp = (char) c;
---
> *--yy_cp = (YY_CHAR) c;
1853c1857
< *YY_G(yy_c_buf_p) = '\0';
---
> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0';
1900c1904
< *YY_G(yy_c_buf_p) = '\0'; /* preserve yytext */
---
> *YY_G(yy_c_buf_p) = (YY_CHAR)'\0'; /* preserve yytext */
2016c2020
< b->yy_ch_buf = (char *) yyalloc( b->yy_buf_size + 2 M4_YY_CALL_LAST_ARG );
---
> b->yy_ch_buf = (YY_CHAR *) yyalloc( (b->yy_buf_size + 2)* sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
2292c2296
< YY_BUFFER_STATE yy_scan_buffer YYFARGS2( char *,base, yy_size_t ,size)
---
> YY_BUFFER_STATE yy_scan_buffer YYFARGS2( YY_CHAR *,base, yy_size_t ,size)
2336c2340
< YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst char *, yystr)
---
> YY_BUFFER_STATE yy_scan_string YYFARGS1( yyconst YY_CHAR *, yystr)
2338a2343,2345
> int len;
> for ( len = 0; yy_str[len]; ++len )
> ;
2340c2347
< return yy_scan_bytes( yystr, strlen(yystr) M4_YY_CALL_LAST_ARG);
---
> return yy_scan_chars( yystr, len M4_YY_CALL_LAST_ARG);
2356c2363
< YY_BUFFER_STATE yy_scan_bytes YYFARGS2( yyconst char *,yybytes, int ,_yybytes_len)
---
> YY_BUFFER_STATE yy_scan_chars YYFARGS2( yyconst YY_CHAR *,yychars, int ,_yybytes_len)
2359c2366
< char *buf;
---
> YY_CHAR *buf;
2365,2366c2372,2373
< n = _yybytes_len + 2;
< buf = (char *) yyalloc( n M4_YY_CALL_LAST_ARG );
---
> n = _yychars_len + 2;
> buf = (YY_CHAR *) yyalloc( n sizeof( YY_CHAR ) M4_YY_CALL_LAST_ARG );
2370,2371c2377,2378
< for ( i = 0; i < _yybytes_len; ++i )
< buf[i] = yybytes[i];
---
> for ( i = 0; i < _yychars_len; ++i )
> buf[i] = yychars[i];
2373c2380
< buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR;
---
> buf[_yychars_len] = buf[_yychars_len+1] = YY_END_OF_BUFFER_CHAR;
2377c2384
< YY_FATAL_ERROR( "bad buffer in yy_scan_bytes()" );
---
> YY_FATAL_ERROR( "bad buffer in yy_scan_chars()" );
2462c2469
< static void yy_fatal_error YYFARGS1(yyconst char*, msg)
---
> static void yy_fatal_error YYFARGS1(yyconst YY_CHAR*, msg)
2490c2497
< *YY_G(yy_c_buf_p) = '\0'; \
---
> *YY_G(yy_c_buf_p) = (YY_CHAR) '\0'; \
2945c2952
< static void yy_flex_strncpy YYFARGS3( char*,s1, yyconst char *,s2, int,n)
---
> static void yy_flex_strncpy YYFARGS3( YY_CHAR*,s1, yyconst YY_CHAR *,s2, int,n)
2957c2964
< static int yy_flex_strlen YYFARGS1( yyconst char *,s)
---
> static int yy_flex_strlen YYFARGS1( yyconst YY_CHAR *,s)
Only in flex-2.5.35.U: flex.skl~
diff flex-2.5.35/gen.c flex-2.5.35.U/gen.c
941c941
< indent_puts ("register char *yy_cp = YY_G(yy_c_buf_p);");
---
> indent_puts ("register YY_CHAR *yy_cp = YY_G(yy_c_buf_p);");
1690c1690
< ("static char
*yy_last_accepting_cpos;\n"); --- > ("static YY_CHAR *yy_last_accepting_cpos;\n"); 1762c1762 < outn ("static char *yy_full_match;"); --- > outn ("static YY_CHAR *yy_full_match;"); 1857,1858c1857,1858 < outn ("char yytext[YYLMAX];"); < outn ("char *yytext_ptr;"); --- > outn ("YY_CHAR yytext[YYLMAX];"); > outn ("YY_CHAR *yytext_ptr;"); 1864c1864 < outn ("char *yytext;"); --- > outn ("YY_CHAR *yytext;"); 1877c1877 < outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size )) < 0 ) \\"); --- > outn ("\twhile ( (result = read( fileno(yyin), (char *) buf, max_size* sizeof( YY_CHAR ) )) < 0 ) \\"); 1895,1896c1895,1905 < outn ("\t\t\t (c = getc( yyin )) != EOF && c != '\\n'; ++n ) \\"); < outn ("\t\t\tbuf[n] = (char) c; \\"); --- > > > if ( csize == 65536 ) > outn( > "\t\t\t (c = getwc( yyin )) != WEOF && c != '\\n'; ++n ) \\" ); > else > outn( > "\t\t\t (c = getc( yyin )) != EOF && c != '\\n'; ++n ) \\" ); > > outn( "\t\t\tbuf[n] = (YY_CHAR) c; \\" ); > 1898,1899c1907,1918 < outn ("\t\t\tbuf[n++] = (char) c; \\"); < outn ("\t\tif ( c == EOF && ferror( yyin ) ) \\"); --- > > outn( "\t\t\tbuf[n++] = (YY_CHAR) c; \\" ); > > if ( csize == 65536 ) > outn( > "\t\tif ( c == WEOF && ferror( yyin ) ) \\" ); > else > outn( > "\t\tif ( c == EOF && ferror( yyin ) ) \\" ); > > > 1902a1922 > 1906c1926 < outn ("\t\twhile ( (result = fread(buf, 1, max_size, yyin))==0 && ferror(yyin)) \\"); --- > outn ("\t\twhile ( (result = fread(buf, sizeof( YY_CHAR ), max_size, yyin))==0 && ferror(yyin)) \\"); Only in flex-2.5.35.U: gen.c~ Common subdirectories: flex-2.5.35/m4 and flex-2.5.35.U/m4 diff flex-2.5.35/main.c flex-2.5.35.U/main.c 96c96 < Char *ccltbl; --- > wchar_t *ccltbl; 265c265 < csize = CSIZE; --- > csize = 256; 306a307,326 > if ( csize == 65536 ) > { > if ( fulltbl ) > { > if ( use_read ) > flexerror( _( "Can't use -f with -U" ) ); > else > flexerror( _( "Can't use -Cf with -U" ) ); > } > else if ( fullspd ) > { > if ( use_read ) > flexerror( _( "Can't use -F with -U" ) 
); > else > flexerror( _( "Can't use -CF with -U" ) ); > } > else if ( ! useecs && ! usemecs ) > flexerror( _( "Can't use -C with -U" ) ); > } > 483a504,532 > outn( "/* Define the YY_CHAR type. */" ); > > switch (csize) { > case 65536: > outn( "#include <wchar.h>" ); > outn( "typedef unsigned short YY_CHAR;" ); > break; > case 256: > outn( "typedef unsigned char YY_CHAR;" ); > break; > default: > outn( "typedef char YY_CHAR;" ); > break; > } > > outn( "\n/* Promotes a YY_CHAR to an unsigned integer for use as an array index. */"); > > switch (csize) { > case 65536: > case 256: > outn( "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned short) c)" ); > break; > default: > outn( > "#define YY_SC_TO_UI(c) ((unsigned int) (unsigned char) c)" ); > break; > } > > skelout(); 789a839,840 > else if ( csize == 256 ) > putc( '8', stderr ); 791c842 < putc ('8', stderr); --- > putc( 'U', stderr ); 1208c1259,1263 < csize = CSIZE; --- > csize = 256; > break; > > case OPT_UNICODE: > csize = 65536; 1589,1592c1644,1647 < if (csize == 256) < outn ("typedef unsigned char YY_CHAR;"); < else < outn ("typedef char YY_CHAR;"); --- > //if (csize == 256) > // outn ("typedef unsigned char YY_CHAR;"); > //else > // outn ("typedef char YY_CHAR;"); 1677c1732 < outn ("extern char yytext[];\n"); --- > outn ("extern YY_CHAR yytext[];\n"); 1684c1739 < outn ("extern char *yytext;"); --- > outn ("extern YY_CHAR *yytext;"); 1744c1799 < ccltbl = allocate_Character_array (current_max_ccl_tbl_size); --- > ccltbl = allocate_wchar_array (current_max_ccl_tbl_size); 1830c1885,1886 < " -B, --batch generate batch scanner (opposite of -I)\n" --- > " -U, generate 16-bit scanner\n" > " -B, --batch generate batch scanner (opposite of -I)\n" Only in flex-2.5.35.U: main.c~ Only in flex-2.5.35: Makefile diff flex-2.5.35/misc.c flex-2.5.35.U/misc.c 254,256c254,264 < lerrsf (_ < ("scanner requires -8 flag to use the character %s"), < readable_form (c)); --- > { > if ( c < 256 ) > lerrsf( > _( "scanner requires -8 flag to 
use the character %s" ), > readable_form( c ) ); > else > lerrsf( > _( "scanner requires -U flag to use the character %s" ), > readable_form( c ) ); > > } 336c344 < Char v[]; --- > wchar_t v[]; 340c348 < Char k; --- > wchar_t k; 615c623 < Char myesc (array) --- > int myesc (array) 618c626,627 < Char c, esc_char; --- > Char c; > unsigned int esc_char; Only in flex-2.5.35.U: misc.c~ diff flex-2.5.35/options.c flex-2.5.35.U/options.c 200,201c200,201 < {"-U", OPT_8BIT, 0} < , /* Do not include unistd.h */ --- > {"-U", OPT_UNICODE, 0} > , Only in flex-2.5.35: options.c~ diff flex-2.5.35/options.h flex-2.5.35.U/options.h 44a45 > OPT_UNICODE, Common subdirectories: flex-2.5.35/po and flex-2.5.35.U/po Only in flex-2.5.35: stamp-h1 diff flex-2.5.35/tblcmp.c flex-2.5.35.U/tblcmp.c 687c687 < Char transset[CSIZE + 1]; --- > wchar_t transset[CSIZE + 1]; Only in flex-2.5.35.U: tblcmp.c~ Common subdirectories: flex-2.5.35/tests and flex-2.5.35.U/tests Common subdirectories: flex-2.5.35/tools and flex-2.5.35.U/tools |
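For anyone reading the yy_scan_string / yy_scan_chars hunks in the diff above, here is a minimal stand-alone sketch of what they appear to be doing: count the length in YY_CHAR units instead of calling strlen(), size the allocation in bytes by multiplying by sizeof( YY_CHAR ), and end the buffer with two YY_END_OF_BUFFER_CHAR entries. This is an illustration only, not the patch itself. The names yy_char_len and copy_for_scan are hypothetical, malloc stands in for yyalloc, and YY_CHAR is assumed to be unsigned short as in the -U case. The posted hunk reads "n sizeof( YY_CHAR )"; the sketch assumes "n * sizeof( YY_CHAR )" was meant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 16-bit code units, as in the -U branch of the patch. */
typedef unsigned short YY_CHAR;

/* flex ends a scan buffer with two of these markers. */
#define YY_END_OF_BUFFER_CHAR 0

/* Hypothetical helper: count code units up to, but not including,
 * the terminating 0. This replaces strlen(), which counts bytes. */
static size_t yy_char_len(const YY_CHAR *s)
{
    size_t len = 0;

    assert(s != NULL);
    while (s[len] != 0)
        ++len;
    return len;
}

/* Hypothetical helper: copy len code units into a freshly allocated
 * buffer and append the two end-of-buffer markers.
 * Returns NULL if the allocation fails; the caller frees the result. */
static YY_CHAR *copy_for_scan(const YY_CHAR *chars, size_t len)
{
    size_t n = len + 2;                      /* element count          */
    YY_CHAR *buf = malloc(n * sizeof *buf);  /* byte count = n * width */
    size_t i;

    if (buf == NULL)
        return NULL;

    for (i = 0; i < len; ++i)
        buf[i] = chars[i];

    buf[len] = buf[len + 1] = YY_END_OF_BUFFER_CHAR;
    return buf;
}
```

Passing the element count straight to the allocator would reserve too few bytes whenever YY_CHAR is wider than one byte, which is why the byte count is computed separately from n here.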
From: Tim L. <ti...@ti...> - 2012-03-06 17:36:14
|
Isaac Dunham <id...@la...> wrote:

> [...]
> The steps up to chmod -x roughly correspond to creating a release; those
> afterwards correspond to building from the tarball.
> So yes, it works for all sane configurations; it fails when you
> -install GNU M4
> -then install autotools
> -then remove gnu m4 (breaks autotools)

Thanks for testing and nice to see that it works. Just to clarify: There shouldn't be *any* failure unless you change something after ./configure. So it probably wasn't a good idea of me to have you run autoconf yourself as it apparently only added some confusion :-).

Tim
|
From: Will E. <wes...@gm...> - 2012-03-02 21:39:22
|
This is now in flex cvs.

--Will

On Thursday, 1 March 2012, 6:17 pm -0800, Isaac Dunham <id...@la...> wrote:

> On Fri, 24 Feb 2012 14:44:57 +0000
> Tim Landscheidt <ti...@ti...> wrote:
> > I replaced the test for GNU M4 with a feature test for "m4
> > -P" in the attached patch. Could you try it out on NetBSD?
> > (You will have to regenerate "configure" with "autoconf".)
>
> It builds with NetBSD m4, if autotools is absent (or non-executable!)
> If autotools is installed, autom4te will die because of the missing GNU M4.
>
> What I did that worked:
> #gm4 is GNU m4
> #m4 is netbsd m4
> patch
> autoconf #requires gm4
> chmod -x /usr/bin/pkg/gm4 #simulate no GNU m4
> #autotools needs GNU m4, and "missing" doesn't check beforehand
> #This scenario wouldn't happen without tinkering.
> chmod -x /usr/pkg/bin/auto*
> ./configure
> make
>
> The steps up to chmod -x roughly correspond to creating a release; those
> afterwards correspond to building from the tarball.
> So yes, it works for all sane configurations; it fails when you
> -install GNU M4
> -then install autotools
> -then remove gnu m4 (breaks autotools)
>
> --
> Isaac Dunham <id...@la...>

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/
|
From: Isaac D. <id...@la...> - 2012-03-02 02:17:36
|
On Fri, 24 Feb 2012 14:44:57 +0000
Tim Landscheidt <ti...@ti...> wrote:

> I replaced the test for GNU M4 with a feature test for "m4
> -P" in the attached patch. Could you try it out on NetBSD?
> (You will have to regenerate "configure" with "autoconf".)

It builds with NetBSD m4, if autotools is absent (or non-executable!)
If autotools is installed, autom4te will die because of the missing GNU M4.

What I did that worked:
#gm4 is GNU m4
#m4 is netbsd m4
patch
autoconf #requires gm4
chmod -x /usr/bin/pkg/gm4 #simulate no GNU m4
#autotools needs GNU m4, and "missing" doesn't check beforehand
#This scenario wouldn't happen without tinkering.
chmod -x /usr/pkg/bin/auto*
./configure
make

The steps up to chmod -x roughly correspond to creating a release; those afterwards correspond to building from the tarball.
So yes, it works for all sane configurations; it fails when you
-install GNU M4
-then install autotools
-then remove gnu m4 (breaks autotools)

--
Isaac Dunham <id...@la...>
|
From: Will E. <wes...@gm...> - 2012-02-26 02:40:44
|
Paul,

I don't think you need to worry about the 16 bit option. The issue with unicode is...well, I'm entry level at this point with unicode understanding. So, let's see what you've got and we all can go over it and get in what makes the most sense.

On Saturday, 25 February 2012, 9:43 am -0500, Paul <pa...@pr...> wrote:

> I notice there is an incompletely implemented flag in flex 2.5.35
> called 16bit. Only the flag exists. No implementation. If it would
> resolve conflict in getting this patch into the mainline, I would be
> happy to rename it under the 16bit flag instead of unicode. Would
> that help?
>
> Thanks for your interest,
>
> Paul Neelands
>
> On 02/17/2012 09:12 PM, Will Estes wrote:
> >Please post what you have. There's tons of interest from you, me and
> >other folks. Probably better to post to flex-devel, though.
> >
> >--Will
> >
> >On Tuesday, 17 January 2012, 12:55 pm -0500, Paul <pa...@pr...> wrote:
> >
> >>I needed 16 bit unicode in version 2.5.35 of lex, so I have brought
> >>forward the old patch for 2.5.4a to the current version.
> >>It works in both normal and reentrant modes and with bison-bridge.
> >>Non-unicode operation has also been tested and is unaffected.
> >>Is there any interest in posting this patch or incorporating it into
> >>mainline flex?
> >>
> >>Paul Neelands

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/
|