From: Wu Y. <ad...@ne...> - 2002-01-29 02:58:59
I agree with Oleg that getc is causing problems, but we can do nothing about it, since otherwise we might break other programs. However, straits.h MUST be patched for the STL to work with 8-bit characters.

I confirmed that iswspace should not have problems: fgetwc returns wint_t, which is defined as unsigned short in MSVC and as unsigned int in MinGW, so all normal values should be positive. Can anyone explain to me why MinGW defines wint_t differently from MSVC? The line "#define __WCHAR_TYPE__ int" in <stddef.h> also looks strange (it is followed by "typedef __WCHAR_TYPE__ wchar_t", though it does not seem to have any effect).

There seems to be no feasible ideal solution to the current problem, so for now we should just patch straits.h to let Alexandros's program work. It really should work, and it does under MSVC (in either mode, including /MD) and Cygwin.

Best regards,

Wu Yongwei

--- Original Message from Oleg Sesov <se...@ma...> ---

Wu Yongwei wrote:
>
> If Danny's reading of the C standard is true, then we must patch
> straits.h, which states
>
> static bool is_del(char_type a) { return isspace(a); }
>
> Should we treat -1 and 255 as identical, so that we can use
>
> static bool is_del(char_type a) { return isspace(static_cast<unsigned char>(a)); }

In my opinion this solution is appropriate. EOF does not fall within the bounds of the char type; it is an extension of them, and with the cast isspace receives its argument within the specified bounds.

On the other hand, I would modify the getc() implementation, which is the source of the problem, to return character values in the range of unsigned char, or -1. That way code like the one being discussed would work, and the values could be assigned to char/signed char/unsigned char variables. I am not sure whether ANSI C says anything about this, but I think it is better than returning values that are out of bounds. If that were impossible, I would work around it using ld's --wrap option, or patch the code near the getc() functions.
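The straits.h change under discussion can be sketched as follows. This is a minimal illustration that assumes a hypothetical, simplified traits struct (`char_traits_sketch`). The real class in straits.h has many more members, and they are not reproduced here. The sketch shows only one thing. Converting to `unsigned char` before calling `isspace` keeps the argument within the range the C standard allows: unsigned char values and EOF. A byte 0xFF therefore reaches `isspace` as 255 rather than as -1, which is EOF.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical, simplified stand-in for the traits class in straits.h;
// only the member under discussion is shown.
struct char_traits_sketch {
    typedef char char_type;

    // Original form was: return isspace(a);
    // Where char is signed, bytes 0x80..0xFF arrive as negative values,
    // and isspace is only defined for unsigned char values and EOF.
    //
    // Patched form: convert to unsigned char first, so a char holding
    // 0xFF is passed as 255 instead of -1.
    static bool is_del(char_type a) {
        return std::isspace(static_cast<unsigned char>(a)) != 0;
    }
};
```

Under the default "C" locale, `is_del(' ')` is true and `is_del(static_cast<char>(0xFF))` is false. In other locales, the result for bytes above 0x7F depends on that locale's classification tables.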
> And is the same applicable to the line "return iswspace(a)" for
> wchar_t?

I think not: getwc is defined properly and won't return broken values, and the whole Unicode layer was designed to get rid of such problems. Unfortunately, Unicode is rarely used in third-party libraries.

>
> Anyway, I doubt the meaning of "representable" in "shall be representable
> as an unsigned char or shall equal the value of the macro EOF". And I think
> my patch MIGHT be good to prevent careless programmers from making strange
> mistakes in non-O0 modes, and to maintain compatibility with other compilers
> like Cygwin.
>
> By the way, I don't like my patch, either. It is ugly. It works only for
> C++ or the optimizing mode of C! What is the ideal solution?

There are three ideal solutions:

1) Use wchar_t.
2) Make getc return proper values (which is incompatible with other C compilers).
3) The best one: redesign ANSI C to separate character types from small-number types. Characters shouldn't have any sign. Anyway, looking in a book, I can't make out which characters are possible, which are negative, and why ;-)

[...]

> The MS runtime is compliant with the above, and the docs state:
> "The is routines produce meaningful results for any integer argument
> from -1 (EOF) to UCHAR_MAX (0xFF), inclusive."
>
> I believe it is up to the user to range-check the input argument before
> passing it to an is* function or macro.

I agree, but in this case that means not using the STL for 8-bit characters.

[...]

Oleg Sesov.
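The caller-side range check mentioned above might look like this when reading a stream. This is a minimal sketch. `count_spaces` is a hypothetical helper, not part of any library, and it assumes nothing beyond the standard getc interface. Each value stays in an `int` until it has been compared with EOF. Any value outside 0..UCHAR_MAX is converted back into the unsigned char range before it reaches isspace.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>

// Hypothetical helper: counts whitespace bytes in a stream.
// A conforming getc returns only EOF or values in 0..UCHAR_MAX;
// the range check guards against a runtime whose getc sign-extends
// bytes >= 0x80 (e.g. returning -32 for 0xE0).
long count_spaces(std::FILE* fp) {
    long n = 0;
    int c;  // int, not char, so EOF stays distinguishable from data
    while ((c = std::getc(fp)) != EOF) {
        if (c < 0 || c > UCHAR_MAX) {
            // Out-of-range value: reduce it modulo UCHAR_MAX + 1
            // via unsigned char rather than passing it to isspace.
            c = static_cast<unsigned char>(c);
        }
        if (std::isspace(c)) {
            ++n;
        }
    }
    return n;
}
```

This check has a limit. If a runtime's getc sign-extends the byte 0xFF to -1, the loop cannot tell that byte apart from EOF and stops early. No caller-side check can repair that case.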