From: Andy H. <and...@gm...> - 2005-10-19 23:17:46
These changes to Charset Detection came out of a review of the API with Markus.

=== ucsdet_getUChars() ===

This function fills a caller-supplied UChar buffer with the input text after conversion to UChars. The change concerns its behavior when the buffer is too small to hold the full UChar string.

As originally described, the function would put as many characters as would fit into the output buffer and return the number of characters actually written. The total size needed to hold the entire string was not returned.

The new behavior is the same as that of ucnv_toUChars(): when the buffer is too small, the buffer contents are undefined, and the return value is the total number of UChars in the full output string, not including the terminating NUL. This follows the usual convention for ICU functions that fill an output buffer with UChars.

The original behavior was intended to make it easier to work with files whose total size was not known in advance and could be extremely large. The file APIs have since been removed from charset detection, which eliminates the reason for the non-standard behavior. File APIs that work with charset detection will be proposed later for the ICU IO package.

=== ucsdet_getDetectableCharsetName ===
=== ucsdet_DetectableCharsetsCount ===

Replace these two functions with a single one that provides a UEnumeration over the detectable charsets. The new function name can be taken from Java:

    UEnumeration *
    ucsdet_getAllDetectableCharsets(const UCharsetDetector *csd, UErrorCode *status);

This is more in keeping with the preferred conventions for new ICU APIs, and it copes better with the possibility that a future version may allow detectors to be registered or added to the charset detection service on the fly. The existing UEnumeration functions then handle iteration over the set of detectable charsets.

-- 
Andy Heninger
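A minimal C sketch of the two caller-side conventions the proposal relies on. In real ICU, a too-small buffer is reported through the UErrorCode as U_BUFFER_OVERFLOW_ERROR, and a UEnumeration is walked with uenum_next() until it returns NULL and then released with uenum_close(). So that the sketch compiles without ICU, every demo_* / Demo* name below is a hypothetical stand-in, not an ICU API, and the "conversion" is plain Latin-1 widening chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ICU's UChar and UErrorCode (NOT the real ICU types). */
typedef uint16_t DemoUChar;
typedef enum { DEMO_ZERO_ERROR = 0, DEMO_BUFFER_OVERFLOW_ERROR = 1 } DemoErrorCode;

/* Hypothetical getUChars with the ucnv_toUChars-style contract: always returns
 * the full output length (excluding the NUL). If that exceeds cap, it sets
 * DEMO_BUFFER_OVERFLOW_ERROR and the buffer contents are unspecified.
 * "Conversion" is just Latin-1 byte widening, for illustration only. */
static int32_t demo_getUChars(const char *src, DemoUChar *buf, int32_t cap,
                              DemoErrorCode *status) {
    assert(cap >= 0 && (cap == 0 || buf != NULL));
    if (*status != DEMO_ZERO_ERROR) return 0;  /* ICU style: honor an incoming error */
    int32_t len = 0;
    while (src[len] != '\0') len++;
    if (len > cap) {                           /* too small: report the needed size */
        *status = DEMO_BUFFER_OVERFLOW_ERROR;
        return len;
    }
    for (int32_t i = 0; i < len; i++) buf[i] = (unsigned char)src[i];
    if (len < cap) buf[len] = 0;               /* NUL-terminate only if there is room */
    return len;
}

/* Caller-side preflighting: ask for the size with a NULL/0 buffer, allocate
 * exactly len + 1 units, then call again. The caller frees the result. */
static DemoUChar *demo_getUCharsAlloc(const char *src, int32_t *outLen) {
    DemoErrorCode status = DEMO_ZERO_ERROR;
    int32_t len = demo_getUChars(src, NULL, 0, &status);
    if (status == DEMO_BUFFER_OVERFLOW_ERROR) status = DEMO_ZERO_ERROR;  /* expected */
    DemoUChar *buf = malloc(((size_t)len + 1) * sizeof *buf);
    if (buf == NULL) return NULL;
    demo_getUChars(src, buf, len + 1, &status);
    if (status != DEMO_ZERO_ERROR) { free(buf); return NULL; }
    *outLen = len;
    return buf;
}

/* Hypothetical enumeration in the uenum_next() style: the caller iterates
 * until NULL instead of indexing 0..count-1, so the set of names is fixed
 * when the enumeration is created rather than baked into the API. */
typedef struct {
    const char *const *names;
    int32_t count;
    int32_t pos;
} DemoEnumeration;

static const char *demo_enum_next(DemoEnumeration *en) {
    return en->pos < en->count ? en->names[en->pos++] : NULL;
}
```

The preflight call with a NULL buffer and zero capacity is exactly what the old "fill as much as fits" behavior could not support, since it never reported the full length.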