[Lcms-user] Heuristics for badly-encoded profileDescriptionTag
From: Tomalak Geret'k. <to...@ke...> - 2019-09-09 11:53:23
Greetings,

I've encountered more than a few ICC profiles that seem to be malformed, in the sense that they store the Profile Description only in the 7-bit ASCII part of profileDescriptionTag (§6.4.27), even though the name contains bytes with the high bit set. For example, "SWOP (couché), 20%" or "LG HDR 4K-". In the profiles I'm looking at, the former appears to be Mac Roman encoded (and is accompanied by a PrimaryPlatform of "APPL"), whereas the latter appears to be code page 1252 (Windows-1252, or compatible) encoded (and is accompanied by a PrimaryPlatform of "MSFT").

When using cmsGetProfileInfo, littlecms correctly finds that there is no Unicode localisation and gives me the "ASCII" version in a wchar_t[]. However, as mentioned, it's not actually ASCII, and there's no way to know what the encoding should be.

This is not littlecms's fault; it's a spec violation. However, it's common enough that I was wondering whether some heuristics could be added: when only an "ASCII" variant is present, ignore the spec's 7-bit requirement and try to interpret the name in a code page chosen from the PrimaryPlatform. The result would then be transcoded to UTF-16 (or whatever the native wide-char encoding is, I guess) before cmsGetProfileInfo returns it. The caller would then not need to worry that it has no idea what encoding a malformed Profile Description is in, and we'd be more likely to get the "right" result.

This is a long shot, I know, but would the team be interested in something like this? Or are "dirty hacks" for bad files totally frowned upon in this project?

Cheers :)
Tom