Greetings,
I've encountered more than a few ICC profiles which seem to be
malformed, in the sense that they only store the Profile Description in
the 7-bit ASCII part of profileDescriptionTag (§6.4.27), despite there
being high bits set in the name.
For example, "SWOP (couché), 20%" or "LG HDR 4K-".
In the profiles I'm looking at, the former appears to be Mac Roman
encoded (and is accompanied by a PrimaryPlatform of "APPL"), whereas the
latter appears to be Codepage 1252 (or compatible) encoded (and is
accompanied by a PrimaryPlatform of "MSFT").
When using cmsGetProfileInfo, littlecms is correctly finding that there
is no Unicode localisation and gives me the "ASCII" version in a
wchar_t[]. However, as mentioned, it's not actually ASCII and there's no
way to know what the encoding should be.
This is not littlecms's fault; it's a spec violation. However, it's
common enough that I was wondering whether some heuristics could be
added: in the case that only an "ASCII" variant is present, ignore the
fact that the spec requires 7-bit and try to interpret the name in some
codepage, based on the PrimaryPlatform. This would then be transcoded to
UTF-16 (or whatever the native wide-char encoding is, I guess) for the
result of cmsGetProfileInfo, and the calling scope would not need to
worry about the fact that it has no idea what encoding some malformed
Profile Description is in. We'd be more likely to get the "right"
result.
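To make the idea concrete, here is a minimal sketch of the heuristic,
written in Python purely for illustration (it is not littlecms code and
uses none of its API). The function name, the platform-to-codec table
and the fallback codec are all assumptions on my part; only the
APPL/Mac Roman and MSFT/CP1252 pairings come from the profiles
described above.

```python
# Hypothetical mapping from the header's PrimaryPlatform signature to a
# legacy 8-bit codec. Only these two pairings were observed in practice.
PLATFORM_CODECS = {
    "APPL": "mac_roman",
    "MSFT": "cp1252",
}

# Assumption: an arbitrary last resort for any other platform signature.
FALLBACK_CODEC = "cp1252"


def decode_description(raw: bytes, platform: str) -> str:
    """Decode the "ASCII" part of a profile description.

    raw      -- the bytes stored in the 7-bit ASCII field
    platform -- the four-character PrimaryPlatform signature
    """
    # Drop the terminating NUL and anything after it.
    raw = raw.split(b"\x00", 1)[0]

    # Spec-conformant case: genuinely 7-bit, nothing to guess.
    if all(byte < 0x80 for byte in raw):
        return raw.decode("ascii")

    # Malformed case: high bits are set, so guess a codec from the platform.
    codec = PLATFORM_CODECS.get(platform, FALLBACK_CODEC)

    # errors="replace" keeps bytes that are undefined in the chosen
    # codec from raising; they become U+FFFD instead.
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    # 0x8E is "é" in Mac Roman.
    print(decode_description(b"SWOP (couch\x8e), 20%", "APPL"))
```

The guess only kicks in when the field is not valid 7-bit ASCII, so
conformant profiles would be unaffected.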
This is a long shot, I know, but would the team be interested in
something like this? Or are "dirty hacks" for bad files totally frowned
upon in this project?
Cheers :)
Tom