Re: [Gptfdisk-general] GPTPart::GetDescription broken in 1.0.7 on Big Endian systems
From: Erik L. <cat...@gm...> - 2021-06-08 16:23:09
Hi,

On 08-06-2021 18:16, Rod Smith wrote:
> On 6/8/21 10:54 AM, Christian Ehrhardt wrote:
>>> There aren't a lot of changes from 1.0.7 to the current
>>> state, but given that this is kind of important for big-endian systems,
>>> I'll release this change as 1.0.8 soon -- but I'll give it a day or two
>>> in case something else crops up.
>> Agreed, that little wait sounds wise.
>> I've indirectly also pulled IBM into the boat since they are kind of
>> the "master of big-endian things".
>> Maybe (but no promise) they have opinions/hints on this as well.
> I've been doing some searching to find a way to identify UCS-2 with
> swapped bytes, so that the program could fix this automatically when
> reading corrupted partition tables. (GPT technically uses UCS-2, not
> UTF-16, although the GPT fdisk code has a lot of references to UTF-16
> because at one time I relied on an external UTF-16 library.) So far no luck.

This problem is impossible to solve perfectly. However, I would think that
partition labels with characters outside the ASCII range are very rare in
practice, so for a start one could check whether the first byte is 0 in every
UCS-2 byte pair. GPT stores names little-endian, so for ASCII text the zero
byte should come second. If it comes first in every pair, it's almost certain
that the label was written big-endian (but I would still ask the user to
confirm that the big-endian interpretation makes more sense).

There are of course also some ranges that are reserved in Unicode, which can
be used to detect that something might not be right with the endianness. If
we limit the allowed range to UCS-2, we can also exclude the UTF-16 surrogate
ranges (disallowing, for instance, the poop emoji as a partition label).

One could also check locality (text doesn't usually use characters from all
over the place but stays within a few defined script ranges), and there may
also be statistical models based on the frequency with which Unicode
characters occur in text, but that's probably taking it too far.

Best regards,

- Erik
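For illustration, the zero-byte check suggested above could look roughly like this. It is only a sketch in Python, not GPT fdisk code: the function names `looks_byte_swapped` and `swap_pairs` are made up for this example, and it assumes the raw 72-byte name field from a GPT partition entry is already available as `bytes`. It only returns a guess; the caller would still ask the user to confirm before rewriting anything.

```python
def looks_byte_swapped(raw: bytes) -> bool:
    """Guess whether a GPT name field was written big-endian.

    Hypothetical helper: returns True only if every 2-byte unit before
    the terminator has a zero first byte and an ASCII second byte.
    """
    units = []
    for i in range(0, len(raw) - 1, 2):
        pair = raw[i:i + 2]
        if pair == b"\x00\x00":  # terminator / padding: stop here
            break
        units.append(pair)
    if not units:
        return False  # empty label: nothing to judge
    # Little-endian ASCII would have the zero in the SECOND byte.
    return all(first == 0 and 0 < second < 0x80 for first, second in units)


def swap_pairs(raw: bytes) -> bytes:
    """Swap the two bytes of every 2-byte unit (expects an even length)."""
    if len(raw) % 2:
        raise ValueError("name field must have an even number of bytes")
    swapped = bytearray(raw)
    swapped[0::2], swapped[1::2] = raw[1::2], raw[0::2]
    return bytes(swapped)


# Example: a label written with the wrong byte order, padded to 72 bytes.
bad = "EFI System".encode("utf-16-be").ljust(72, b"\x00")
if looks_byte_swapped(bad):
    fixed = swap_pairs(bad)
    print(fixed.decode("utf-16-le").rstrip("\x00"))
```

A label containing any non-ASCII character makes the check return False in either byte order, so such labels would need one of the other heuristics (reserved ranges, surrogates, locality) or a manual decision.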