From: Markus S. <mar...@gm...> - 2012-10-22 18:36:04
On Sat, Oct 20, 2012 at 12:41 PM, Tim Kimber <KI...@uk...> wrote:

> I've recently encountered a character encoding that doesn't use octets,
> and is not byte-aligned. It is GSM-338 (
> http://en.wikipedia.org/wiki/GSM_03.38#GSM_7_bit_default_alphabet_and_extension_table_of_3GPP_TS_23.038_.2F_GSM_03.38
> ).
>
> a) is there any way to configure ICU to encode/decode GSM-338?

We have a conversion table available here:
http://source.icu-project.org/repos/icu/data/trunk/charset/contrib/data/ucm/
http://site.icu-project.org/charts/charset
It is used assuming that GSM-03.38 is stored using whole bytes.

> b) if answer to a) is 'no', is there any prospect of support being added in
> a future release of ICU?
> c) how hard would it be for an ICU user to write an ICU encoder/decoder
> that operates on a bit stream instead of an octet stream, and supports
> GSM-338? Is there any documentation on 'rolling your own' ICU encoding? (
> Apart from the documents that explain how to write data files to control
> the existing algorithms )
>

I think you would pack and unpack as a separate step, and use the ICU
converter for the byte-oriented version of your text.

Best regards,
markus

--
Google Internationalization Engineering
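
To illustrate the "pack and unpack as a separate step" suggestion, here is a minimal Python sketch. It is not ICU code: the function names `pack_septets` and `unpack_septets` are hypothetical, and the bit order (first septet in the low bits of the first octet) is assumed to be the one used for SMS packing in 3GPP TS 23.038. Other transports may pack differently. The unpacked form, one 7-bit code per whole byte, is the byte-oriented text that a whole-byte conversion table would then handle.

```python
def pack_septets(septets):
    """Pack 7-bit codes (one per byte) into a bit-packed octet string.

    Assumes SMS-style packing: bits are filled starting from the least
    significant bit of each octet.
    """
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError("septet out of range: %r" % (s,))
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial octet, zero-padded
    return bytes(out)


def unpack_septets(packed, count):
    """Unpack `count` 7-bit codes from a bit-packed octet string.

    The count is passed explicitly because the padding bits in the last
    octet are otherwise ambiguous (7 spare bits could be an eighth septet).
    """
    out = bytearray()
    acc = 0
    nbits = 0
    for b in packed:
        acc |= b << nbits
        nbits += 8
        while nbits >= 7 and len(out) < count:
            out.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    if len(out) != count:
        raise ValueError("not enough packed data for %d septets" % count)
    return bytes(out)


# Lowercase ASCII letters have the same codes in the GSM default alphabet,
# so b"hello" doubles as a valid whole-byte GSM 03.38 string here.
packed = pack_septets(b"hello")
print(packed.hex())               # e8329bfd06
print(unpack_septets(packed, 5))  # b'hello'
```

The result of `unpack_septets` would then be passed to the byte-oriented converter, and the output of that converter passed through `pack_septets` in the other direction.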