Re: [Passwordsafe-devel] Enhancement Topics (Format and Usage Bugs)
From: Frank P. <fp...@fp...> - 2007-05-08 21:52:11
On Tue, 08 May 2007 02:53:27 -0400, Wolfgang Keller <91...@gm...> wrote:
>
> The definition is still too weak or misleading. Without being an expert
> in this, I know that UTF-8 may come with something they call "BOM" as a
> prefix token. Moreover, UTF-8 encoding is a bit-format, not a text
> format and I would not expect that null bytes are to be excluded from
> being a valid part of it. So I suggest the following text as definition
> for "Text":
>

Hi Wolfgang and all,

BOMs ("Byte Order Marks") may be used with UTF-16 or UTF-32 to indicate
whether a code unit sequence is serialized in big-endian or little-endian
order. UTF-8 is byte oriented, and has no need for BOMs.

Let's just delegate the responsibility to the appropriate party, with the
appropriate references:

  Text fields are stored using the UTF-8 encoding scheme (see definition
  D-39 of the Unicode Standard 4.0 at
  http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf). Note that a
  Unicode string (D-29a) does not contain any terminating NUL character
  that might exist in a C language implementation; consequently, no NUL
  character is stored or counted as part of the field length. I.e., the
  ASCII string "Hello World" is stored as a single block, with the field
  length set to 11.

We could strike the second sentence, because it is implicit from the
first, but given that I have recently been bitten by an incompatibility
involving the null character, I'd like to see it included.

Frank

--
Frank Pilhofer, fp...@fp...
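
To illustrate the proposed definition, here is a minimal Python sketch of a length-prefixed UTF-8 text field. The 4-byte little-endian length prefix and the function names `encode_text_field` and `decode_text_field` are assumptions made up for this example; they are not taken from the Password Safe format specification. The sketch only shows that no BOM and no terminating NUL are written, so "Hello World" has a field length of 11.

```python
import struct


def encode_text_field(text: str) -> bytes:
    """Serialize text as a length-prefixed UTF-8 block (hypothetical layout)."""
    # Python's "utf-8" codec writes no BOM and appends no terminating NUL.
    data = text.encode("utf-8")
    # Assumption for this sketch: the length is a 4-byte little-endian
    # integer that counts only the encoded bytes.
    return struct.pack("<I", len(data)) + data


def decode_text_field(blob: bytes) -> str:
    """Read back a field written by encode_text_field."""
    (length,) = struct.unpack_from("<I", blob, 0)
    data = blob[4:4 + length]
    if len(data) != length:
        raise ValueError("truncated text field")
    return data.decode("utf-8")


field = encode_text_field("Hello World")
length = struct.unpack_from("<I", field, 0)[0]
print(length)                    # field length, terminator not counted
print(decode_text_field(field))  # round-trips to the original string
```

A C implementation would have to drop the trailing '\0' explicitly before writing the field, which is the kind of incompatibility the second sentence of the definition guards against.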