#4 Support for CS6 UTF-8 charspec

Markus Kuhn


The OSTA UDF standard predates the official definition
of the ASCII-compatible UTF-8 encoding of Unicode and
ISO 10646. Therefore, OSTA invented its own "OSTA
Compressed Unicode" character encoding for dstrings,
which is like nothing else used anywhere under Linux.

Unicode is used today under Linux in the form of the
UTF-8 encoding in file names, text files, etc. One can
hope that UTF-8 will eventually replace almost all
other character encodings on GNU/Linux systems.

Feature Request:

While the UDF standard allows only the use of the
OSTA-invented encoding, the ECMA-167 standard does have
all necessary provisions for UTF-8 to be used directly
as the dstring encoding. To do so, set the following
values in the charspec (ECMA-167, 7.2):

CharacterSetType = 6 (CS6)
CharacterSetInfo = #1B, #25, #47

The CharacterSetInfo value "ESC %G" is the ISO 2022 /
ECMA-35 registered escape sequence for UTF-8.
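
As an illustration, the charspec described above could be
assembled as follows. This is only a sketch: it assumes the
ECMA-167 charspec layout of one CharacterSetType byte followed
by a 63-byte CharacterSetInfo field, with unused bytes set to
#00. The names make_charspec, CS6 and UTF8_ESCAPE are
hypothetical and not taken from any existing UDF tool.

```python
import struct

# Values from the request above.
CS6 = 6                                  # CharacterSetType
UTF8_ESCAPE = bytes([0x1B, 0x25, 0x47])  # ESC % G

def make_charspec(cs_type: int, info: bytes) -> bytes:
    """Build a 64-byte charspec (assumed layout: 1 type byte + 63 info bytes)."""
    if not 0 <= cs_type <= 255:
        raise ValueError("CharacterSetType must fit in one byte")
    if len(info) > 63:
        raise ValueError("CharacterSetInfo is limited to 63 bytes")
    # "63s" pads the info field with NUL bytes up to 63 bytes.
    return struct.pack("<B63s", cs_type, info)

UTF8_CHARSPEC = make_charspec(CS6, UTF8_ESCAPE)
```

A reader would compare the charspec found on disk against
UTF8_CHARSPEC and, on a match, treat the identifier bytes as
UTF-8 rather than as OSTA Compressed Unicode.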

It would be nice if the Linux UDF driver and tools
could also generate and handle ECMA-167 disks which use
the CS6/UTF-8 encoding instead of the OSTA Compressed
Unicode encoding. Such disks would no longer conform
to the UDF standard, but they would be fully based on
ISO standards and would not require recoding from a
non-standard encoding under Linux. On future Linux
systems that use UTF-8 everywhere in filenames,
filenames could be copied without any conversion
between disk and application.


More info on Unicode/UTF-8 under GNU/Linux:


UDF Spec:




Registered ISO-2022/ECMA-35 ESC sequence for UTF-8:



  • Logged In: NO

    I disagree. Although UTF-8 is recommendable, breaking UDF
    compatibility just to avoid an almost trivial name
    translation is not a good thing. Maybe it will come in a
    new UDF revision, but UDF implementation compatibility is
    already a major issue and should be worked on first.

    Reinoud Zandijk
    (NetBSD udf, udfclient)