#4 Support for CS6 UTF-8 charspec (Linux UDF, Feature Requests)

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2002-08-26

Created: 2002-08-26

Creator: Markus Kuhn

Private: No

Background:

The OSTA UDF standard predates the official definition
of the ASCII-compatible UTF-8 encoding of Unicode and
ISO 10646. Therefore, OSTA invented its own "OSTA
Compressed Unicode" character encoding for dstrings,
which is used nowhere else under Linux.
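For illustration, here is a minimal Python sketch of decoding OSTA Compressed Unicode d-characters, assuming the UDF 2.01 rules: the first byte is a compression ID, where 8 means one byte per character and 16 means two bytes per character, most significant byte first. The function name `decode_osta_cs0` is hypothetical, and any other compression ID is simply rejected here.

```python
def decode_osta_cs0(data: bytes) -> str:
    """Decode OSTA Compressed Unicode d-characters (sketch, UDF 2.01 rules).

    data holds only the recorded characters, starting with the compression
    ID byte; a dstring's trailing length byte must already be stripped.
    """
    if not data:
        return ""  # nothing recorded: treat as an empty name
    comp_id, payload = data[0], data[1:]
    if comp_id == 8:
        # 8-bit form: each byte value is the Unicode code point (U+0000..U+00FF),
        # which is exactly the mapping the Latin-1 codec performs.
        return payload.decode("latin-1")
    if comp_id == 16:
        if len(payload) % 2:
            raise ValueError("odd byte count for 16-bit compression")
        # 16-bit form: each pair of bytes is one character value, big-endian.
        return "".join(
            chr(int.from_bytes(payload[i:i + 2], "big"))
            for i in range(0, len(payload), 2)
        )
    raise ValueError(f"unsupported compression ID {comp_id}")


# Usage: a Latin-1-range name and a name that needs 16 bits.
assert decode_osta_cs0(b"\x08caf\xe9") == "caf\u00e9"      # "café"
assert decode_osta_cs0(b"\x10\x00A\x20\xac") == "A\u20ac"  # "A€"
```

Because the 8-bit form coincides with Latin-1, only names containing characters above U+00FF need the 16-bit form.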

Unicode is used today under Linux in the form of the
UTF-8 encoding in filenames, text files, etc. It is to
be hoped that UTF-8 will eventually replace almost all
other character encodings on GNU/Linux systems.

Feature Request:

While the UDF standard allows only the use of the
OSTA-invented encoding, the ECMA-167 standard does have
all necessary provisions for UTF-8 to be used directly
as the dstring encoding. To do so, use the following
values in the charspec (ECMA-167, 7.2):

CharacterSetType = 6 (CS6)
CharacterSetInfo = #1B, #25, #47

The CharacterSetInfo value "ESC %G" is the ISO 2022 /
ECMA-35 registered escape sequence for UTF-8.
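The two charspec values above could be laid out in bytes roughly as follows. This is a minimal Python sketch assuming the 64-byte ECMA-167 charspec layout (a Uint8 CharacterSetType followed by 63 bytes of CharacterSetInfo, with unused bytes set to #00); the helper `make_charspec` is hypothetical, and the UDF-mandated CS0 value is shown only for comparison.

```python
import struct

# charspec: Uint8 CharacterSetType, then 63 bytes of CharacterSetInfo (64 total).
CHARSPEC = struct.Struct("<B63s")


def make_charspec(set_type: int, info: bytes) -> bytes:
    """Pack a charspec; struct pads the 63-byte info field with zero bytes."""
    if len(info) > 63:
        raise ValueError("CharacterSetInfo is limited to 63 bytes")
    return CHARSPEC.pack(set_type, info)


# Values proposed in this request: CS6 with the escape sequence ESC % G.
CS6_UTF8 = make_charspec(6, b"\x1b\x25\x47")

# For comparison: the charspec UDF requires, CS0 with "OSTA Compressed Unicode".
CS0_OSTA = make_charspec(0, b"OSTA Compressed Unicode")
```

Only the type byte and the first three info bytes differ from an all-zero record in the CS6 case, so a reader can distinguish the two encodings by comparing the charspec before interpreting any dstring.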

It would be nice if the Linux UDF driver and tools
could also generate and handle ECMA-167 disks that use
the CS6/UTF-8 encoding instead of the OSTA Compressed
Unicode encoding. Such disks would no longer conform
to the UDF standard, but they would be fully based on
ISO standards and would not require recoding into a
non-standard encoding under Linux. On future Linux
systems that use UTF-8 for all filenames, names could
be copied between disk and application without any
conversion.
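A minimal Python sketch of what "without any conversion" could mean for a dstring, assuming the layout proposed here: the UTF-8 bytes of the filename are recorded directly (CS6 has no OSTA compression ID byte), unused bytes are #00, and the last byte of the field holds the number of bytes used, as for any ECMA-167 dstring. The function names `make_cs6_dstring` and `read_cs6_dstring` are hypothetical.

```python
def make_cs6_dstring(name: bytes, field_len: int) -> bytes:
    """Record a Linux filename (raw bytes) as a CS6/UTF-8 dstring (sketch)."""
    name.decode("utf-8")  # validation only; raises UnicodeDecodeError if not UTF-8
    if len(name) > min(field_len - 1, 255):
        raise ValueError("name does not fit in the dstring field")
    # Bytes are copied verbatim; unused positions are zero; last byte = length.
    padding = bytes(field_len - 1 - len(name))
    return name + padding + bytes([len(name)])


def read_cs6_dstring(field: bytes) -> bytes:
    """Return the recorded bytes of a CS6 dstring, again without recoding."""
    used = field[-1]
    if used > len(field) - 1:
        raise ValueError("length byte exceeds the field size")
    return field[:used]


# Usage: a UTF-8 name round-trips byte for byte.
raw = "Gr\u00fc\u00dfe".encode("utf-8")  # "Grüße", 7 bytes in UTF-8
field = make_cs6_dstring(raw, 32)
assert len(field) == 32 and field[-1] == 7
assert read_cs6_dstring(field) == raw
```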

References:

More info on Unicode/UTF-8 under GNU/Linux:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

UDF Spec:

http://www.osta.org/specs/pdf/udf201.pdf

ECMA-167:

ftp://ftp.ecma.ch/ecma-st/Ecma-167.pdf

Registered ISO-2022/ECMA-35 ESC sequence for UTF-8:

http://www.itscj.ipsj.or.jp/ISO-IR/196.pdf

Discussion

Nobody/Anonymous - 2006-08-26

I disagree. Although UTF-8 is advisable, breaking UDF
compatibility just to avoid an almost trivial name
translation is not a good thing. Maybe it will come in a
new UDF revision, but compatibility between UDF
implementations is already a major issue and should be
worked on first.

Reinoud Zandijk
(NetBSD udf, udfclient)
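The "almost trivial name translation" mentioned above can be sketched in Python. This assumes Linux filenames are UTF-8 bytes, picks the 8-bit OSTA form whenever every character is at most U+00FF and the 16-bit big-endian form otherwise, and, as a simplification of this sketch, rejects characters above U+FFFF. The function name `encode_osta_cs0` is hypothetical.

```python
def encode_osta_cs0(name: bytes) -> bytes:
    """Translate a UTF-8 filename into OSTA Compressed Unicode (sketch)."""
    text = name.decode("utf-8")  # assumption: the Linux filename is UTF-8
    if not text:
        return b""  # nothing to record for an empty name
    if any(ord(c) > 0xFFFF for c in text):
        # Simplification: this sketch does not handle characters beyond U+FFFF.
        raise ValueError("characters above U+FFFF are not handled here")
    if all(ord(c) <= 0xFF for c in text):
        # Compression ID 8: one byte per character (same mapping as Latin-1).
        return b"\x08" + text.encode("latin-1")
    # Compression ID 16: two bytes per character, most significant byte first.
    return b"\x10" + b"".join(ord(c).to_bytes(2, "big") for c in text)


# Usage: a Latin-1-range name and a name that needs 16 bits.
assert encode_osta_cs0("caf\u00e9".encode("utf-8")) == b"\x08caf\xe9"
assert encode_osta_cs0("A\u20ac".encode("utf-8")) == b"\x10\x00A\x20\xac"
```

Decoding is the mirror image: read the compression ID, then take one or two bytes per character.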
