Thread: [Phpvideopro-developers] About UTF-8


From http://www.cl.cam.ac.uk/~mgk25/unicode.html

Because of these difficulties, the major Linux distributors and application
developers now foresee and hope that Unicode will eventually replace all
these older legacy encodings, primarily in the UTF-8 form. UTF-8 will be
used in

  - text files (source code, HTML files, email messages, etc.)
  - file names
  - standard input and standard output, pipes
  - environment variables
  - cut and paste selection buffers
  - telnet, modem, and serial port connections to terminal emulators
  - and in any other places where byte sequences used to be interpreted in ASCII

In UTF-8 mode, terminal emulators such as xterm or the Linux console driver
transform every keystroke into the corresponding UTF-8 sequence and send it
to the stdin of the foreground process. Similarly, any output of a process
on stdout is sent to the terminal emulator, where it is processed with a
UTF-8 decoder and then displayed using a 16-bit font.
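
To make the keystroke-to-bytes round trip described above concrete, here is a minimal Python sketch. It is not taken from the quoted page; the function names keystroke_to_utf8 and decode_stream are hypothetical, and a real terminal emulator does considerably more (escape sequences, error recovery). It only shows that one character may become several bytes, and that the receiving side has to buffer partial sequences until they are complete.

```python
import codecs


def keystroke_to_utf8(char: str) -> bytes:
    """Return the UTF-8 byte sequence a terminal in UTF-8 mode would
    send for one typed character (hypothetical helper)."""
    return char.encode("utf-8")


def decode_stream(chunks) -> str:
    """Decode UTF-8 bytes that arrive in arbitrary chunks, as they
    might on a pipe or serial line (hypothetical helper).

    The incremental decoder holds back an incomplete multibyte
    sequence until the remaining bytes show up."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if a sequence was left unfinished.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


if __name__ == "__main__":
    for ch in ("A", "é", "€"):
        print(repr(ch), keystroke_to_utf8(ch).hex(" "))
    # The two bytes of "é" arrive in separate reads.
    print(decode_stream([b"\xc3", b"\xa9"]))
```

ASCII characters still occupy a single byte, which is why existing ASCII files and tools keep working unchanged under UTF-8.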

Full Unicode functionality with all bells and whistles (e.g. high-quality
typesetting of the Arabic and Indic scripts) can only be expected from
sophisticated multi-lingual word-processing packages. What Linux will use on
a broad base to replace ASCII and the other 8-bit character sets is far
simpler. Linux terminal emulators and command line tools will in the first
step only switch to UTF-8. This means that only a Level 1 implementation of
ISO 10646-1 is used (no combining characters), and only scripts such as
Latin, Greek, Cyrillic, Armenian, Georgian, CJK, and many scientific symbols
are supported that need no further processing support. At this level, UCS
support is very comparable to ISO 8859 support and the only significant
difference is that we have now thousands of different characters available,
that characters can be represented by multibyte sequences, and that
ideographic Chinese/Japanese/Korean characters require two terminal
character positions (double-width).
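
The two practical consequences mentioned above (no combining characters, and double-width CJK) can be sketched with Python's standard unicodedata module. This is only an approximation under stated assumptions: the helper names has_combining and terminal_width are hypothetical, "combining" is approximated here by a non-zero canonical combining class rather than the exact ISO 10646-1 Level 1 definition, and width is taken from the East Asian Width property, ignoring control and zero-width characters.

```python
import unicodedata


def has_combining(text: str) -> bool:
    """True if any character has a non-zero canonical combining class.

    Assumption: used as a rough stand-in for "not Level 1"; the
    standard's own list of combining characters is not reproduced."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


def terminal_width(text: str) -> int:
    """Estimate how many terminal cells the text occupies.

    Characters whose East Asian Width is Wide ("W") or Fullwidth ("F")
    count as two cells, everything else as one (a simplification)."""
    width = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    print(terminal_width("abc"))          # Latin letters, one cell each
    print(terminal_width("日本語"))        # CJK ideographs, two cells each
    print(has_combining("\u00e9"))        # precomposed e-acute
    print(has_combining("e\u0301"))       # e followed by combining acute
```

A Level 1 tool in the sense of the quoted text could then accept the precomposed form and either reject or normalize the decomposed one.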

---
Leszek Boroch
Technical University of Lublin
mailto: bo...@aj...

eng: KISS! - Keep It Simple Stupid!
pol: BUZI! - Bez Udziwnien Zapisu Idioto!
lub: BUZI! - Bez Urzywania Zakreconego Interfejsu!
