[Phpvideopro-developers] About UTF-8
From: Leszek B. <bo...@aj...> - 2001-09-05 19:24:24
From http://www.cl.cam.ac.uk/~mgk25/unicode.html:

Because of these difficulties, the major Linux distributors and application developers now foresee and hope that Unicode will eventually replace all these older legacy encodings, primarily in the UTF-8 form. UTF-8 will be used in

  - text files (source code, HTML files, email messages, etc.)
  - file names
  - standard input and standard output, pipes
  - environment variables
  - cut and paste selection buffers
  - telnet, modem, and serial port connections to terminal emulators

and in any other places where byte sequences used to be interpreted in ASCII.

In UTF-8 mode, terminal emulators such as xterm or the Linux console driver transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process. Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font.

Full Unicode functionality with all bells and whistles (e.g. high-quality typesetting of the Arabic and Indic scripts) can only be expected from sophisticated multi-lingual word-processing packages. What Linux will use on a broad base to replace ASCII and the other 8-bit character sets is far simpler. Linux terminal emulators and command line tools will in the first step only switch to UTF-8. This means that only a Level 1 implementation of ISO 10646-1 is used (no combining characters), and only scripts such as Latin, Greek, Cyrillic, Armenian, Georgian, CJK, and many scientific symbols are supported that need no further processing support. At this level, UCS support is very comparable to ISO 8859 support and the only significant difference is that we now have thousands of different characters available, that characters can be represented by multibyte sequences, and that ideographic Chinese/Japanese/Korean characters require two terminal character positions (double-width).

---
Leszek Boroch
Technical University of Lublin
mailto: bo...@aj...

eng: KISS! - Keep It Simple Stupid!
pol: BUZI! - Bez Udziwnien Zapisu Idioto! (No Weird Notation, Idiot!)
or:  BUZI! - Bez Uzywania Zakreconego Interfejsu! (No Using a Convoluted Interface!)
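
To make the last two points of the quoted text concrete (multibyte sequences and double-width CJK characters), here is a rough Python sketch. The helper names utf8_length and terminal_width are made up for illustration, and the width rule is only an approximation: it counts East Asian Wide/Fullwidth characters as two columns and everything else as one, ignoring combining and control characters, which matches the Level 1 scope described above but is not what a real terminal emulator does in full.

```python
import unicodedata


def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))


def terminal_width(text: str) -> int:
    """Approximate the number of terminal columns needed for `text`.

    Assumption: characters whose East Asian Width property is Wide ("W")
    or Fullwidth ("F") take two columns; every other character takes one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


if __name__ == "__main__":
    # ASCII, Latin Extended, Greek, and a CJK ideograph.
    for ch in ("A", "ł", "Ω", "中"):
        print(repr(ch), "bytes:", utf8_length(ch), "columns:", terminal_width(ch))
```

The practical consequence for application code is that byte count, character count, and display width are three different numbers once UTF-8 is in use, so string-length functions that count bytes can no longer be used for column alignment.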