Thread: [Phpvideopro-developers] About UTF-8
From: Leszek B. <bo...@aj...> - 2001-09-05 19:24:24
From http://www.cl.cam.ac.uk/~mgk25/unicode.html

Because of these difficulties, the major Linux distributors and application developers now foresee and hope that Unicode will eventually replace all these older legacy encodings, primarily in the UTF-8 form. UTF-8 will be used in

    text files (source code, HTML files, email messages, etc.)
    file names
    standard input and standard output, pipes
    environment variables
    cut and paste selection buffers
    telnet, modem, and serial port connections to terminal emulators

and in any other places where byte sequences used to be interpreted in ASCII.

In UTF-8 mode, terminal emulators such as xterm or the Linux console driver transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process. Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font.

Full Unicode functionality with all bells and whistles (e.g. high-quality typesetting of the Arabic and Indic scripts) can only be expected from sophisticated multi-lingual word-processing packages. What Linux will use on a broad base to replace ASCII and the other 8-bit character sets is far simpler. Linux terminal emulators and command line tools will in the first step only switch to UTF-8. This means that only a Level 1 implementation of ISO 10646-1 is used (no combining characters), and only scripts such as Latin, Greek, Cyrillic, Armenian, Georgian, CJK, and many scientific symbols are supported that need no further processing support. At this level, UCS support is very comparable to ISO 8859 support and the only significant difference is that we have now thousands of different characters available, that characters can be represented by multibyte sequences, and that ideographic Chinese/Japanese/Korean characters require two terminal character positions (double-width).

---
Leszek Boroch                     eng: KISS! - Keep It Simple Stupid!
Technical University of Lublin    pol: BUZI! - Bez Udziwnien Zapisu Idioto!
mailto: bo...@aj...               lub: BUZI! - Bez Urzywania Zakreconego Interfejsu!
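The two differences named at the end of the quoted text (multibyte sequences and double-width CJK characters) can be illustrated with a small sketch. It is written in Python rather than PHP and is not part of the original message. The helper names `utf8_length` and `terminal_width` are made up for this illustration, and the width rule (two columns for East Asian "wide" and "fullwidth" characters, one column otherwise) is a simplifying assumption, not what any particular terminal emulator does.

```python
import unicodedata


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def terminal_width(ch: str) -> int:
    """Rough column count: 2 for East Asian wide/fullwidth, else 1 (assumption)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


# Sample characters, written as escapes so the source stays pure ASCII:
# Latin "A", Latin "e with acute", Cyrillic "Zhe", and a CJK ideograph.
samples = ["A", "\u00e9", "\u0416", "\u4e2d"]

for ch in samples:
    print(f"U+{ord(ch):04X}  bytes={utf8_length(ch)}  columns={terminal_width(ch)}")
```

ASCII characters stay at one byte, the Latin and Cyrillic samples take two bytes, and the CJK ideograph takes three bytes and two columns under the stated width assumption.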
From: Tom A. <to...@ko...> - 2001-09-17 15:55:37
Maybe I missed something, but the other day I saw some functions in PHP to convert text to UTF (or something). But isn't XML rather standard, and won't it cover this problem? Not that I like it, but ...

Tom

> -----Original message-----
> From: php...@li... [mailto:php...@li...urceforge.net] On behalf of Leszek Boroch
> Sent: Wednesday, 5 September 2001 21:25
> To: phpvideopro-developers
> Subject: [Phpvideopro-developers] About UTF-8
>
> [snip]
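The kind of conversion Tom refers to can be sketched as follows. The sketch is in Python, not PHP, and does not show whichever PHP functions he saw. It assumes the existing text is stored as ISO 8859-1, which the thread does not state, and the function name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes from ISO 8859-1 (assumed source encoding) to UTF-8."""
    # Decode with the legacy charset first, then encode the text as UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")


# "Cafe" with an acute accent on the last letter, one byte per character
# in ISO 8859-1.
legacy = b"Caf\xe9"
converted = latin1_to_utf8(legacy)

print(converted)  # b'Caf\xc3\xa9'
```

The accented letter grows from one byte to two, while the ASCII letters are unchanged. This is why pure ASCII data needs no conversion at all.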
From: Itzchak R. <izz...@qu...> - 2001-09-20 17:04:48
Tom,

Tom Albers wrote:
> Maybe I missed something, but the other day I saw some functions
> in PHP to convert text to UTF (or something). But isn't XML
> rather standard, and won't it cover this problem? Not that I
> like it, but ...

Please let's delay this topic for some weeks - it's nothing that has to be done *before* the release of v0.2, and I have no time for this at the moment :)

More important for me: who has the latest code running? Any bugs found? I wanna release it! :-)

</izzy>