Re: [Indic-computing-devel] NCST IndiX examined
From: Keyur S. <key...@ya...> - 2002-02-12 10:02:13
Hello,

It seems that my earlier mail was not sent in full. I don't understand what is happening, so I am sending it again. Sorry for the inconvenience. :(

--- Joseph Koshy <jk...@Fr...> wrote:
>
> Dear Keyur,
>
> ks> When Xlib converts an 8-bit string into a 16-bit string,
> ks> it sends the MSB first. This is the same as Big-Endian UCS-2.
>
> The X11 Protocol definition predates Unicode. This isn't
> Big-Endian UCS-2; it's just a 2-byte encoding of the 8-bit
> glyph indices.

I don't say that it "is" UCS-2. I say that it is "the same as" UCS-2 (that is, compatible with UCS-2).

> ks> How can the client have knowledge about the glyph indices?
>
> That is what the encoding field in the long name of X fonts is for.
> For Latin fonts this will be `-iso8859-1', meaning the font is
> encoded compatibly with the ISO8859-1 character encoding.

As you said, this is really a character encoding, not a font encoding. A distinction should be made between "character" and "glyph".

-----------
According to the Unicode Standard (see the glossary), a character is:

(1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader's understanding.
(2) Synonym for abstract character.
(3) Loosely, the basic unit of encoding for the Unicode character encoding, a 16-bit unit of textual representation.
(4) Synonym for code value.
(5) The English name for the ideographic written elements of Chinese origin.

Abstract character: A unit of information used for the organization, control, or representation of textual data. (See also character (1, 2).)

And glyph has been defined as:

(1) An abstract form that represents one or more glyph images.
(2) A synonym for glyph image. In displaying Unicode character data, one or more glyphs may be selected to depict a particular character.
These glyphs are selected by a rendering engine during composition and layout processing.
---------------

As can be seen from the above definitions, a client passes "something" that has semantic value, that is, "characters". One or more glyphs may be selected to display a particular character. So the client is in no position to decide which glyph indices are to be used for a character. It is entirely at the discretion of the font designer to select the _proper_ glyph(s) for a character. We can't say that a particular glyph must be used for a character.

> You can have fonts that are not indexed by character codes and fonts
> that follow different encoding schemes, e.g. hp-roman8.

Can you name a few font formats used in the X Window System that don't use a mapping table? Even with a different character encoding like hp-roman8, or a font encoding like ISFOC, there has to be a mapping from these encoding values to the glyph codes. In the case of ISFOC, the font's glyph encoding matches the ISFOC encoding.

> The client has to select the correct glyph indices in the X text
> drawing calls, appropriately.

In the X Window System, the client doesn't have direct access to font resources when fonts are loaded by the font library in conjunction with the X server. Also, all the font resources and security data are kept by the X server. Clients can only send requests to the X server to display a character string or to get the extents of a character string.

> If the font's glyph encoding matches the character encoding, then an X
> client can just send over the numeric values of 'characters' unchanged
> and the correct glyphs will get selected automatically. This is what
> you are seeing when you put a "printf()" in "XQueryTextExtents()".

I object to the word "automatically". The glyphs are not selected automatically; they are displayed properly because the glyph codes and character codes match. It is also possible that a font designer decides to use two glyphs, "/" and "\", for the character "X".
In that case it is the job of the mapping tables to handle this properly. The client will only request that the glyph for the character "X" be drawn; it will not send indices for "/" and "\".

> TrueType fonts do have a 'cmap' that maps from character codes to
> internal glyph indices. This happens to work in X because the X client
> is assuming a font encoding (like iso8859-1) when sending over the
> glyph indices, and the font's 'cmap' is set up to map the same
> character encoding to its internal layout.

So you are coming to the point. As you said, TrueType fonts do have a 'cmap' table that maps character codes to internal glyph indices. This means that clients have to pass character codes to such fonts, and clients do pass character codes.

My position becomes clearer if you take the example of XDrawString16 or XQueryTextExtents16. In these functions we use the XChar2b structure to pass character codes (e.g., Unicode). A font may have as many as 500 glyphs, but we pass values like the following:

----
#include <X11/Xlib.h>

/* U+0915 DEVANAGARI LETTER KA, U+0930 DEVANAGARI LETTER RA.
 * byte1 holds the high-order byte, byte2 the low-order byte.
 * dpy, drawable, gc, x and y are assumed to be set up already. */
XChar2b str[2];
str[0].byte1 = 0x09; str[0].byte2 = 0x15;
str[1].byte1 = 0x09; str[1].byte2 = 0x30;
XDrawString16(dpy, drawable, gc, x, y, str, 2);
----

Clearly, we are passing the Unicode values U+0915 and U+0930, which are the characters DEVANAGARI LETTER KA and DEVANAGARI LETTER RA respectively. The glyphs for these characters may be at positions 156 and 183 respectively, but we are not passing the values "156" or "183".

> Such "remapping" by TrueType fonts is out of the scope of the X
> protocol.
>
> ks> Please read the first sentence in
> ks> X Protocol Specification, Glossary, p. 37:
> ks> "This request returns the logical extents of the
> ks> specified string of characters in the specified font".
>               ^^^^^^^^^^^^^^^^^^^^
>
> Agreed, this is poorly worded. You need to read the formal
> definitions of FONT, STRING8 and STRING16 to put the definition in
> context. See also the protocol descriptions for PolyText{8,16} and
> ImageText{8,16}.

OK. Here are the definitions.
-------
FONT (Page 154)
A font is a matrix of glyphs (typically characters). The protocol does no translation or interpretation of character sets. The client simply indicates values used to index the glyph array. A font contains additional metric information to determine interglyph and interline spacing.
-------

Here "values used to index" doesn't necessarily mean glyph codes. "Character codes" are also values used to index the glyph array, by way of some mapping table.

---------
(Page 3)
STRING8  -> LISTofCARD8
STRING16 -> LISTofCHAR2B
CHAR2B   -> [byte1, byte2: CARD8]
BYTE     -> 8-bit value
CARD8    -> 8-bit unsigned integer
CARD16   -> 16-bit unsigned integer
---------

Nowhere do they indicate anything about glyph indices. In fact, the protocol doesn't explicitly describe anything about the "values" it carries; that freedom was left to the implementation. The X Window System is not merely the X Protocol: it includes the X library, the X Protocol, the X server, and now font renderers. It is entirely up to the implementation to decide what these "values" mean, and the developers have decided to pass "character codes" as the values in the X Protocol.

Regards,
Keyur