Re: [Indic-computing-devel] NCST IndiX examined


Hello,

I don't understand what is happening, so I am sending this
again.
:(

Hello,

It seems that my earlier mail was not sent in full, so I am
sending it again. Sorry for the inconvenience.

--- Joseph Koshy <jk...@Fr...> wrote:
>
>
> Dear Keyur,
>
> ks> When Xlib converts 8-bit string into 16-bit string, it
> ks> first send MSB first. This is same as Little-Endian UCS-2.
>
> The X11 Protocol definition predates Unicode.  This isn't
> Little-Endian UCS-2, its just a 2-byte encoding of the 8 bit glyph
> indices.

I am not saying that it "is" UCS-2. I am saying that it is
the "same as" UCS-2 (that is, compatible with UCS-2).

>
> ks> How can client have knowledge about the glyph indices?
>
> That is what the encoding field in the long name of X fonts is for.
> For Latin fonts this will be `-iso8859-1' meaning the font is encoded
> compatibly with the ISO8859-1 character encoding.

As you said, this is really a character encoding, not a
font encoding. Some distinction should be made between
"character" and "glyph".

-----------
According to the Unicode standard (see glossary), a
character is
(1) The smallest component of written language that has
semantic value; refers to the abstract meaning and/or
shape, rather than a specific shape (see also glyph),
though in code tables some form of visual representation is
essential for the reader's understanding.
(2) Synonym for abstract character.
(3) Loosely, the basic unit of encoding for the Unicode
character encoding, a 16-bit unit of textual
representation.
(4) Synonym for code value.
(5) The English name for the ideographic written elements
of Chinese origin.

Abstract character : A unit of information used for the
organization, control, or representation of textual data.
(See also character (1, 2))

And glyph has been defined as
(1) An abstract form that represents one or more glyph
images.
(2) A synonym for glyph image. In displaying Unicode
character data, one or more glyphs may be selected to
depict a particular character. These glyphs are selected by
a rendering engine during composition and layout
processing.
---------------

As can be seen from the definitions above, a client passes
"something" that has semantic value, that is, "characters".
One or more glyphs may be selected to display a particular
character, so the client is in no position to decide which
glyph indices should be used for a character. It is entirely
up to the font designer to select the _proper_ glyph(s) for
a character. We can't say that a particular glyph must be
used for a character.

> You can have fonts that are not indexed by character codes and fonts
> that follow different encoding schemes e.g:- hp-roman8.

Can you name a few font formats used in the X Window System
that do not use a mapping table? Even with a different
encoding such as hp-roman8, or a font coding such as ISFOC,
there must be a mapping from these encoding values to the
glyph codes. In the case of ISFOC, the font's glyph encoding
matches the ISFOC encoding.

> The client has to select the correct glyph indices in the X text
> drawing calls, appropriately.

In the X Window System, a client does not have direct access
to font resources when the fonts are loaded by the font
library interacting with the Xserver. All the font resources
and security data are also kept by the Xserver. Clients can
only send requests to the Xserver to display a character
string or to get the extents of a character string.

>
> If the font's glyph encoding matches the character encoding, then an X
> client can just send over the numeric values of 'characters' unchanged
> and the correct glyphs will get selected automatically.  This is what
> you are seeing when you put a "printf()" in "XQueryTextExtents()".

I object to the word "automatically". The glyphs are not
selected automatically; they are displayed properly because
the glyph codes and character codes happen to match. It is
also possible that a font designer decides to use two
glyphs, "/" and "\", for the character "X". In that case it
is the job of the mapping tables to do things properly. The
client will only request that the glyph for character "X" be
drawn. It will not send indices for "/" and "\".
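
A minimal sketch of such a mapping table, in C, might look
like the following. Everything in it is hypothetical: the
struct layout, the function name lookup_glyphs, and the glyph
indices 10, 20 and 21 are made up for illustration and do not
come from any real font format or from Xlib. It only shows
that the client names the character "X", while the table,
not the client, decides that two glyphs get drawn.

```c
#include <stddef.h>

/* Hypothetical limit: one character maps to at most 4 glyphs. */
#define MAX_GLYPHS_PER_CHAR 4

/* Hypothetical mapping-table entry: one character code to 1..4 glyphs. */
struct glyph_map_entry {
    unsigned int char_code;                   /* value the client sends */
    size_t       glyph_count;                 /* number of glyphs used  */
    unsigned int glyphs[MAX_GLYPHS_PER_CHAR]; /* font-internal indices  */
};

/* Illustrative table with made-up indices: 'A' uses one glyph,
 * 'X' is composed from a slash glyph (20) and a backslash glyph (21). */
static const struct glyph_map_entry demo_map[] = {
    { 'A', 1, { 10 } },
    { 'X', 2, { 20, 21 } },
};

/* Return the entry for char_code, or NULL if the table has none. */
static const struct glyph_map_entry *lookup_glyphs(unsigned int char_code)
{
    size_t n = sizeof demo_map / sizeof demo_map[0];
    for (size_t i = 0; i < n; i++) {
        if (demo_map[i].char_code == char_code)
            return &demo_map[i];
    }
    return NULL;
}
```

Calling lookup_glyphs('X') yields an entry with two glyph
indices, even though the caller only ever supplied the single
character code.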

> TrueType fonts do have a 'cmap' that maps from character codes to
> internal glyph indices. This happens to work in X because the X client
> is assuming a font encoding (like iso8859-1) when sending over the
> glyph indices and the fonts 'cmap' is setup to map the same character
> encoding to its internal layout.

So you are coming to the point. As you said, TrueType fonts
do have a 'cmap' table that maps from character codes to
internal glyph indices. It means that clients have to pass
character codes to such fonts, and clients do pass character
codes. My position becomes clearer if you take the example
of XDrawString16 or XQueryTextExtents16. In these functions
we use the XChar2b structure to pass character codes (e.g.,
Unicode). A font may have as many as 500 glyphs, but we pass
values like the ones below.

----
XChar2b str[10];

str[0].byte1 = 0x09; str[0].byte2 = 0x15;
str[1].byte1 = 0x09; str[1].byte2 = 0x30;

XDrawString16(dpy, drawable, gc, x, y, str, 2);
----

Clearly, we are passing the Unicode values U+0915 and U+0930,
which are the Unicode characters "Devanagari Ka" and
"Devanagari Ra" respectively. The glyphs for these
characters may be at positions 156 and 183 respectively. We
are not passing the values "156" or "183".
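
As a rough illustration of the packing above, the following
C sketch splits 16-bit character codes into byte1/byte2
pairs. struct char2b is a local stand-in with the same
two-field layout as Xlib's XChar2b, and pack_ucs2 is a
hypothetical helper name, not an Xlib function. With real
Xlib one would fill an XChar2b array in the same way and hand
it to XDrawString16.

```c
#include <stddef.h>

/* Local stand-in mirroring the two-field layout of Xlib's XChar2b. */
struct char2b {
    unsigned char byte1;   /* most significant byte of the code  */
    unsigned char byte2;   /* least significant byte of the code */
};

/* Hypothetical helper: split n 16-bit character codes into
 * byte1/byte2 pairs, most significant byte first. */
static void pack_ucs2(const unsigned short *codes, size_t n,
                      struct char2b *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].byte1 = (unsigned char)((codes[i] >> 8) & 0xFF);
        out[i].byte2 = (unsigned char)(codes[i] & 0xFF);
    }
}
```

Packing the codes 0x0915 and 0x0930 this way gives the same
byte1/byte2 values as the hand-written assignments: character
codes go in, and no glyph position appears anywhere.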

> 
> Such "remapping" by TrueType fonts is out of the scope of the X
> protocol.
> 
> ks> Please read the first sentence in
> ks>   X Protocol Specification, Glossary, pp 37
> ks> "This request returns the logical extents of the
> ks> specified string of characters in the specified font".
>               ^^^^^^^^^^^^^^^^^^^^ 
> 
> Agreed, this is poorly worded.  You need to read the formal
> definitions of FONT, STRING8 and STRING16 to put the definition in
> context. See also the protocol descriptions for PolyText{8,16} and
> ImageText{8,16}.

OK. Here are the definitions.

-------
FONT (Page 154)

A font is a matrix of glyphs (typically characters). The
protocol does no translation or interpretation of character
sets. The client simply indicates values used to index
glyph array. A font contains additional metric information
to determine interglyph and interline spacing.
-------

Here "values used to index" does not necessarily mean glyph
codes. "Character codes" are also values used to index the
glyph array, by way of some mapping table.
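
To make that reading concrete, here is a small C sketch of a
character code indexing the glyph array through a mapping
table. It is only an assumption about how a renderer could do
this, not a description of any actual X server or TrueType
code: cmap_entry, demo_cmap and glyph_for_char2b are
hypothetical names, and the table reuses the illustrative
positions 156 and 183 mentioned earlier. Returning 0 for an
unmapped code follows the TrueType convention that glyph 0 is
the missing-glyph entry.

```c
#include <stddef.h>

/* Hypothetical mapping-table row: character code -> glyph index. */
struct cmap_entry {
    unsigned int char_code;
    unsigned int glyph_index;
};

/* Illustrative table using the example positions from this mail. */
static const struct cmap_entry demo_cmap[] = {
    { 0x0915, 156 },   /* Devanagari Ka */
    { 0x0930, 183 },   /* Devanagari Ra */
};

/* Rebuild the 16-bit code from byte1 (high) and byte2 (low), then
 * look it up. Returns 0, the missing-glyph index, if unmapped. */
static unsigned int glyph_for_char2b(unsigned char byte1,
                                     unsigned char byte2)
{
    unsigned int code = ((unsigned int)byte1 << 8) | byte2;
    size_t n = sizeof demo_cmap / sizeof demo_cmap[0];
    for (size_t i = 0; i < n; i++) {
        if (demo_cmap[i].char_code == code)
            return demo_cmap[i].glyph_index;
    }
    return 0;
}
```

The value the client supplies (0x09, 0x15) still "indexes the
glyph array" in the sense of the FONT definition, but only
after the table has translated it to 156.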

---------
(Page 3)
STRING8   -> LISTofCARD8
STRING16  -> LISTofCHAR2B
CHAR2B    -> [byte1, byte2: CARD8] 
BYTE      -> 8-bit value
CARD8     -> 8-bit unsigned integer 
CARD16    -> 16-bit unsigned integer
---------

Nowhere do these definitions say anything about glyph
indices. In fact, the Protocol does not explicitly describe
the "values" it carries; that freedom was left to the
implementation. The X Window System is not merely the X
Protocol: it includes the X library, the X Protocol, the
Xserver, and now font renderers. It is entirely up to the
implementation to decide what these "values" mean, and the
developers have decided to pass "character codes" as the
values in the X Protocol.

Regards,
Keyur
