Re: [Mlterm-dev-en] mlterm, tibetan, and bitmap fonts?

On Mon, 2005-12-19 at 15:45 -0500, Rich Felker wrote:
> On Mon, Dec 19, 2005 at 01:30:47AM +0900, mi...@mi... wrote:
> > Hi. 
> > 
> > I've tried to view Tibetan text with the following procedure.
> > Please let me know if you used a different method or
> > are using another font. 
> > 
> > 1. install "Tibetan Machine Uni" from www.thdl.org and register the font with 
> > the fontconfig library. 
> > 
> > 2. create ~/.mlterm/aafont and add:
> > ISO10646_UCS4_1=Tibetan Machine Uni-iso10646-1; 
> > 
> > 3. run mlterm with anti-alias enabled:
> > mlterm -A 
> > 
> > 4. display a Tibetan web page by w3m on mlterm:
> > w3m "http://www.thdl.org/xml/show.php?xml=test/tibetnew/thdlhp.xml&lng=tib" 
> 
> I finally succeeded in getting mlterm to display Tibetan on my laptop,
> but it's horribly misrendered. Here's a screenshot:
> 
> http://brightrain.aerifal.cx/~dalias/tibetan_misrender.png

You have to enable "variable column width" and Xft support
so that some characters are rendered as zero-width.
With that configuration, mlterm can do simple overstriking
with a vertical shift.

What I saw was:
http://mistfall.net/minami/tmp/tibetan_w3m.png

which does not look so bad to me.

> I also got around to reading the mlterm source, and it seems that
> there's no effort made to position combining characters; they're just
> blindly displayed relative to the same origin as the base character
> with overstrike.

For combining characters in Arabic, mlterm performs the combining by
converting a group of non-combined UCS4 codes to a combined form,
since they cannot be rendered by simply stacking the glyphs vertically.

The code and its conversion table are in ml_shape.c.

For Tibetan, however, the same approach cannot be used because
the combined forms of Tibetan glyphs are not registered in Unicode.

> With that in mind, I'm trying to decide what the best way to fix this
> is, short of having mlterm use FreeType directly. I could either:
> 
> - build some sort of tables to load in mlterm that tell it how to
>   correctly combine characters in a given font.
> - make an artificial encoding of some sort to allow access to the
>   precombined glyphs.
> 
> Personally I'm of the opinion that, except for accents on latin
> characters and other simple diacritics, automatic generation of
> combined glyphs will never look great, so I think we need some sort of
> solution for using precombined glyphs. If there were an artificial
> non-unicode encoding, then mlterm could get to the glyphs that way
> similarly to how it can use legacy-encoded Japanese, etc. fonts even
> in unicode mode. And of course it would be a big bonus if precombined
> bitmap glyphs could be used as well.

The plan I had was:

If we assume a specific font, the conversion table can be hard-coded.
That is, when the glyph ordering of the font to be used is known,
we can write conversion rules like the following:
0x0f40, 0x0f72 => glyph ID xxxx
0x0f40, 0x0f7f => glyph ID yyyy
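
A minimal sketch of such a hard-coded table in C, to make the idea
concrete. The names tibetan_rule_t, tibetan_rules and tibetan_lookup
are hypothetical (nothing like them exists in mlterm yet), and the
glyph IDs 1001 and 1002 are placeholders standing in for xxxx and
yyyy above, not values taken from any real font:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: (base, combining char) -> glyph ID in one specific font. */
typedef struct {
    uint32_t base;      /* UCS4 code of the base character */
    uint32_t mark;      /* UCS4 code of the combining character */
    uint32_t glyph_id;  /* placeholder value, NOT from a real font */
} tibetan_rule_t;

static const tibetan_rule_t tibetan_rules[] = {
    { 0x0f40, 0x0f72, 1001 },  /* stands in for "glyph ID xxxx" */
    { 0x0f40, 0x0f7f, 1002 },  /* stands in for "glyph ID yyyy" */
};

/* Return the precombined glyph ID for the pair, or 0 if no rule matches. */
uint32_t tibetan_lookup(uint32_t base, uint32_t mark)
{
    size_t i;
    size_t count = sizeof(tibetan_rules) / sizeof(tibetan_rules[0]);

    for (i = 0; i < count; i++) {
        if (tibetan_rules[i].base == base && tibetan_rules[i].mark == mark)
            return tibetan_rules[i].glyph_id;
    }
    return 0;
}
```

A linear search is probably enough for a first experiment; a real table
for a whole font would likely want to be sorted and searched by bisection.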

With Xft, drawing of non-US-ASCII/ISO8859-1 characters is handled in 
xwindow/x_window.c:x_window_xft_draw_string32().
The function takes an array of UCS4 codes like 0x0f40, 0x0f72, ... .

So it may be possible to hack the function to watch its input
and, if the input sequence contains Tibetan characters to be combined,
intercept them and call XftDrawGlyphs() instead.
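
The "watch the input" step could look roughly like the sketch below.
It is only an illustration, not code from mlterm: draw_item_t,
lookup_combined and split_for_drawing are hypothetical names, the
lookup handles a single hard-coded pair, and the glyph ID 1001 is a
placeholder. It does not call Xft at all; it only decides, for each
position in the UCS4 array, whether to emit a precombined glyph ID or
to pass the character through unchanged:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ITEM_CHAR, ITEM_GLYPH } item_kind_t;

typedef struct {
    item_kind_t kind;
    uint32_t value;  /* UCS4 code for ITEM_CHAR, glyph ID for ITEM_GLYPH */
} draw_item_t;

/* Hypothetical stand-in for a font-specific table; 0 means "no rule". */
static uint32_t lookup_combined(uint32_t base, uint32_t mark)
{
    if (base == 0x0f40 && mark == 0x0f72)
        return 1001;  /* placeholder glyph ID, not from a real font */
    return 0;
}

/*
 * Walk the UCS4 input and replace each known (base, mark) pair with one
 * glyph item.  'out' must have room for 'len' items.  Returns the number
 * of items written.
 */
size_t split_for_drawing(const uint32_t *ucs4, size_t len, draw_item_t *out)
{
    size_t i = 0;
    size_t n = 0;

    while (i < len) {
        uint32_t glyph = 0;

        if (i + 1 < len)
            glyph = lookup_combined(ucs4[i], ucs4[i + 1]);

        if (glyph != 0) {
            out[n].kind = ITEM_GLYPH;
            out[n].value = glyph;
            i += 2;  /* both input codes are consumed by one glyph */
        } else {
            out[n].kind = ITEM_CHAR;
            out[n].value = ucs4[i];
            i += 1;
        }
        n++;
    }
    return n;
}
```

Runs of ITEM_GLYPH items would then go to XftDrawGlyphs(), and runs of
ITEM_CHAR items would stay on the existing UCS4 drawing path. Advancing
the x position between runs is left out here.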

... and later we can replace the converter from UCS4 to glyph ID 
with one that can handle any font, using libotf, FreeType, or something similar.

Unfortunately, I won't have enough time to do this for a while.
Patches are welcome, of course.

--
MINAMI Hirokazu <mi...@mi...>