gnuplot development

Tracker: Patches

UTF-8 support in PostScript driver - ID: 1746352
Last Update: Comment added ( sfeam )

Related to the discussion in patch [ 1734995 ] Extended PostScript Latin1
encoding vector, here is an experimental patch for UTF-8 support in the
PostScript driver. It supports PostScript Type1 fonts which use glyph names
according to the Adobe Glyph List For New Fonts.


Submitted: Thomas Henlich ( thenlich ) - 2007-07-02 09:19
Priority: 8
Status: Closed
Resolution: Accepted
Assigned: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comments ( 15 )




Date: 2008-02-19 06:36
Sender: sfeam (Project Admin)


This patchset is now in CVS.
If further work is proposed, please start a new tracker item.


Date: 2007-11-07 23:09
Sender: sfeam (Project Admin)


This particular font (Arial MT) is quite large because it includes many
Unicode code pages. That is exactly why I chose it for testing. You can of
course choose a much smaller font, so long as it contains the character
glyphs you are interested in.

If your postscript printer (or display program) knows about the font on
its own, then there is no need to bundle it into the *.ps file via the
"fontfile..." option.

For example, my Ghostscript installation knows about the Computer Modern
Unicode (CMU) fonts that are packaged for use with TeX. So I can run the
demos using
set term post font 'CMUSansSerif-Oblique'
set encoding utf8
load 'utf8.dem'
I've attached the output of this run here, because it is small. But of
course it won't do you much good on a printer that doesn't know where to
find the Computer Modern fonts. And that particular font doesn't contain
the math symbols, which must be called something else.
File Added: utf8_CMUSansSerif-Oblique.ps


Date: 2007-11-07 21:23
Sender: mikulik


I've found the problem. I have used the "set term postscript ..." without
the "fontfile '...'" option, e.g.
set term post color font "ArialMT"

Then it does not work (and gnuplot gives no warning). With the fontfile
option and the ttf2pt1 program, it works. However, the PostScript file is
very big (the embedded font adds more than 600 KB). Can this be reduced?



Date: 2007-11-07 20:05
Sender: sfeam (Project Admin)


This patch has no effect on drivers other than PostScript.

Could you please upload somewhere the PostScript output you get from
running the demos after applying the patch? The correct output is
unfortunately too big to attach here, so I have placed a copy on my web site

http://skuld.bmsc.washington.edu/people/merritt/gnuplot/index.html#utf8

Perhaps by comparing the two we can figure out why it is not working on
your system.


Date: 2007-11-07 17:45
Sender: mikulik


I tried it -- the two utf8 demos, and some Czech text. It works in wxt and
x11.

In postscript, it does not show the text correctly.



Date: 2007-11-06 20:06
Sender: sfeam (Project Admin)


More bugfixes; more code cleanup.
This version passes all the tests I have thought to try, other than CJK
fonts. Some additional optimization of the output PostScript code is
possible, particularly in enhanced text mode. But that can come later.

I plan to add this to CVS in the near future.
File Added: utf-06nov2007.patch


Date: 2007-11-05 23:22
Sender: sfeam (Project Admin)


I haven't forgotten about this patchset. In fact I put a high priority on
it, but I haven't found much time to look at it. Here is an updated
version.

1) Bug fix: tests for escaped characters were performed out of order
2) Code cleanup: lighter weight test for presence of 8-bit characters
3) Implement UTF-8 support for enhanced text mode
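
The lightweight test in item 2 can be as cheap as a scan for a set high
bit, since every byte of a multibyte UTF-8 sequence is >= 0x80. A minimal
sketch in Python (the patch itself is C; the function name has_8bit is
hypothetical and this is not the patch's code):

```python
def has_8bit(data: bytes) -> bool:
    """Return True if any byte has its high bit set (i.e. is not 7-bit ASCII)."""
    return any(b & 0x80 for b in data)


# Pure ASCII text can take the ordinary output path unchanged.
print(has_8bit(b"sin(x)"))            # False
# Any UTF-8 multibyte sequence trips the test.
print(has_8bit("α".encode("utf-8")))  # True
```

Only strings that fail this test would need the more expensive UTF-8
decoding step.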

Still to be done:
- Consolidate some code shared by PS_put_text and ENHPS_put_text
- Dynamic string handling for ENHps_opensequence
- Centering of multibyte text is a bit inconsistent
- More error checks
- Figure out why none of this works for CJK Unicode fonts (I think the
problem is in ttf2pt1 rather than in the gnuplot code).
File Added: utf-05nov2007.patch


Date: 2007-07-05 19:56
Sender: sfeam (Project Admin)


> I uploaded a mapping table which you will need.
> Use "ttf2pt1 -L aglfn.map" to convert

OK. I've tried this, using the Sazanami and Arial unicode fonts.

I tried, for example:
ttf2pt1 -L ../aglfn.map -e ./arialuni.ttf ./arialuni
gnuplot>
set term post fontfile "./arialuni.pfa" font "ArialUnicodeMS"
set output 'arialuni_utf8.ps'
set encoding utf8
load 'utf8.dem'
set term png font "./arialuni.ttf"
set output 'arialuni_utf8.png'
load 'utf8.dem'

The PostScript output correctly contained a few of the math symbols from
code page 34, as well as the Hebrew alphabetic characters. It did not,
however, handle the majority of the math symbols or any of the Japanese
glyphs (not even the alphabetic ones).

I can confirm that the various fonts do, in fact, contain many of the
desired math and Japanese glyphs by comparing the *.ps output and the *.png
output. Gnuplot's png terminal also uses freetype2 (via libgd) for font
rendering, so the problem is not a lack of support in freetype2.

So it seems the pathway via ttf2pt1 is still lacking some key pieces. Do
you have any further suggestions?



Date: 2007-07-05 13:02
Sender: thenlich


Some fonts (e.g. the Free UCS Outline Fonts) do not conform to Adobe's
specification on glyph names
(http://www.adobe.com/devnet/opentype/archives/glyph.html#6). Use this
mapping table for ttf2pt1 to fix the glyph names for such fonts.
File Added: aglfn.map


Date: 2007-07-03 06:33
Sender: thenlich


I cannot reproduce your bug with DejaVu Fonts 2.18 and Ghostscript 8.54.

Are you sure you used the "-a" option to ttf2pt1?

I also cannot follow your hypothesis about the glyph table (in the pfa
file?) being too big. I know of no such restriction. The only restriction
is the length of the encoding vector, which is 256 (is that what you were
referring to?). Since my patch uses the glyphshow operator for characters
> 255, it is independent of the encoding vector.
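
To illustrate the glyphshow approach described above, here is a minimal
sketch in Python. It is not the patch's code (the patch is C); the names
AGLFN_EXCERPT, glyph_name and ps_show are hypothetical, the glyph table is
only a three-entry excerpt, and the sketch assumes that the font's encoding
vector maps codes 0-255 to the matching Latin-1 characters. Code points
without a table entry fall back to Adobe's uniXXXX / uXXXXX naming
convention.

```python
# Tiny excerpt of the Adobe Glyph List For New Fonts (AGLFN); a real
# implementation would load the complete table.
AGLFN_EXCERPT = {0x03B1: "alpha", 0x03B2: "beta", 0x20AC: "Euro"}


def glyph_name(cp: int) -> str:
    """Return the AGLFN name if known, else the uniXXXX / uXXXXX form."""
    if cp in AGLFN_EXCERPT:
        return AGLFN_EXCERPT[cp]
    return f"uni{cp:04X}" if cp <= 0xFFFF else f"u{cp:X}"


def ps_show(text: str) -> str:
    """Emit PostScript for text: `show` for codes <= 255, `glyphshow` above."""
    out = []
    run = []  # pending characters that fit in the 256-entry encoding vector

    def flush():
        if run:
            out.append("(" + "".join(run) + ") show")
            run.clear()

    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            # Outside the encoding vector: address the glyph by name.
            flush()
            out.append(f"/{glyph_name(cp)} glyphshow")
        elif ch in "()\\":
            run.append("\\" + ch)        # escape PostScript string delimiters
        elif 0x20 <= cp < 0x7F:
            run.append(ch)               # printable ASCII passes through
        else:
            run.append(f"\\{cp:03o}")    # other 8-bit codes as octal escapes
    flush()
    return "\n".join(out)


print(ps_show("α = 1"))
# /alpha glyphshow
# ( = 1) show
```

Because glyphs above 255 are selected by name, the 256-entry limit of the
encoding vector never comes into play for them; the font only has to
contain a glyph under the expected name.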


Date: 2007-07-03 05:18
Sender: sfeam (Project Admin)


I've experimented a bit more, using the font conversion mechanism already
supported in gnuplot (via external utility ttf2pt1). I ran another UTF
test through the patched postscript driver via the following set of
commands:

set term post fontfile "/usr/share/ttfonts/verdana.ttf" font "Verdana"
set output "verdana_utf.ps"
set encoding utf
load "utftext.dem"

Unfortunately, the resulting file verdana_utf.ps is too large to upload
here, but I have placed a copy at
<http://skuld.bmsc.washington.edu/~merritt/gnuplot/verdana_utf.ps>

You can see that it successfully picks up Greek characters from Unicode
code page 3 and Cyrillic characters from code page 4. That is more than you
get in the standard PostScript fonts, but not very many altogether. If I
try the same thing with a larger Unicode TTF font, like
DejaVuSerif-Roman.ttf, then Ghostscript chokes on the resulting *.ps file.
I hypothesize that this is because the converted glyph table has more than
256 entries, but I can't think of an easy way to test that.


Date: 2007-07-02 17:47
Sender: sfeam (Project Admin)


That sounds great, but I don't understand (not an unusual situation when
it comes to fonts).

I was under the impression that Type1 fonts were limited to 255 glyphs.
That is confirmed on the web page you point to, which says:
"You may have noticed the lack of PostScript Type 1 (.pfa/.pfb) font files.
Type 1 format does not support large (> 256) encoding vectors, so they can
not be used with ISO 10646 encoding."

So how does one actually obtain or create a font that can be used with
this patch? Is that where the "Type 42" comes into the picture?


Date: 2007-07-02 17:25
Sender: thenlich


Additional description:

With this patch, the PostScript driver supports all UTF-8 characters (the
full range of Unicode) under the following conditions:
- The font contains the required glyphs
- The glyph names in the font follow Adobe's recommendation on glyph
names: http://www.adobe.com/devnet/opentype/archives/glyph.html#6

As the standard 35 PS fonts do not contain a wide range of Unicode
characters, those are probably not very useful for this.

Some free fonts which can be used are the Free UCS Outline Fonts:
http://www.nongnu.org/freefont/ (need to convert from TTF to Type 1 or Type
42 font first).

The patch is experimental and does not yet support enhanced text mode
(should be easy to add though).



Date: 2007-07-02 16:49
Sender: sfeam (Project Admin)


File Added: utf8text.png


Date: 2007-07-02 16:48
Sender: sfeam (Project Admin)


Could you please provide a description of which glyphs (in terms of
Unicode code pages) are included here? Is it just code pages 0 and 1?
That would be in the spirit of the extended Latin1 patch [1734995], but it
would be very misleading to call that "UTF-8" without further explanation.


I tried the patch on the utf8 version of gnuplot's enhanced text demo
(attached below), and it did not handle the math symbols correctly. This
demo may be useful as a test case for further work on UTF-8 support.
File Added: utf8text.dem






Attached Files ( 6 )

Filename                          Description
gnuplot-post-utf8-20070702.patch
utf8text.dem                      UTF-8 version of gnuplot's enhanced text demo
utf8text.png                      UTF-8 version of enhanced text demo (PNG output)
aglfn.map                         ttf2pt1 mapping table to make fonts AGLFN compliant
utf-06nov2007.patch               Updated version with UTF8 enhanced text mode
utf8_CMUSansSerif-Oblique.ps      utf8.ps using font CMUSansSerif-Oblique

Changes ( 12 )

Field          Old Value                                 Date              By
close_date     -                                         2008-02-19 06:36  sfeam
resolution_id  None                                      2008-02-19 06:36  sfeam
status_id      Open                                      2008-02-19 06:36  sfeam
File Added     253341: utf8_CMUSansSerif-Oblique.ps      2007-11-07 23:09  sfeam
File Added     253116: utf-06nov2007.patch               2007-11-06 20:06  sfeam
File Deleted   252957:                                   2007-11-06 20:06  sfeam
priority       5                                         2007-11-05 23:22  sfeam
File Added     252957: utf-05nov2007.patch               2007-11-05 23:22  sfeam
File Added     235776: aglfn.map                         2007-07-05 13:02  thenlich
File Added     235400: utf8text.png                      2007-07-02 16:49  sfeam
File Added     235399: utf8text.dem                      2007-07-02 16:48  sfeam
File Added     235311: gnuplot-post-utf8-20070702.patch  2007-07-02 09:19  thenlich