gnuplot development

Tracker: Patches

5 Extended PostScript Latin1 encoding vector - ID: 1734995
Last Update: Comment added ( sf-robot )

A PostScript encoding vector that is compatible with ISO-8859-1 and Windows
Latin 1 (CP 1252).

It also contains all the extra characters from the MacRoman encoding (useful
math symbols!) and some characters found in Y&Y Lucida Bright
(dotlessj, ff, ffi, ffl).

The idea is borrowed from TeX Base 1 Encoding (8r).

The code positions 00-1F were chosen to be compatible with some X11 bitmap
fonts, where possible.


Thomas Henlich ( thenlich ) - 2007-06-11 13:08

Priority: 5
Status: Closed
Resolution: Accepted
Assigned To: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comments ( 10 )




Date: 2007-07-11 02:20
Sender: sf-robot (SourceForge.net Site Admin)


This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).


Date: 2007-06-27 08:17
Sender: mikulik


It would be nice if there were also a table with the characters and their
codes (e.g. a PNG figure) on the web page.
Thomas, can you prepare it?
An example of how to use it would be useful as well.



Date: 2007-06-26 17:39
Sender: sfeam (Project Admin)


I have placed this prologue file, with credit, in the "contributed scripts
and files" section of the Gnuplot web site. A pointer to this discussion
is included.


Date: 2007-06-14 16:13
Sender: sfeam (Project Admin)


In essence - no. Consider:
-We won't know "the first 256 distinct characters" until after we plot.
-We don't know what fonts to pull them from (there are no Unicode fonts
for PostScript).
-256 characters are not enough; that's the whole point of Unicode.
-99% of gnuplot usage? Where do you pull that from? Gnuplot has a
significant Japanese user base, for instance, and to encode labels in
Japanese requires access to several thousand character glyphs. But even for
English-speaking gnuplot users, the exercise seems pointless if it doesn't
at least provide access to the common TeX, math, Greek, and physical
constant symbols. That spans at least 4 Unicode code pages right there,
and you won't find all of these in any single free PostScript font that I
am aware of. That is why several of our FAQ entries deal with questions
like "How can I get Planck's constant in a PostScript label?".

If you think there is an easy solution, feel free to demonstrate it with a
proof-of-principle code patch. You could start by having a look at Harald
Harders' patch #1252232, which did something related to what you
suggest. But even before you look into code, have a look at the problem of
available fonts.


Date: 2007-06-14 15:11
Sender: thenlich


Couldn't we just implement a "poor man's Unicode" PS driver which just
stuffs the first 256 distinct characters from the output into a
dynamically-generated encoding vector? This would probably cover 99% of
gnuplot usage (it isn't a typesetting system, after all).

Of course, there are better solutions, but this one is easy enough to
implement and should be good enough for most users.
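
The idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not gnuplot code: the names `build_encoding` and `encode` are hypothetical, and the sketch covers just the code assignment. A real driver would still have to map each code to a PostScript glyph name and find a font that contains the glyph.

```python
def build_encoding(text, size=256):
    """Assign one byte code to each distinct character, in order of first use.

    Hypothetical helper: raises ValueError once the text needs more
    distinct characters than a single-byte encoding vector can hold.
    """
    table = {}
    for ch in text:
        if ch not in table:
            if len(table) >= size:
                raise ValueError("more than %d distinct characters" % size)
            table[ch] = len(table)
    return table


def encode(text, table):
    """Re-encode text as single bytes using the generated table."""
    return bytes(table[ch] for ch in text)


label = "\u0127 = h/2\u03c0"          # "ħ = h/2π"
table = build_encoding(label)
encoded = encode(label, table)
print(len(table), list(encoded))
```

Note that the table can only be built once all of the output text is known, and that labels with more than 256 distinct characters cannot be encoded this way at all.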


Date: 2007-06-13 11:49
Sender: tlecomte


> That leaves PostScript as a major missing piece

For what it's worth, if cairopdf eventually gets finished and is satisfactory,
we can with very little effort implement "cairops" which is very likely to
have the same utf-8 capabilities as cairopdf, since it would be made of the
same building blocks. I've been told that cairo PS output isn't obfuscated.
Of course, we would lose some of the current possible customization
(prologue, etc.) and this is something to discuss.


Date: 2007-06-12 16:44
Sender: sfeam (Project Admin)


> "Why not simply call it 8859-1, just for historical reasons?"

Because ISO 8859-1 is an international standard, and your file does not
agree with that standard. OK, it's an extension rather than a
contradiction, but still... So far as I can tell from a quick look at the
Wikipedia entry for Microsoft's code page 1252, it doesn't exactly match
that either. But cp1252 is not an actual standard, so maybe that doesn't
matter. How about if we put your prologue file in the "contrib" directory
as cp1252.ps?

> An even better solution would be Unicode support in gnuplot.

Then you're in luck, because gnuplot already supports Unicode, via UTF-8.

See, for example
http://gnuplot.sourceforge.net/demo_svg/utf8text.html

In gnuplot version 4.2, UTF-8 works well in the x11, wxt, svg, png, jpeg,
and (I am told but haven't tried) TeX-based terminal drivers. Some minor
glitches are fixed in current CVS, and UTF-8 will be available also for pdf
output as soon as the new cairo-based pdf terminal driver (Patch #1651015)
is slotted in to replace the existing one.

That leaves PostScript as a major missing piece, but that is because the
PostScript language itself was not designed to handle multi-byte encoding
schemes. I am not aware of any reasonable solution for this short of the
elaborate language-specific multipage remappings used by ghostscript, and
these do not work all that well anyhow. I think the answer will have to be
"if you need PostScript output, please use the new PDF terminal and then
convert from PDF to PostScript externally".


Date: 2007-06-12 12:11
Sender: thenlich


Why not simply call it 8859-1, just for historical reasons?
It contains all characters from Latin1 in the same code positions, plus
some more.
Just as the original gnuplot 8859-1 encoding vector does.

But I agree, a more correct name would be 'cp1252'.
Since CP1252 is a superset of Latin1, that would actually make Latin1
superfluous.
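
The superset claim can be checked against Python's built-in codecs, used here as a stand-in for the two code tables (an assumption; the attached 8859-1.ps is not inspected). The sketch lists the byte values that the `latin-1` and `cp1252` codecs decode differently; `differing_bytes` is a hypothetical helper name.

```python
def differing_bytes():
    """Return the byte values where latin-1 and cp1252 decode differently."""
    diffs = []
    for code in range(256):
        raw = bytes([code])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # position is unassigned in Python's cp1252 codec
        if latin1 != cp1252:
            diffs.append(code)
    return diffs


diffs = differing_bytes()
print(len(diffs), hex(diffs[0]), hex(diffs[-1]))
```

With these codecs the two tables differ only in 0x80-0x9F, where ISO 8859-1 has C1 control codes and CP1252 has printable characters (plus a few unassigned positions). All printable Latin1 characters keep their positions, which is the sense in which CP1252 extends Latin1.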

An even better solution would be Unicode support in gnuplot.



Date: 2007-06-11 20:39
Sender: sfeam (Project Admin)


You cannot call this encoding 8859-1, or Latin1, since it obviously
isn't.
Please give it a different name and document where this encoding vector is
described. Is it identical to "TeX Base 1 Encoding (8r)"? Are there ttf
fonts or x11 fonts available that use it, so we could support it in
terminal types other than postscript?

I don't think we want to add customized encodings to the gnuplot command
set, at least not unless/until someone can work out how to have them apply
to most terminal types. But as of version 4.2 individual users can maintain
a private prologue.ps file, so they can include this as the base encoding
for PostScript output if it is installed locally.

The file gp-test.plt is not needed, as the current "charset.dem" in the
standard demo directory exercises the full encoding.


Date: 2007-06-11 13:10
Sender: thenlich


File Added: gp-test.plt


Attached File ( 1 )

Filename    Description
8859-1.ps   Extended PostScript Latin1 encoding vector

Changes ( 8 )

Field           Old Value              Date               By
status_id       Pending                2007-07-11 02:20   sf-robot
close_date      2007-06-26 17:39       2007-07-11 02:20   sf-robot
resolution_id   None                   2007-06-26 17:39   sfeam
status_id       Open                   2007-06-26 17:39   sfeam
close_date      -                      2007-06-26 17:39   sfeam
File Deleted    232613:                2007-06-12 12:11   thenlich
File Added      232613: gp-test.plt    2007-06-11 13:10   thenlich
File Added      232612: 8859-1.ps      2007-06-11 13:08   thenlich