Hi, John,
Yes, you are largely correct: The hypothetical terminal application
would have to know the specific widths of more characters than current
terminal applications do now.
But I do not think this would have to be entirely font-specific: one
could create a de facto or real standard stating, for example, that
ARABIC LETTERS SHEEN and SAD in terminal form take 2 character cells
in all "quantum" fonts.
Initially, of course, there would only be one quantum font in existence
-- perhaps one based on DejaVu Sans Mono or DejaVu Sans Mono
Condensed, for example.
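
To make the idea concrete, here is a minimal sketch in Python of what such a shared, font-independent width table might look like. Everything named here is hypothetical: the table QUANTUM_WIDTHS, the form labels ("isolated", "final"), and the functions cell_width and text_width are made up for illustration, and the two Arabic entries simply encode the example above rather than any existing standard. The fallback rules (zero cells for combining marks, two for East Asian wide characters) are a rough stand-in for what wcwidth-style functions do today, not a faithful copy of any of them.

```python
import unicodedata

# Hypothetical agreed-upon table: (code point, joining form) -> cell count.
# Only the example from this message is encoded; nothing here is standardized.
QUANTUM_WIDTHS = {
    (0x0634, "isolated"): 2,  # ARABIC LETTER SHEEN
    (0x0634, "final"): 2,
    (0x0635, "isolated"): 2,  # ARABIC LETTER SAD
    (0x0635, "final"): 2,
}


def cell_width(ch, form="isolated"):
    """Return how many terminal cells one character occupies."""
    override = QUANTUM_WIDTHS.get((ord(ch), form))
    if override is not None:
        return override
    # Combining marks and format characters take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # The existing "bi-width" rule: East Asian wide/fullwidth take 2 cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def text_width(shaped):
    """Sum the cell widths of an iterable of (character, form) pairs."""
    return sum(cell_width(ch, form) for ch, form in shaped)


row = [("a", "isolated"), ("\u4e2d", "isolated"), ("\u0634", "final")]
print(text_width(row))  # 1 + 2 + 2 cells
```

The point of keying the table on the joining form as well as the code point is that the application and the terminal, even on separate machines, could both compute the same column count without knowing which quantum font is actually installed.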
Over the years there have been many threads here and there on various
mailing lists regarding the problem of supporting complex scripts on
Unix-style terminals.
I suppose it is fair to say that all the *nix aficionados out there
don't want to give up the command line. The command line is nice. We
all like the command line. Like many good things in life, the command
line is very unobtrusive. To the uninformed observer, the command
line doesn't really look like anything interesting. But once you know
what you can do with the command line, all that hidden unobtrusive
power, it is very addictive.
The only problem is that there still is no terminal that really
handles a large number of the non-Latin scripts well. Mlterm is as
close as I have seen. There are a huge number of applications that
have a command-line interface: MySQL is a good example of one that I
use all the time. Many GUI interfaces are possible for databases like
MySQL, but I still like the simplicity and reliability of the standard
command-line interface, and I hate most of the GUIs.
Now if only people like me could use command-line tools for scripts
like Devanagari, Myanmar, and Tamil, that would be great. Or, as a
former professor of mine is fond of saying, that would be "Too Much!"
So I am merely setting out a set of requirements for this hypothetical
project that I think are very doable ...
Best - Ed
On Feb 8, 2008 3:40 PM, John Karp <johnkarp@gm...> wrote:
> I've never written a terminal, but...
>
> A common use case for terminals is that the application and the
> terminal are on totally separate machines. In order for the
> application to be able to lay out boxes, lines, etc. such that the
> columns align, it needs to have knowledge of how many cells a given
> character is going to take.
>
> With Latin fonts, this is trivial: 1 character = 1 cell. Bi-width
> isn't much worse: everything is 1 cell, except for a range of
> characters which is agreed upon to be two.
>
> As I understand your proposal, the number of cells a given character
> has is font-specific, and varies even within script ranges. How is
> this information going to be known by the application? Normally the
> app has ~0 knowledge of the particular font used on the terminal.
>
> -John
>
>
> On 08/02/2008, Ed Trager <ed.trager@gm...> wrote:
> > Hi, everyone,
> >
> > I've decided to forward the following message which I sent to the
> > Unicode discussion list, to this list.
> >
> > The first parts of the message are largely irrelevant to this list,
> > but my response to Sinnathurai Srivas' question #3 about how to fix
> > "rigidly fixed width" systems to handle Unicode is something I would
> > like to toss out to this development community, for what it is worth
> > ...
> >
> > Best -- Ed Trager
> >
> > ---------- Forwarded message ----------
> > From: Ed Trager <ed.trager@gm...>
> > Date: Feb 8, 2008 3:02 PM
> > Subject: Re: minimizing size (was Re: allocation of Georgian letters)
> > To: Unicode Discussion <unicode@un...>
> >
> >
> > Hi, everyone,
> >
> > Just a few brief comments on this thread:
> >
> > >
> > > Having flown halfway around the world to talk to people who for whatever
> > > reasons, both valid and invalid (and not really distinguishing which is
> > > which on their list of concerns), are unhappy with a language encoding
> > > that in their view doubles or worse the amount of bytes used to store
> > > their language in Unicode, I can tell you that this is a very real
> > > concern on some people's minds.
> > >
> > > True or false, it is on their minds. They can all add and multiply, and
> > > it certainly looks like a 2x or 3x situation to them.
> > >
> >
> > Of course it is on their minds! Judging from the titles of emails in
> > my spam box, size really does matter. But apparently what humanity
> > really wants to do is MAXIMIZE the size, not minimize it. So a 2x or
> > 3x situation should be good. :-)
> >
> > On Feb 8, 2008 5:52 AM, Sinnathurai Srivas <sisrivas@bl...> wrote:
> > > 2/
> > > My question was: most proper publishing software does not yet support
> > > complex rendering. How many years has it been since Unicode came into
> > > being? When is this going to be resolved, or do we plan on choosing an
> > > alternative encoding, as Unicode is not working?
> > >
> >
> > Unicode does in fact work very well. Implementing good Unicode
> > support for complex text layout (CTL) scripts like Tamil is
> > achievable. Not sure what "proper publishing software" includes --
> > For example, would that include http://ta.wikipedia.org/ ?
> >
> > From an economic perspective, when the markets in South and Southeast
> > Asia that require complex text layout look enticing enough to the
> > software vendors, then the problem will be solved. Is it possible
> > that rampant piracy of commercial software throughout Asia actually
> > contributes to the problem of poor support for many Asian scripts in
> > heavy-weight commercial software like Adobe InDesign? This question
> > might be a great topic of some student's research paper.
> >
> > Clearly the commercial players like Adobe InDesign and Quark XPress
> > and the non-commercial players like Scribus (http://www.scribus.net/)
> > are all working on providing support for CTL scripts. In this arena,
> > the Open Source players are influenced by a different set of driving
> > criteria than the commercial vendors: Does being Open Source encourage
> > faster development of non-Latin script support? This question might
> > be a great topic for some other student's research paper.
> >
> > In any case, the transparency of development in the Open Source world
> > allows one to find out exactly how things stand. For example, here is
> > the link to Scribus' "Support for Non-Latin Languages" meta-bug page:
> >
> > http://bugs.scribus.net/view.php?id=3965
> >
> > And in the case of Scribus, for example, one is welcome to contribute
> > well-documented test cases (sample Unicode text along with references
> > to fonts that are known to work correctly in other software) which the
> > developers can use for testing the software.
> >
> > > 3/
> > > As for bitmap, I meant the "rigidly-fixed-width-character" requirements.
> > > At present, the complex rendering (which is not working yet in these
> > > systems) will produce extremely wide glyphs which will not be
> > > accommodated by the "rigidly-fixed-width" requirements. What is the plan to
> > > resolve this?
> > >
> >
> > The only place where "rigidly fixed width" characters are normally required
> > that I can think of is in terminal emulators. Once upon a time I
> > investigated the idea of creating a terminal emulator --along with a
> > bitmap font-- that would support scripts like Myanmar (Burmese),
> > Tamil, etc. (Actually, from time to time, I still return to this
> > idea).
> >
> > In existing terminal emulators, Latin glyphs take up one character
> > cell each, while CJK glyphs are "double-width" and take up 2 character
> > cells each. The GNU Unifont BMP bitmap font originally designed by
> > Roman Czyborra (http://en.wikipedia.org/wiki/GNU_Unifont) provides a
> > good example of how this works: most of the glyphs are 8 pixels wide
> > by 16 pixels high, but the CJK glyphs are 16 pixels wide by 16 pixels
> > high.
> >
> > In the hypothetical system as I had envisioned it, glyphs other than
> > CJK glyphs could also be double-width. And, in fact, why limit
> > ourselves to widths of 1 and 2 character cells? When I was
> > investigating Myanmar, I thought that it actually would be *better* to
> > allow some glyphs to stretch across 3 or even 4 character cells.
> >
> > We can think of this hypothetical terminal emulator as having a
> > Cartesian grid, and glyphs of all scripts need to fit into a discrete
> > number of "quantum" cells: 1, 2, 3, or 4. (Maybe one could even make an
> > argument for some glyph using up 5 quantum cells?)
> >
> > An experienced font designer (or team of designers) would then take up
> > the challenge of creating a font to use with this terminal emulator.
> > The font need not be a bitmap font -- it could just as easily be a
> > vector font. For the sake of argument, let's say we allow this
> > hypothetical terminal to use vector fonts (i.e., we could just make a
> > special kind of OpenType font which could even have embedded bitmaps
> > if desired).
> >
> > So for the various Latin blocks of Unicode we could start out with a
> > suitable "monospaced" font. In a Latin monospaced font, all letters
> > fit into fixed-width cells so that the advance distances on all glyphs
> > are the same. This obviously requires some special aesthetic
> > compromises, especially on the wide Latin letters like "m" and "w".
> >
> > To this originally "monospaced" font, we would now add additional
> > blocks of Unicode. We could pretty much continue working within our
> > "monospaced" design mantra through many blocks of Unicode -- until, of
> > course, we hit scripts like Devanagari, Tamil, Myanmar, Khmer, and so
> > on. Arabic too. At this point, our originally "monospaced" font
> > becomes no longer "monospaced". Let's give it a new name -- how about
> > "quantized font" or "quantum spaced font"? Or simply "quantum font"?
> >
> > In this new quantum font, whenever an individual glyph became too
> > horribly "squished" to fit inside one quantum character cell, then we
> > would automatically try a 2-cell approach, and if even that did not
> > work, then go for a 3- or 4-cell approach.
> >
> > As a quick and familiar example, let's use Arabic script. On Linux,
> > the mlterm folks (http://mlterm.sourceforge.net/) have actually
> > produced a "multilingual" terminal that even handles RTL Arabic. This
> > is pretty cool. Mlterm uses GNU Unifont for its Arabic glyphs.
> > Arabic in mlterm is readable, which is nice, but it is really ugly.
> > For example, terminal ARABIC LETTER SHEEN ش looks almost unbearably
> > *squished*. Clearly, wide Arabic letters like isolated or terminal
> > ARABIC LETTER SHEEN ش or ARABIC LETTER SAD ص would probably end up
> > looking *much* nicer if we just allowed them to occupy 2 character
> > cells. So, in this quantum font, most Arabic letters would still
> > occupy just one character cell, but a few would occupy up to 2
> > character cells.
> >
> > A similar principle would apply for the creation of the necessary
> > glyphs for scripts like Myanmar and Tamil -- except in these cases
> > there would be some glyphs that would necessarily take up 3 or even 4
> > character cells.
> >
> > Well that's my idea, for what it is worth. I even tried my hand at
> > creating a set of bitmap glyphs for Myanmar which could be added to
> > GNU Unifont. But after wasting a lot of time on this, I realized I
> > did not know how to write a terminal emulator. So, maybe someday I
> > will return to this outlandish project. After I have learned how to
> > write a terminal emulator.
> >
> > - Ed Trager
> > _______________________________________________
> > DejaVu-fonts mailing list
> > DejaVu-fonts@li...
> > https://lists.sourceforge.net/lists/listinfo/dejavu-fonts
> >
>