Thread: RE: [Waba-spec] A Font counter-counter proposal
Status: Abandoned
From: Isao Y. <IYamashita@Vetronix.com> - 2001-11-27 17:52:22
> By code-page, I assume that you're saying that something is displaying
> (say) the Cyrillic range of the Unicode glyph-space. But why not just
> switch the font automagically when it has to display a given character
> Foo outside its range? Is this a Palm problem I'm missing somewhere?

>>> First, let me say I like the word "automagically". I think this would
be a good idea: by detecting the Unicode range, I can automagically
change character sets. The reason for my initial proposal was that I had
to match the fonts with the proper character sets. This is a requirement
for the code-page-based Win32 environment (WinCE, NT, 2000, and XP are
Unicode-based).

> On the other hand, if you have fonts which only work in certain areas,
> why not specify this in the name string? As in: new Font("Cyrillic",
> Font.BOLD + Font.ITALIC, 18);
>
> Clue me in, I'm probably not seeing it.

>>> Well, there are multiple fonts available for a given character set.
For example, there are two default Japanese fonts, "mincho" and
"gothic", so this option is not really usable.

> Now, there *is* an issue if you want to display Chinese (say) and don't
> know which font is capable of doing this. You'd like an equivalent of
> Font.forUse(...). One approach is:
>
> Change my proposed
>
> public static native String[] getNames();
>
> to
>
> public static native String[] getNames(String script);
>
> Or instead of deleting it, you could redefine getNames() to be
> { return getNames(nil); }. That way you can look up all the available
> fonts for a given script; of course you'd need to be able to provide
> such a thing programmatically (on the Newton we'd have to just hack it
> and create a little list of various-language fonts we know about).
> Programmers should be informed that if the returned array is empty,
> they should tell the user they can't figure out what font to use for a
> given script, and perhaps ask the user to pick from a list.
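The getNames() overloading proposed above could be modeled as below. This is only a sketch: WabaFontRegistry and its font-to-scripts table are hypothetical stand-ins, since in Waba these would be native methods backed by whatever the host OS can report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pure-Java model of the proposed Font.getNames() API.
// In Waba these would be native methods querying the host OS.
public class WabaFontRegistry {
    // Maps font name -> scripts it can render (a stand-in for OS queries).
    private static final Map<String, List<String>> FONTS = new HashMap<>();
    static {
        FONTS.put("mincho", List.of("Latin", "Hiragana", "Katakana", "Han"));
        FONTS.put("gothic", List.of("Latin", "Hiragana", "Katakana", "Han"));
        FONTS.put("helvetica", List.of("Latin"));
    }

    // Original form, redefined as getNames(null) per the proposal.
    public static String[] getNames() {
        return getNames(null);
    }

    // Script-aware form: null or empty means "all fonts".
    public static String[] getNames(String script) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : FONTS.entrySet()) {
            if (script == null || script.isEmpty()
                    || e.getValue().contains(script)) {
                out.add(e.getKey());
            }
        }
        // An empty array signals that no suitable font was found; per the
        // proposal, the program should then ask the user to pick one.
        return out.toArray(new String[0]);
    }
}
```

With this table, getNames("Hiragana") returns the two Japanese fonts, while getNames("Cyrillic") comes back empty, triggering the ask-the-user path.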
> Instead of defining script in terms of "locale" (which I think is not a
> good idea for fonts), instead define it in terms of script name.
> Unicode has some standard script names, see
> http://www.unicode.org/unicode/reports/tr24/charts/ . Stuff like Latin,
> Armenian, Han, Hiragana, etc. If you like you can be more sophisticated
> and allow the names to be separated by spaces, and it returns any fonts
> which provide all three scripts, as in "Han Hiragana Hangul". If nil is
> passed in, or the string is empty, then ALL fonts are returned.

>>> Mmm, sounds like we'd have to embed many look-up tables somewhere.
I'm not sure about this, and it doesn't make me feel comfortable, as it
would blow up Waba's code size for a not-so-important feature. I'd still
like to leave this issue to the coder's discretion.

> Further, the Font.forUse()... etc. methods should assume that the
> user's pre-defined script for his system (Japanese, Latin, etc.) biases
> the font choice for the given use. Though I'd strongly suggest that any
> such fonts should *also* be able to display Latin.

>>> Only up to ASCII code 127. For instance, on a Japanese OS, byte
values above 127 are used as flags for double-byte encoding, so I can't
mix Japanese characters with accented Latin characters (as is the case
on any code-page-based OS).

> The nice feature of this is that there are Unicode scripts which aren't
> locale-based: like "Sm", which is math symbols, or "No", which is
> special kinds of number symbols.

>>> Ah, sounds good.

> _______________________________________________
> Waba-spec mailing list
> Wab...@li...
> https://lists.sourceforge.net/lists/listinfo/waba-spec
From: Sean L. <se...@cs...> - 2001-11-28 02:37:31
Isao Quoth:

> The reason for my initial proposal was that I had to match the fonts
> with proper character sets. This is a requirement for code-page based
> Win32 environment (WinCE, NT, 2000, XP are Unicode based).

How does _Java_ handle this for Win32, then? It seems to me that rather
than this being a Font issue, it is an implementation issue better
handled at the native level when a user requests a drawText() (or, as
I'm proposing rewriting it, a drawString() or drawChars() :-). Your
native implementation of drawText() basically gets handed a string of
Unicode characters and a font to draw them in. If the font is a
"Japanese" font according to your operating system, then you take the
Unicode characters requested and map them into the appropriate hi-bit
escape-value ASCII gunk necessary to render them correctly. If the font
is a "Cyrillic" font, you do the same thing.

It seems to me that the software developer shouldn't even know that this
is going on. If he wants a Hangul (Korean) character like ㌳, he writes
drawText("\u3333"). Similarly, if he wants a Han (CJK) character like 啕,
then he writes drawText("\u5555"). If the system is presently not
drawing with a Hangul font or Han font or whatever, then it either draws
a stub character (on the Mac, it's a square box), or it switches
temporarily to some Hangul or Han font that's appropriate, draws the
character however necessary, and then switches back.

[BTW, I'd be very interested in knowing who was capable of viewing the
characters I wrote above in their email! MacOS X is a wonderful system.]

>> Instead of defining script in terms of "locale" (which I think is not
>> a good idea for fonts), instead define it in terms of script name.
>> Unicode has some standard script names, see
>> http://www.unicode.org/unicode/reports/tr24/charts/ . Stuff like
>> Latin, Armenian, Han, Hiragana, etc. If you like you can be more
>> sophisticated and allow the names to be separated by spaces, and it
>> returns any fonts which provide all three scripts, as in
>> "Han Hiragana Hangul". If nil is passed in, or the string is empty,
>> then ALL fonts are returned.
>
> Mmm, sounds like we'd have to embed many look-up tables somewhere.
> I'm not sure about this, and it doesn't make me feel comfortable, as
> it would blow up Waba's code size for a not-so-important feature.
> I'd still like to leave this issue to the coder's discretion.

Well, if you can't query the OS to get this information, you might have
to do it with tables or not at all. The way I was envisioning it, if the
system can provide this stuff, it does. If it doesn't, the array that
comes back is empty. It would be an optional nicety, but it's not a
deal-breaker for me, just an idea.

>> Further, the Font.forUse()... etc. methods should assume that the
>> user's pre-defined script for his system (Japanese, Latin, etc.)
>> biases the font choice for the given use. Though I'd strongly suggest
>> that any such fonts should *also* be able to display Latin.
>
> Only up to ASCII code 127. For instance, on a Japanese OS, byte values
> above 127 are used as flags for double-byte encoding, so I can't mix
> Japanese characters with accented Latin characters (as is the case on
> any code-page-based OS).

Well, all right, but the developer isn't writing in ASCII. He's writing
in Unicode. So if you get a Unicode character your current font is
capable of rendering, you render it. If you get a character you don't
know how to render, you either give a filler character, or you figure
out how to switch to a different font to render that character and then
switch back. Both are fine. But I feel we should avoid any
platform-specific features in the core spec.

Sean
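The render-or-fall-back behavior sketched in this message can be modeled as pure selection logic. Everything here is hypothetical scaffolding (Waba's real drawText() would do this natively, per platform): CanDisplayFont and RangeFont are illustrative types, and "STUB" stands in for the square-box filler glyph.

```java
import java.util.List;

// Hypothetical interface: a font that can report which chars it renders.
interface CanDisplayFont {
    String name();
    boolean canDisplay(char ch);
}

// Illustrative test font covering a single contiguous char range.
class RangeFont implements CanDisplayFont {
    private final String name;
    private final char lo, hi;
    RangeFont(String name, char lo, char hi) {
        this.name = name; this.lo = lo; this.hi = hi;
    }
    public String name() { return name; }
    public boolean canDisplay(char ch) { return ch >= lo && ch <= hi; }
}

public class FallbackRenderer {
    // Returns the name of the font each character would be drawn with:
    // the current font when it can display the char, otherwise the first
    // installed font that can, otherwise "STUB" (the square-box filler).
    public static String[] pickFonts(String text, CanDisplayFont current,
                                     List<CanDisplayFont> installed) {
        String[] choice = new String[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (current.canDisplay(ch)) {
                choice[i] = current.name();   // normal case: no switch
            } else {
                choice[i] = "STUB";           // filler if nothing matches
                for (CanDisplayFont f : installed) {
                    if (f.canDisplay(ch)) {   // temporary font switch
                        choice[i] = f.name();
                        break;
                    }
                }
            }
        }
        return choice;
    }
}
```

The developer, as argued above, never sees any of this: drawText("\u3333") either renders via a temporary switch or produces the stub glyph.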
From: Sean L. <se...@cs...> - 2001-11-28 03:20:21
Guilherme quoth:

> Maybe you could adopt what i made in SuperWaba.
> My String class has this:
>
> /** The bytes are converted to char and vice-versa using the
> CharacterConverter associated in this charConverter member. */
> public static waba.sys.CharacterConverter charConverter;
>
> public String(byte[] value, int offset, int count);
> public byte[] getBytes();

I like the general idea of supporting the getBytes() function and the
String(byte[],int,int) constructor, especially since we'd need to read
and write to streams. And you picked exactly the right ones to support
(rather than the other myriad of byte-conversion methods). AND I like
the idea of a swap-in coder.

But I'm a little queasy about adding a data member to as Java-compliant
a class as java.lang.String! This really pushes us away from the Java
standard rapidly. So where do you stash a swap-in converter if not
String? Maybe the best place would be as a static member of the
waba.sys.Convert class.

A few other quibbles to an otherwise nice idea:

> /** This class is used to correctly handle international character
> conversions. The default character scheme converter is 8859-1. If you
> want to use a different one, you must extend this class, implementing
> the bytes2chars and chars2bytes methods, and then assign the public
> member of java.lang.String charConverter to use your class instead of
> this default one. */

On a PDA, ISO-8859-1 really *shouldn't* be the default. The default
really should be some form which preserves Unicode values, like UTF-8.
Unfortunately, Sun's suggested "default" (if your operating system
doesn't have anything better) is indeed ISO-8859-1. I think that was a
big mistake on their part, but oh well.

If we're going the drop-in coding route, we oughta provide, at the very
least, free encoders in the core spec which do ASCII and UTF-8 (more
critical than the others), plus maybe UTF-16BE, UTF-16LE, and UTF-16.
That's about all the encodings one would typically expect to see anyway,
and those can all be done easily without a lookup table. Kaffe's
probably got GNU versions of the encodings if you want to grab 'em.

> public class CharacterConverter

Converter isn't the right word. This should be an encoder, something
like CharacterEncoder, or some such. No big deal, though. BTW, there's a
standard non-public Java class called java.lang.StringCoding, which has
almost identical functions to yours; they're called:

static byte[] encode(char[] ca, int off, int len);
static char[] decode(byte[] ba, int off, int len);

...maybe those would be better names, just a suggestion.

Also, I'm not sure if you should override CharacterEncoder, er,
CharacterConverter, or if we should have an interface to implement
instead, where the default class implementing the CharacterEncoder
interface is ISO88591Encoder. Dunno about that. Perhaps it wouldn't buy
us anything in terms of memory consumption.

Sean
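The claim that these encodings need no lookup table is easy to demonstrate for UTF-8: the BMP cases are a few shifts and masks. A sketch below, borrowing the chars2bytes name from Guilherme's class; it covers BMP characters only (which is all a 16-bit char holds anyway), so surrogate-pair handling is deliberately omitted for brevity.

```java
// Table-free UTF-8 encoding of a char array, in the style of SuperWaba's
// chars2bytes. BMP only; surrogate pairs are not paired up here.
public class Utf8Converter {
    public static byte[] chars2bytes(char[] cs) {
        // First pass: count output bytes (1, 2, or 3 per char).
        int n = 0;
        for (char c : cs) n += (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
        byte[] out = new byte[n];
        int i = 0;
        for (char c : cs) {
            if (c < 0x80) {                        // 0xxxxxxx
                out[i++] = (byte) c;
            } else if (c < 0x800) {                // 110xxxxx 10xxxxxx
                out[i++] = (byte) (0xC0 | (c >> 6));
                out[i++] = (byte) (0x80 | (c & 0x3F));
            } else {                               // 1110xxxx 10xxxxxx 10xxxxxx
                out[i++] = (byte) (0xE0 | (c >> 12));
                out[i++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[i++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return out;
    }
}
```

The decoding direction (bytes2chars) is the same bit-twiddling in reverse, so a core-spec UTF-8 coder really does cost only a few dozen bytes of code, no tables.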
From: Guilherme C. H. <gu...@us...> - 2001-11-28 11:09:35
I'm very happy that finally someone liked my work on this list. That's
also why I'm disappointed: if I had to implement all the changes
proposed here, I would just have to throw SuperWaba in the trash and
start all over again.

I think that providing some other code pages is good, and is what I had
in mind when I created that methodology. As you suggested, I'll remove
the static member from String and place it in Convert.

regards

guich
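The relocation Guich agrees to here is a one-member move. A minimal sketch, assuming the SuperWaba-style bytes2chars/chars2bytes methods discussed earlier in the thread; the class bodies are illustrative, not SuperWaba's actual source:

```java
// Default converter: ISO-8859-1, i.e. each byte maps straight to the
// char of the same value. Subclass and reassign Convert.charConverter
// to swap in a different encoding (e.g. a UTF-8 converter).
class CharacterConverter {
    public char[] bytes2chars(byte[] b, int off, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) out[i] = (char) (b[off + i] & 0xFF);
        return out;
    }
    public byte[] chars2bytes(char[] c, int off, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) out[i] = (byte) c[off + i];
        return out;
    }
}

// The swap-in coder now lives on a utility class (waba.sys.Convert in
// the proposal) instead of as a data member on java.lang.String.
public class Convert {
    public static CharacterConverter charConverter = new CharacterConverter();
}
```

String(byte[],int,int) would then call Convert.charConverter.bytes2chars(...) and getBytes() the inverse, leaving java.lang.String itself free of non-standard members.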
From: Sean L. <se...@cs...> - 2001-11-28 03:24:34
[Last message tonight, I PROMISE!]

Guich quoth:

> I'll post a list of my changes on the VM so that you can know what I
> did. The problem is that the list is huge (more than 70 changes) and I
> cannot pin down very well what had changed. Also, currently I'm out of
> time to do that. So, please give me one more week.

Guich, the ones I'm particularly interested in are those which can be
implemented specifically in the core VM and aren't hard to do -- most
bang for the buck, you know! :-)

> If you take a look at my current release and compare the executeMethod
> part, you may see many of them.
>
> The Waba 1.0 timings are 143010/33340/8860/25940/126460/53250
> (graphics+native method/ loop/ field/ method/ array/ string)
> The current release has: 76100/24180/4980/14870/94330/29360.

Please compile and run WabaMark
(http://www.cs.gmu.edu/~sean/newton/waba/dev/WabaMark.java) and let us
know the results on the SuperWaba VM, especially compared to the
standard Waba VM on the same handheld. Thanks!

Sean
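For context, the two timing lines Guich quotes work out to substantial improvements in every category. A quick arithmetic check (the figures and category order are taken directly from his message; lower times are better):

```java
// Percent reduction in time from Waba 1.0 to Guich's current release,
// using the six WabaMark-style timings quoted in the message above.
public class SpeedupCheck {
    static final String[] NAMES =
        { "graphics+native", "loop", "field", "method", "array", "string" };
    static final double[] WABA10  = { 143010, 33340, 8860, 25940, 126460, 53250 };
    static final double[] CURRENT = {  76100, 24180, 4980, 14870,  94330, 29360 };

    // Percent less time taken by CURRENT vs. WABA10 for benchmark i.
    public static long improvementPercent(int i) {
        return Math.round(100.0 * (WABA10[i] - CURRENT[i]) / WABA10[i]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++)
            System.out.println(NAMES[i] + ": "
                + improvementPercent(i) + "% less time");
    }
}
```

The graphics+native and field categories come out around 47% and 44% less time respectively, which makes Sean's request for like-for-like WabaMark numbers on the same handheld the natural next step.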