Thread: [sinhala-technical] #defines for Sinhala codepoints
From: Harshula <har...@gm...> - 2008-04-07 16:48:23
Attachments:
keysymdef.h.fullname
keysymdef.h.phonetic
Hi,

I've attached two files, both incomplete, that I was working on but gave up on in 2005. It will be great to get some input from others.

REQUIREMENTS
============

1) The #defines should be as short/concise as possible.
2) Try and make the #defines visually unambiguous.
3) We have to get it right the first time. These are #defines, so once we define them, it's unlikely we'll be allowed to change them.
4) Use ASCII characters.

ISSUES
======

a) I abandoned the fullnames version because they were ridiculously long.

b) I then used the Unicode chart's shortnames, but how should we differentiate between an independent vowel and a dependent vowel?
   - Using case sensitivity to differentiate might be frowned upon.
   - I tried adding "_pilla" for D/Vs but that made the name longer.
   - I could add a suffix of "_p" for D/Vs.
   - How should we name vocalic I/Vs and D/Vs?

cya,
#
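A minimal sketch of the "_p" suffix idea from issue (b), for illustration only. The macro names are the draft style discussed in this thread, not necessarily the names that were eventually committed; the values assume the X11 convention that a Unicode keysym is the code point plus 0x1000000. The helper `dependent_form` and the `vowel_pairs` table are hypothetical.

```c
#include <stddef.h>

/* Hypothetical draft names: independent vowel vs. dependent vowel ("_p"). */
#define XK_Sinhala_uu    0x1000d8c  /* U+0D8C SINHALA LETTER UUYANNA */
#define XK_Sinhala_uu_p  0x1000dd6  /* U+0DD6 SINHALA VOWEL SIGN DIGA PAA-PILLA */
#define XK_Sinhala_au    0x1000d96  /* U+0D96 SINHALA LETTER AUYANNA */
#define XK_Sinhala_au_p  0x1000dde  /* U+0DDE SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA */

/* Pairs each independent vowel with its dependent (pilla) form. */
struct vowel_pair {
    unsigned long independent;
    unsigned long dependent;
};

static const struct vowel_pair vowel_pairs[] = {
    { XK_Sinhala_uu, XK_Sinhala_uu_p },
    { XK_Sinhala_au, XK_Sinhala_au_p },
};

/* Returns the dependent form of an independent vowel keysym,
 * or 0 if the keysym is not in the (deliberately tiny) table. */
static inline unsigned long dependent_form(unsigned long keysym)
{
    for (size_t i = 0; i < sizeof vowel_pairs / sizeof vowel_pairs[0]; i++) {
        if (vowel_pairs[i].independent == keysym)
            return vowel_pairs[i].dependent;
    }
    return 0;
}
```

The suffix keeps both names lowercase ASCII and only two characters longer than the independent form, which is the trade-off requirements (1) and (2) are pulling on.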
From: nidujay <ni...@gm...> - 2008-04-07 22:02:17
Hi,

Have a look at the definitions I used for the SCIM Wijesekera layout (see layout.h). It's quite close to the schemes you have come up with. I think it meets your requirements...

Also note the compound definitions. They accomplish 2 things:

1. Testing for compound characters is easy (test 1 bit).
2. The low (n) bits can be used in a lookup table (see parser.cpp for what I mean).

දුෂාර
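The two properties of the compound definitions can be sketched roughly as follows. This is not copied from layout.h or parser.cpp: the flag position, the mask, the `SCH_CMP_*` names and the table contents are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit 8 marks a compound, and the low 8 bits
 * index a table of code point sequences. */
#define SCH_COMPOUND_FLAG 0x100u
#define SCH_INDEX_MASK    0x0FFu

#define SCH_CMP_LLU    (SCH_COMPOUND_FLAG | 0u)  /* hypothetical name */
#define SCH_CMP_YANSA  (SCH_COMPOUND_FLAG | 1u)  /* hypothetical name */

struct sch_sequence {
    size_t   len;    /* number of code points used in cp[] */
    uint32_t cp[3];
};

static const struct sch_sequence sch_compounds[] = {
    { 2, { 0x0DC5, 0x0DD4 } },          /* LLA + vowel sign KETTI PAA-PILLA */
    { 3, { 0x0DCA, 0x200D, 0x0DBA } },  /* AL-LAKUNA + ZWJ + YA (yansaya) */
};

/* Property 1: testing for a compound is a single bit test. */
static inline bool sch_is_compound(unsigned code)
{
    return (code & SCH_COMPOUND_FLAG) != 0;
}

/* Property 2: the low bits index straight into a lookup table.
 * Returns NULL for non-compounds and out-of-range indices. */
static inline const struct sch_sequence *sch_lookup(unsigned code)
{
    unsigned idx = code & SCH_INDEX_MASK;

    if (!sch_is_compound(code) ||
        idx >= sizeof sch_compounds / sizeof sch_compounds[0])
        return NULL;
    return &sch_compounds[idx];
}
```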
From: Harshula <har...@gm...> - 2008-04-09 17:42:49
Hi Dushara,

On Tue, 2008-04-08 at 08:02 +1000, nidujay wrote:
> Hi,
>
> Have a look at the definitions I used for the SCIM Wijesekera layout
> (see layout.h).

layout.h:
--------------------------------------
#define SCH_KAYANNA 0x9A
...
#define SCH_DPAA 0xD6
...
#define SCH_KOMBUGAYANU 0xDE
--------------------------------------

> It's quite close to the schemes you have come up with.

I'm leaning towards something like:

#define XK_Sinhala_ka
#define XK_Sinhala_uu_p
#define XK_Sinhala_au_p

> I think it meets your requirements...

1) You use a short "SCH_" prefix, but I'll probably need to use a longer "XK_Sinhala_" prefix to be consistent with the other #defines. Which means having "YANNA" as a suffix for all the consonants is a problem. I should check whether we'd be allowed to use "XK_si_" or "XK_sinh_" as the prefix.

2) Something like "DPAA" is not obvious for those unfamiliar with the Sinhala codepage. Whereas "UU" is quite obvious.

3) Using "KOMBUGAYANU" is much longer than using the phonetic "AU".

If we name the #defines phonetically, it also means it will be intuitive for non-Sinhala speakers. So I'm leaning towards phonetic naming. But what should we use to denote a dependent-vowel/pilla?

a) _pilla
b) _p
c) _dv

cya,
#
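To make the comparison concrete, here is a hedged sketch that relates the two schemes. It assumes the SCH_ values are the low byte of the code point within the Sinhala block (U+0D80 to U+0DFF), which the three values quoted above are consistent with, and that the proposed XK_ names take the usual Unicode keysym value (code point plus 0x1000000). The helper `sch_to_keysym` is hypothetical.

```c
/* Values quoted from layout.h in this thread. */
#define SCH_KAYANNA      0x9A
#define SCH_DPAA         0xD6
#define SCH_KOMBUGAYANU  0xDE

/* Draft phonetic names from this thread; the values are an assumption. */
#define XK_Sinhala_ka    0x1000d9a  /* U+0D9A */
#define XK_Sinhala_uu_p  0x1000dd6  /* U+0DD6 */
#define XK_Sinhala_au_p  0x1000dde  /* U+0DDE */

/* Assumed mapping: low byte -> code point in U+0D00..U+0DFF -> keysym. */
static inline unsigned long sch_to_keysym(unsigned char low_byte)
{
    return 0x1000000UL | 0x0D00UL | low_byte;
}
```

Under those assumptions the two schemes name the same code points and differ only in how readable and how long the identifiers are.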
From: nidujay <ni...@gm...> - 2008-04-13 09:48:21
Hi again,

> I was never considering the prefix you used. I was saying that I can't
> include "YANNA" like you did because the X11 prefix is too long.

Gotcha,

> > #define XK_Sinhala_gayanukitta
> > #define XK_Sinhala_diga_gayanukitta
>
> I could do:
> XK_Sinhala_vl_p
> XK_Sinhala_vll_p
>
> But I'll have a look at the LTRL suggestions.

Works for me.

> > One thing I did notice was that this caters for the phonetic key
> > layout. In the විජේසේකර keyboard, ෛ etc doesn't appear. On the other
> > hand, the phonetic keyboard doesn't contain ළු, yans, rep, raka, etc.
> > Seeing that the විජේසේකර keyboard is the 'standard' one - at least
> > it's in the standard, should we include these keys as well?
>
> These #defines have absolutely nothing to do with input methods, it
> simply maps to the Sinhala code chart:
> http://unicode.org/charts/PDF/U0D80.pdf

An excerpt from keysymdef.h

...
 * The "X11 Window System Protocol" standard defines in Appendix A the
 * keysym codes. These 29-bit integer values identify characters or
 * functions associated with each key (e.g., via the visible
 * engraving) of a keyboard layout. This file assigns mnemonic macro
 * names for these keysyms.
...

This is what led me to believe that there should be definitions to match all the keys (including ළු, ්ය. , etc).

දුෂාර
From: Harshula <har...@gm...> - 2008-04-13 10:42:30
Hi Dushara,

On Thu, 2008-04-10 at 08:44 +1000, nidujay wrote:
> Hi Harshula,
>
> 1) I didn't mean for you to consider the prefix.

I was never considering the prefix you used. I was saying that I can't include "YANNA" like you did because the X11 prefix is too long.

> 2) & 3) If we're going to make it easy for non native speakers, the
> name for the pilla prefix is immaterial. In that case, I'd go with
> _p.

The point of (2) & (3) was to show that a phonetic naming scheme would avoid cryptic (2) names and long (3) names.

> Mind you, going down the phonetic path means, the Unicode character
> names are not exactly in synch with the key symbols. But maybe it
> doesn't matter (I guess there'll be a comment to indicate the full
> name anyway).

Commented on further below ...

> Following your scheme, maybe use:
>
> XK_Sinhala_vr_p instead of XK_Sinhala_gaettapilla
> XK_Sinhala_vrr_p = XK_Sinhala_diga_gaettapilla

For the independent vowels I had already used _vr and _vrr but I think LTRLers had some good suggestions on transliteration of vocalic vowels a while back, need to find the emails.

> XK_Sinhala_ai_p = XK_Sinhala_kombu_deka
> XK_Sinhala_au_p = XK_Sinhala_kombuva_haa_gayanukitta

That's the plan.

> Have to think about these ones...
>
> #define XK_Sinhala_gayanukitta
> #define XK_Sinhala_diga_gayanukitta

I could do:
XK_Sinhala_vl_p
XK_Sinhala_vll_p

But I'll have a look at the LTRL suggestions.

> Also...
>
> #define XK_Sinhala_nyya 0x1000d83 /* U+0D83 SINHALA YANNA */
> ^^^^^ shouldn't this be nyja?

Thanks, I've fixed it. There were one or two other problems too.

> One thing I did notice was that this caters for the phonetic key
> layout. In the විජේසේකර keyboard, ෛ etc doesn't appear. On the other
> hand, the phonetic keyboard doesn't contain ළු, yans, rep, raka, etc.
> Seeing that the විජේසේකර keyboard is the 'standard' one - at least
> it's in the standard, should we include these keys as well?

These #defines have absolutely nothing to do with input methods, it simply maps to the Sinhala code chart:
http://unicode.org/charts/PDF/U0D80.pdf

cya,
#
From: Harshula <har...@gm...> - 2008-04-13 12:24:24
Hi Dushara,

On Sun, 2008-04-13 at 19:48 +1000, nidujay wrote:
> An excerpt from keysymdef.h
>
> ...
>  * The "X11 Window System Protocol" standard defines in Appendix A the
>  * keysym codes. These 29-bit integer values identify characters or
>  * functions associated with each key (e.g., via the visible
>  * engraving) of a keyboard layout. This file assigns mnemonic macro
>  * names for these keysyms.
> ...
> This is what led me to believe that there should be definitions to
> match all the keys (including ළු, ්ය. , etc).

Just below those comments are these:
-----------------------------------------------------------
 * Where a keysym corresponds one-to-one to an ISO 10646 / Unicode
 * character, this is noted in a comment that provides both the U+xxxx
 * Unicode position, as well as the official Unicode name of the
 * character.
 *
 * Where the correspondence is either not one-to-one or semantically
 * unclear, the Unicode position and name are enclosed in
 * parentheses. Such legacy keysyms should be considered deprecated
 * and are not recommended for use in future keyboard mappings.
-----------------------------------------------------------

cya,
#
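The one-to-one correspondence described in that comment can be sketched as below. The sketch relies on the keysym encoding rule that a character in the range U+0100 to U+10FFFF gets the keysym value 0x01000000 plus its code point; the function names are hypothetical and legacy (pre-Unicode) keysyms are deliberately ignored.

```c
#include <stdint.h>

#define UNICODE_KEYSYM_OFFSET 0x01000000UL

/* Code point -> Unicode keysym; returns 0 outside U+0100..U+10FFFF. */
static inline unsigned long ucs_to_unicode_keysym(uint32_t cp)
{
    if (cp < 0x100 || cp > 0x10FFFF)
        return 0;
    return UNICODE_KEYSYM_OFFSET + cp;
}

/* Unicode keysym -> code point; returns 0 if the keysym is not in the
 * directly mapped range 0x01000100..0x0110ffff. */
static inline uint32_t unicode_keysym_to_ucs(unsigned long keysym)
{
    if (keysym < 0x01000100UL || keysym > 0x0110FFFFUL)
        return 0;
    return (uint32_t)(keysym - UNICODE_KEYSYM_OFFSET);
}
```

For example, U+0D9A (the Sinhala letter ka) maps to keysym 0x1000d9a and back, so each #define in this scheme names exactly one code point from the Sinhala chart.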
From: nidujay <ni...@gm...> - 2008-04-14 12:20:11
Hi Harshula,

On 13/04/2008, Harshula <har...@gm...> wrote:
> Just below those comments are these:
> -----------------------------------------------------------
>  * Where a keysym corresponds one-to-one to an ISO 10646 / Unicode
>  * character, this is noted in a comment that provides both the U+xxxx
>  * Unicode position, as well as the official Unicode name of the
>  * character.
>  *
>  * Where the correspondence is either not one-to-one or semantically
>  * unclear, the Unicode position and name are enclosed in
>  * parentheses. Such legacy keysyms should be considered deprecated
>  * and are not recommended for use in future keyboard mappings.
> -----------------------------------------------------------

Yes I saw that. However I reckon the comment I quoted takes precedence.

I posted a quick question in the xorg-devel channel:

[Mon Apr 14 2008] [08:23:29] <dushara> How should UNICODE keysims be defined when the character engraved on a key is represented by more than one unicode code point?
...
[Mon Apr 14 2008] [08:23:51] <daniels> dushara: i'm trying to fix that, but the short answer is that you just can't right now.

I may be totally off track here, but to me it doesn't make sense for the keysyms to simply duplicate the Unicode code page. I'm inclined to hold the view that (somehow) the #defines should represent all the keys in the keyboard.

If X.Org itself can't support what we require, I don't see us having any choice but to use the system that is going to be deprecated - unless we hold off the whole thing until they come up with a suitable scheme.
From: Harshula <har...@gm...> - 2008-04-14 15:40:34
Hi Dushara,

On Mon, 2008-04-14 at 22:20 +1000, nidujay wrote:
> Yes I saw that. However I reckon the comment I quoted takes
> precedence.

I doubt it, if you read the entire comment you'll notice that older (less relevant) information is at the start/top and the newer (more relevant) information is at the end/bottom.

> I posted a quick question in the xorg-devel channel:
>
> [Mon Apr 14 2008] [08:23:29] <dushara> How should UNICODE keysims
> be defined when the character engraved on a key is represented by more
> than one unicode code point?
> ...
> [Mon Apr 14 2008] [08:23:51] <daniels> dushara: i'm trying to fix
> that, but the short answer is that you just can't right now.

I assume that's Daniel Stone, I spoke to him at LCA about single-key press to multi-codepoint mappings which XKB doesn't support. That issue is independent of #defines.

> I may be totally off track here, but to me it doesn't make sense for
> the keysims to simply duplicate the unicode code page.

Actually it makes complete sense. When you learnt programming, I'm sure you were told that 'magic numbers' are a bad thing. Adding #defines that correspond to the Unicode Sinhala code chart means that we can avoid using 'magic numbers'.

cya,
#
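A small illustration of the 'magic numbers' point, as a sketch only. `XK_Sinhala_uu_p` is the draft name used earlier in this thread, its value assumes the code-point-plus-0x1000000 rule for Unicode keysyms, and both helper functions are hypothetical.

```c
#include <stdbool.h>

/* Draft name from this thread; U+0DD6 SINHALA VOWEL SIGN DIGA PAA-PILLA. */
#define XK_Sinhala_uu_p 0x1000dd6

/* With a magic number: the reader has to look up what 0x1000dd6 is. */
static inline bool is_long_u_sign_magic(unsigned long keysym)
{
    return keysym == 0x1000dd6;
}

/* With the #define: the intent is visible at the call site, and a typo
 * in the name is a compile error rather than a silent wrong value. */
static inline bool is_long_u_sign(unsigned long keysym)
{
    return keysym == XK_Sinhala_uu_p;
}
```

Both functions behave identically; the difference is only in how easily the code can be read and checked against the code chart.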
From: nidujay <ni...@gm...> - 2008-04-14 22:03:44
Hi Harshula,

> I assume that's Daniel Stone, I spoke to him at LCA about single-key
> press to multi-codepoint mappings which XKB doesn't support. That issue
> is independent of #defines.
>
> > I may be totally off track here, but to me it doesn't make sense for
> > the keysims to simply duplicate the unicode code page.
>
> Actually it makes complete sense. When you learnt programming, I'm sure
> you were told that 'magic numbers' are a bad thing. Adding #defines that
> correspond to the Unicode Sinhala code chart means that we can avoid
> using 'magic numbers'.

Ok so maybe I AM off track. From the beginning I've assumed that keysyms define the keyboard, not the character set. Your line of reasoning indicates otherwise.

දුෂාර
From: nidujay <ni...@gm...> - 2008-04-15 06:32:21
> Ok so maybe I AM off track. From the beginning I've assumed that
> keysyms define the keyboard, not the character set. Your line of
> reasoning indicates otherwise.

P.S. I realise you said so earlier, but the name of the file (keysymdef.h) and the comment I quoted convinced me otherwise.

Dushara
From: Harshula <har...@gm...> - 2008-04-15 12:35:40
On Tue, 2008-04-15 at 16:32 +1000, nidujay wrote:
> > Ok so maybe I AM off track. From the beginning I've assumed that
> > keysyms define the keyboard, not the character set. Your line of
> > reasoning indicates otherwise.
>
> P.S. I realise you said so earlier, but the name of the file
> (keysymdef.h) and the comment I quoted convinced me otherwise.

Simply put, X is approx. 3 decades old; some filenames and comments would have been relevant back then but they are less relevant now.

cya,
#
From: Harshula <har...@gm...> - 2008-04-27 15:55:15
Hi,

Here's the latest version from 2 weeks ago, it's still a work-in-progress:
http://cvs.savannah.nongnu.org/viewvc/*checkout*/sinhala/patches/x11proto-keysymdef.h-add-sinhala.patch?root=sinhala

Asanka, can LTRL have a look at the #defines and recommend improvements? The requirements are below.

cya,
#

On Tue, 2008-04-08 at 02:48 +1000, Harshula wrote:
> Hi,
>
> I've attached two files, both incomplete, that I was working on but gave
> up on in 2005. It will be great to get some input from others.
>
> REQUIREMENTS
> ============
>
> 1) The #defines should be as short/concise as possible.
> 2) Try and make the #defines visually unambiguous.
> 3) We have to get it right the first time. These are #defines, so once
> we define them, it's unlikely we'll be allowed to change them.
> 4) Use ASCII characters.
>
> ISSUES
> ======
>
> a) I abandoned the fullnames version because they were ridiculously
> long.
>
> b) I then used the Unicode chart's shortnames, but how should we
> differentiate between an independent vowel and a dependent vowel?
> - Using case sensitivity to differentiate might be frowned upon.
> - I tried adding "_pilla" for D/Vs but that made the name longer.
> - I could add a suffix of "_p" for D/Vs.
> - How should we name vocalic I/Vs and D/Vs?
>
> cya,
> #
From: Harshula <har...@gm...> - 2011-03-27 12:40:03
Hi,

The #defines have been committed to Xorg's x11proto repository:
http://cgit.freedesktop.org/xorg/proto/x11proto/commit/?id=423f5faddbb1023d0c1cf55b9d1da4397aa1aa26

Thanks to Dushara, Asanka and Chamila for their invaluable feedback and encouragement!

The scheme is documented for others to use in their software too:
http://nongnu.org/sinhala/doc/transliteration/sinhala-transliteration_6.html

cya,
#