Thread: [sinhala-technical] Sinhala Unicode and IDNAs
From: Lakmal S. <lak...@ya...> - 2008-02-17 22:40:33
|
Hi,

I'm investigating the latest IETF Internet Drafts related to IDNA (Internationalized Domain Names in Applications) and how they are going to affect Sinhala domain names. My question is whether we need to use ZWJ and ZWNJ for Sinhala. Any thoughts on this?

A recent draft covering Unicode code points and IDNA is available here:

http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt

Under this draft, code points with the property DISALLOWED will not be permitted in IDNs. These are described in the draft under Category A. If you can give feedback on these DISALLOWED code points for the Sinhala language, it would be great.

My other question is about the way 'Shri' is written using Unicode characters. I think the 'r' vowel sign (the part that goes below the letter 'sha') is not included in Unicode. So far I have seen 'Shri' written using the letters 'sha' and 'ree' instead of the way we generally write it, using only the letter 'sha'. Does anyone know why the 'r' vowel sign is not included in Unicode?

Regards,
Lakmal |
From: Gihan D. <gi...@cs...> - 2008-02-19 00:59:42
Attachments:
pr-96.sinhala.gif
|
Lakmal Silva wrote:
> Hi, I'm investigating the latest Internet Drafts of the IETF related to IDNAs (International Domain Names) and how they are going to affect Sinhala domain names. I have a question whether we need to use ZWJ and ZWNJ for Sinhala. Any thoughts on this?

Yes. ZWJ (U+200D) is needed for Sinhala (but not ZWNJ).

According to the IDNAbis drafts, this character is defined as context-dependent, and needs a rule which says it is OK to allow it. These rules are currently not defined in IDNAbis, but would be taken from the Unicode documentation (see below), which specifically mentions the "sri" issue.

I will keep monitoring the drafts, and check if they include the applicable rule.

For more information, see:

http://www3.tools.ietf.org/html/draft-klensin-idnabis-issues

Gihan
---
excerpt from http://unicode.org/review/pr-96.html

B. ZWJ in the following context: In a conjunct context. That is, a sequence of the form:

* A Letter, followed by a Virama, followed by a ZWJ, where the Letter and Virama are both in the Sinhala script
* This corresponds to the following regular expression (in Perl-style syntax): /$L $V ZWJ/
  where:
  * $L = [:General_Category=Letter:]
  * $V = [:Canonical_Combining_Class=Virama:]
* Example: The Sinhala word for the country 'Sri Lanka' in Figure 3A, which uses both a space character and a ZWJ. Removing the space gives the text in Figure 3B which is still readable, but removing the ZWJ completely modifies the appearance of the 'Sri' cluster and gives the text in Figure 3C.

Figure 3. [attached image] |
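The contextual rule quoted from PR-96 (a ZWJ is acceptable only directly after a Sinhala Letter followed by a Virama) could be checked roughly as follows. This is only a sketch, not the IDNAbis algorithm: the function names are made up for illustration, and because Python's standard library does not expose the Unicode Script property, the "both in the Sinhala script" condition is approximated by looking at the character name, which is an assumption of this example. The test string is the common sha + al-lakuna + ZWJ + ra + vowel sign encoding of 'Sri'.

```python
import unicodedata

ZWJ = "\u200d"
VIRAMA_CCC = 9  # numeric value of Canonical_Combining_Class=Virama


def is_sinhala(ch):
    # Assumption: approximate Script=Sinhala by the character name prefix,
    # since the standard library has no Script property lookup.
    return unicodedata.name(ch, "").startswith("SINHALA")


def zwj_in_conjunct_context(text, i):
    """True if text[i] is a ZWJ directly preceded by Sinhala Letter + Virama."""
    if text[i] != ZWJ or i < 2:
        return False
    letter, virama = text[i - 2], text[i - 1]
    return (
        unicodedata.category(letter).startswith("L")       # $L
        and unicodedata.combining(virama) == VIRAMA_CCC    # $V
        and is_sinhala(letter)
        and is_sinhala(virama)
    )


def all_zwj_allowed(label):
    """True if every ZWJ in the label satisfies the /$L $V ZWJ/ context."""
    return all(
        zwj_in_conjunct_context(label, i)
        for i, ch in enumerate(label)
        if ch == ZWJ
    )


# Hypothetical sample: sha, al-lakuna (virama), ZWJ, ra, vowel sign ii
SRI = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"
print(all_zwj_allowed(SRI))
print(all_zwj_allowed("a\u200db"))
```

A real registry would also have to apply whatever contextual rules the final IDNAbis documents specify; the check above only mirrors the regular expression in the excerpt.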
From: K. S. <sk...@gm...> - 2008-02-27 01:30:43
|
Hi all,

Two days ago (Feb 25) I wrote the following message, but I mistakenly pressed only "Reply" on Gihan's posting instead of "Reply to all". So I am reposting it to the list.

~Sethu

------------->>

Hi,

The link in Lakmal's message, http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt, gives a 404 error. But via the http://www3.tools.ietf.org/html/draft-klensin-idnabis-issues link in Gihan's reply I found http://stupid.domain.name/idnabis/, which lists all the drafts, including tables-04.txt as well as older and newer versions.

Am I right in assuming that later drafts supersede the older ones?

There is no change in the status of ZWJ in drafts 05, 05a and 05b compared to 04. But I noticed another change among the code points of Tamil and Sinhala (the only two sets I looked at). In draft 04, all Unicode code points having canonical/compatibility decompositions and compositions (there are 4 each for these two languages) had "DISALLOWED" status, whereas in the -05 drafts they are "PVALID".

Is the "DISALLOWED" status meant to eliminate phishing possibilities?

Sethu (K. Sethuramalingam) |
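The count of decomposable code points mentioned above can be reproduced with a short script. This is a sketch under assumptions: it uses the block ranges U+0D80..U+0DFF (Sinhala) and U+0B80..U+0BFF (Tamil) and the Unicode data bundled with the Python interpreter, and it does not confirm that these are exactly the code points whose status changed between the drafts. The helper name is hypothetical.

```python
import unicodedata

# Block ranges for the two scripts discussed in the thread.
BLOCKS = {
    "Sinhala": range(0x0D80, 0x0E00),
    "Tamil": range(0x0B80, 0x0C00),
}


def decomposable(block):
    """Return (code point, decomposition) pairs for characters in the block
    that have a canonical or compatibility decomposition mapping."""
    result = []
    for cp in block:
        mapping = unicodedata.decomposition(chr(cp))
        if mapping:  # empty string means no decomposition
            result.append((cp, mapping))
    return result


for script, block in BLOCKS.items():
    for cp, mapping in decomposable(block):
        name = unicodedata.name(chr(cp), "<unnamed>")
        print(f"{script}: U+{cp:04X} {name} -> {mapping}")
```

Comparing this list against the per-code-point tables in each draft would show whether these are the entries that moved from DISALLOWED to PVALID.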