Lakmal Silva wrote:
Yes. ZWJ (U+200D) is needed for Sinhala (but not ZWNJ).
I'm investigating the latest Internet Drafts of the
IETF related to IDNs (Internationalized Domain Names) and
how they are going to affect Sinhala domain names. I
have a question: do we need to use ZWJ and ZWNJ
for Sinhala? Any thoughts on this?
According to the IDNAbis drafts, ZWJ is defined as
context-dependent, and needs a rule that explicitly allows it.
These rules are not yet defined in IDNAbis, but would be taken
from the Unicode documentation (see below), which specifically mentions
the "sri" issue.
I will keep monitoring the drafts, and check if they include the
For more information, see this
excerpt from http://unicode.org/review/pr-96.html:
B. ZWJ in the following context:
In a conjunct context. That is, a sequence of the form:
- A Letter, followed by a Virama, followed by a ZWJ,
where the Letter and Virama are both in the Sinhala script
- This corresponds to the following regular expression (in
Perl-style syntax): /$L $V ZWJ/
- $L = [:General_Category=Letter:]
- $V = [:Canonical_Combining_Class=Virama:]
- Example: The Sinhala word for the country 'Sri Lanka' in Figure 3A,
which uses both a space character and a ZWJ. Removing the space gives
the text in Figure 3B, which is still readable, but removing the ZWJ
completely modifies the appearance of the 'Sri' cluster and gives the
text in Figure 3C.
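
To make the rule concrete, here is a rough Python sketch of the conjunct-context check described in the excerpt (Letter, then Virama, then ZWJ, with Letter and Virama both Sinhala). The function names are made up for illustration and are not part of any IDNA library. Python's standard library does not expose the Script property, so the sketch approximates "in the Sinhala script" by checking that the character name starts with "SINHALA"; that is an assumption, not what a real implementation would necessarily do.

```python
import unicodedata

ZWJ = "\u200D"
VIRAMA_CCC = 9  # numeric value of Canonical_Combining_Class=Virama


def is_sinhala(ch: str) -> bool:
    # Assumption: approximate Script=Sinhala by the character name prefix,
    # because the standard library has no Script property lookup.
    return unicodedata.name(ch, "").startswith("SINHALA")


def zwj_allowed_at(text: str, i: int) -> bool:
    """True if text[i] is a ZWJ directly preceded by Sinhala Letter + Virama."""
    if text[i] != ZWJ or i < 2:
        return False
    letter, virama = text[i - 2], text[i - 1]
    return (
        unicodedata.category(letter).startswith("L")         # $L
        and unicodedata.combining(virama) == VIRAMA_CCC      # $V
        and is_sinhala(letter)
        and is_sinhala(virama)
    )


def all_zwj_in_context(text: str) -> bool:
    """True if every ZWJ in text satisfies the rule (vacuously true if none)."""
    return all(zwj_allowed_at(text, i) for i, ch in enumerate(text) if ch == ZWJ)


# 'Sri' cluster: SHA (U+0DC1), AL-LAKUNA (U+0DCA), ZWJ, RA (U+0DBB), II sign (U+0DD3)
sri = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"
print(all_zwj_in_context(sri))         # ZWJ follows Sinhala letter + virama
print(all_zwj_in_context("a\u200Db"))  # ZWJ with no virama before it
```

With this sketch, the 'Sri' sequence passes, while a ZWJ that does not follow a Sinhala letter and virama is rejected.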