We need to include support for U+2060 in all fonts so that horizontal alignment of Fatha and Superscript Alef can be achieved - thanks to Abdul Rahim Nizamani
You should have indicated that U+2060 is the WORD JOINER. Its role has nothing to do with horizontal alignment; it indicates that two words must be kept glued together. It does not change the spacing, but forbids some breaks:
If you use U+2060 between two words, there is no extra SPACE between them: the WORD JOINER sits directly between the last alphabetic grapheme of the first word and the first alphabetic grapheme of the second word. Its usual effect is to display the two words as if they were glued together. A renderer will try to avoid breaking the line at that position, but if there is not enough room on the line for the two glued words, it can still insert a line break between them (possibly with a hyphen after the first word on the first line) to make them fit better. However, the word joiner acts like a space for plain-text collation and searches, which can still detect separate words.
As far as I know, this case never occurs in the Arabic script. It is needed for other Asian scripts (like Thai or Lao) that are normally written without spaces between words: the WORD JOINER allows forming compound words or expressions that should remain together, without joining them in ligatures or moving the boundary between complex clusters (such as Indic aksharas, or some old Hangul syllables).
The example in the attachment is clearly not what you are trying to get. What you want in your example shows that you need a non-breaking space (U+00A0) to replace the space (U+0020) between these two words.
A word joiner (U+2060) next to a standard SPACE (U+0020) has no effect at all, as the space itself remains a word-break opportunity.
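For anyone who wants to check which characters are being discussed, here is a small Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for this post. It only reports the name, general category and whitespace status of each code point; it says nothing about how a given renderer or font treats them.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category, str.isspace())."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )


# SPACE, NO-BREAK SPACE, WORD JOINER, ZWNJ, ZWJ
for ch in ("\u0020", "\u00A0", "\u2060", "\u200C", "\u200D"):
    print(describe(ch))
```

U+0020 and U+00A0 are space separators (category `Zs`), while U+2060, U+200C and U+200D are invisible format controls (category `Cf`).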
Last edit: Philippe Verdy 2025-04-29
When I worked on this item, I noticed that U+00A0 changed the preceding character to its final form and gave the next character its initial form. In other words, it broke one word into two, whereas U+2060 works correctly.
Maybe the alternative is ZERO WIDTH NON-JOINER (ZWNJ), if the intent is to glue two words together while still prohibiting the joining of the final letter of the first word with the first letter of the second word. This is the standard behavior in Arabic and it is described in the Unicode Standard.
WORD JOINER is not described in the standard for Arabic. Inserting it between two Arabic letters has no effect: it does not override the default joining behavior of these letters. Its role is only semantic, for plain-text searches and collation in South-East Asian scripts like Thai and Lao, or CJK scripts, where words are not separated by any white space and word boundaries are determined by lexical rules plus a reduced basic dictionary of exceptions. WJ can be used to override these rules for "uncommon" words that are not part of the basic dictionary and that don't follow the generic lexical rule, notably transcribed foreign words or acronyms. WJ is always ignored/discarded by text renderers, except in a "visible controls" mode, where it would display as a "dummy" substitute (some spacing symbol) without changing how the characters around it are rendered. The only exception is that WJ blocks typographic ligatures between pairs of letters, and generally blocks their fine-tuned kerning as well.
To control the contextual joining forms of letters, ZWJ or ZWNJ can be prepended and/or appended to ANY Arabic letter to change its default joining behavior. This is possible even for letters at the start or end of a paragraph, or next to whitespace, punctuation or symbols.
OpenType fonts do not need to map ZWJ/ZWNJ, because a conforming OpenType renderer will automatically select the appropriate letterforms, depending on the default joining properties of each letter, which are overridden by the presence of ZWJ or ZWNJ right beside them. The default joining behavior of Arabic letters is specified by Unicode in the UCD.
So OpenType fonts just need to map the "initial", "medial", "final" and "isolated" forms for each letter (using one of these four OpenType features, which must be implemented by conforming Arabic fonts), and the renderer will select the matching form. All this is entirely documented in the OpenType specifications for the Arabic script.
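To make the form selection concrete, here is a rough Python sketch of how a renderer might pick the form for each letter from the joining types of its neighbours. It is a simplification, not the actual algorithm of any shaping engine. The function names are invented for this post, and the small `JOINING_TYPE` table is typed by hand rather than loaded from the UCD, so treat its values as assumptions to verify, in particular the entry that treats U+2060 as transparent like other format controls. If that entry is right, it would be consistent with the earlier observation that U+00A0 splits the word while U+2060 leaves the joining untouched.

```python
# Joining_Type values for a handful of characters. Assumption: typed by hand
# from the UCD (ArabicShaping.txt / DerivedJoiningType.txt), not loaded from it.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH (dual-joining)
    "\u0644": "D",  # ARABIC LETTER LAM (dual-joining)
    "\u0645": "D",  # ARABIC LETTER MEEM (dual-joining)
    "\u0627": "R",  # ARABIC LETTER ALEF (right-joining)
    "\u064E": "T",  # ARABIC FATHA (transparent mark)
    "\u0670": "T",  # ARABIC LETTER SUPERSCRIPT ALEF (transparent mark)
    "\u200D": "C",  # ZERO WIDTH JOINER (join-causing)
    "\u200C": "U",  # ZERO WIDTH NON-JOINER (non-joining)
    "\u2060": "T",  # WORD JOINER (format control, assumed transparent)
}


def joining_type(ch):
    # Anything not in the table (SPACE, U+00A0, Latin, ...) counts as non-joining.
    return JOINING_TYPE.get(ch, "U")


def neighbour_type(text, i, step):
    """Joining type of the nearest non-transparent character before/after i."""
    j = i + step
    while 0 <= j < len(text):
        jt = joining_type(text[j])
        if jt != "T":
            return jt
        j += step
    return "U"  # start or end of text


def contextual_forms(text):
    """Return the expected form name for each D or R letter, in logical order."""
    forms = []
    for i, ch in enumerate(text):
        jt = joining_type(ch)
        if jt not in ("D", "R"):
            continue  # marks and controls have no contextual form of their own
        joins_prev = neighbour_type(text, i, -1) in ("D", "C")
        joins_next = jt == "D" and neighbour_type(text, i, +1) in ("D", "R", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# MEEM BEH LAM MEEM: arbitrary letters chosen for the demo, not a real word.
word = "\u0645\u0628\u0644\u0645"
for label, sep in [("none", ""), ("U+2060", "\u2060"),
                   ("U+00A0", "\u00A0"), ("ZWNJ", "\u200C")]:
    print(label, contextual_forms(word[:2] + sep + word[2:]))
```

Under these assumptions, the sequences with no separator and with U+2060 both give initial, medial, medial, final, while U+00A0 and ZWNJ both give initial, final, initial, final. The difference between the last two is that U+00A0 also adds a visible space, whereas ZWNJ has no width.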
Last edit: Philippe Verdy 2025-05-01
If I use U+00A0 then the word breaks apart.