From: Edward M. <em...@co...> - 2011-04-23 15:05:26
On 23 Apr 2011, at 2:18 AM, Smokey Ardisson wrote:

> At 11:23 PM +0200 on 4/21/11, Fridrich Strba wrote:
>
>> On 21/04/11 21:52, Smokey Ardisson wrote:
>>> Do we have reason to believe that the mapping in WP5 is different from
>>> the ones I contributed for WP6?
>>
>> Yes, we have a strong reason to believe that not all are the same. If
>> you take the arabic test in the zip I pointed to and convert with a
>> fresh checkout of libwpd to html, you will see remarkable differences.
>> The advantage though is that that test document is actually having the
>> names of the characters near to them, which will make the janitorial
>> work a bit easier.
>
> I can't believe that a company would move around blocks of characters
> from one version of the software to the next, especially when those
> characters' codes are the canonical ways of identifying them :-P
> :sigh: However, there were not too many differences/corrections
> between what's in libwpd_internal.cpp right now for set 13.

I won't be able to send more details until later today, or tomorrow, but I can confirm (what I assume you already know) that there are two states of the 5.1 character sets - one state for non-Hebrew/Arabic 5.1, another for Hebrew and Arabic 5.1 - and that the 6.x character sets are very different from the 5.1 sets.

About the two states of the 5.1 sets: until the Hebrew and Arabic version was released, there were only 12 sets; the Hebrew/Arabic version had a much larger set 9 (Hebrew) and added Arabic sets 13 and 14.

I don't have a full list of the differences between 5.1 and 6.x in the other character sets, but briefly:

Set 1: 6.x adds some characters at the end
Set 2: 5.x and 6.x are completely different
Set 4: 6.x adds some characters at the end
Set 5: 5.x and 6.x are completely different
Set 6: 6.x adds some characters at the end
Set 8: 6.x adds some characters at the end
Set 9: 6.x is vastly larger (I haven't yet checked whether it's the same as 5.1 Hebrew)
Set 10: 6.x adds some characters at the end
Set 11: 6.x is completely different
Set 13/14: I think you discovered that these are different in 5.1 Arabic and 6.x?

You can find on my Arabic and Hebrew WP page full sets of printer drivers for Arabic and Hebrew 5.1, and these may help in mapping characters:

http://www.columbia.edu/~em36/wpdos/arabicandhebrew.html

I'll get back to testing later today or tomorrow at the latest.

Edward
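
For anyone turning the list of set differences above into code, here is a minimal C++ sketch that just records the summary as a lookup. It is not taken from libwpd: the names `SetRelation`, `relationFor` and `likelyNeedsSeparateWP5Table` are hypothetical, and treating the "adds some characters at the end" sets as able to share a table with 5.1 is an assumption that would still need checking against test documents.

```cpp
// Hypothetical helper, not part of libwpd: encodes only the summary
// given in this message about 5.1 vs 6.x character sets.
enum class SetRelation {
    Extended,    // 6.x adds some characters at the end
    Different,   // 5.x and 6.x are completely different
    Larger,      // 6.x is vastly larger (overlap with 5.1 Hebrew unchecked)
    Unverified,  // thought to differ between 5.1 Arabic and 6.x
    NotListed    // set number not covered by the list in this message
};

// Return how a 6.x character set relates to its 5.1 counterpart,
// according to the list in this message.
SetRelation relationFor(int characterSet)
{
    switch (characterSet) {
    case 1: case 4: case 6: case 8: case 10:
        return SetRelation::Extended;
    case 2: case 5: case 11:
        return SetRelation::Different;
    case 9:
        return SetRelation::Larger;
    case 13: case 14:
        return SetRelation::Unverified;
    default:
        return SetRelation::NotListed;
    }
}

// Assumption: a set that 6.x only extends at the end can reuse the 6.x
// mapping for 5.1 documents; the other listed sets probably cannot.
// Sets not covered by the list return false because nothing is known.
bool likelyNeedsSeparateWP5Table(int characterSet)
{
    switch (relationFor(characterSet)) {
    case SetRelation::Different:
    case SetRelation::Larger:
    case SetRelation::Unverified:
        return true;
    default:
        return false;
    }
}
```

A converter could call `likelyNeedsSeparateWP5Table(set)` when deciding which sets to review first; under the assumption stated above, sets 2, 5, 9, 11, 13 and 14 would come out as the ones needing their own 5.1 table.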