From: Matitiahu A. <ma...@il...> - 2008-06-16 14:38:12
As per ticket 6156, I am working on some possible performance improvements
for the ICU4C bidi API (see ubidi.h).
Whenever a string is analyzed for bidi processing, each one of its
characters is checked for its Bidirectional character type. This is done
by calling ubidi_getClass() for each character. This function looks up
the character bidi properties in a trie, which takes non-negligible time.
When the bidi processing is simplest (left-to-right base direction, no RTL
characters), the calls to ubidi_getClass() may account for up to 15% of the
total bidi processing time.
When the bidi processing is more involved (many directional runs), this
share drops to about 7%.
My idea for improving performance rests on the assumption that bidi
processing is typically used for Arabic and Hebrew text, and that it is very
unusual for such text to include characters outside its own block except
for the range 0x0 to 0x7f. If I add simple tables for bidi properties in the
ranges 0x0 to 0x7f and 0x0591 to 0x06ff, most of the trie lookup time spent
within ubidi_getClass() will be saved for characters in those ranges.
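
To make the idea concrete, here is a rough sketch of the fast path, not the
actual ICU code. All names are hypothetical, the class enum is a reduced
stand-in for ICU's UCharDirection, and slow_class() is only a crude
placeholder for the existing trie lookup (it does not carry real Unicode
data). In ICU the two tables would be generated at build time rather than
filled at startup.

```c
#include <stdint.h>

/* Hypothetical, reduced set of bidi classes (ICU's UCharDirection has more). */
typedef enum { CLS_L, CLS_R, CLS_AL, CLS_EN, CLS_ON } BidiClass;

#define ASCII_LIMIT 0x80    /* first code point after the 0x0..0x7f range */
#define RTL_START   0x0591  /* start of the Hebrew/Arabic table */
#define RTL_END     0x06FF  /* end of the Hebrew/Arabic table (inclusive) */

static uint8_t ascii_table[ASCII_LIMIT];
static uint8_t rtl_table[RTL_END - RTL_START + 1];

/* Placeholder for the trie lookup; classifies only a few obvious ranges. */
static BidiClass slow_class(uint32_t c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return CLS_L;
    if (c >= '0' && c <= '9') return CLS_EN;
    if (c >= 0x05D0 && c <= 0x05EA) return CLS_R;   /* Hebrew letters */
    if (c >= 0x0621 && c <= 0x064A) return CLS_AL;  /* Arabic letters */
    return CLS_ON;
}

/* Fill both tables from the slow path (a build tool would do this offline). */
static void init_fast_tables(void) {
    uint32_t c;
    for (c = 0; c < ASCII_LIMIT; ++c)
        ascii_table[c] = (uint8_t)slow_class(c);
    for (c = RTL_START; c <= RTL_END; ++c)
        rtl_table[c - RTL_START] = (uint8_t)slow_class(c);
}

/* Fast path: range checks and one indexed access, else fall back. */
static BidiClass fast_class(uint32_t c) {
    if (c < ASCII_LIMIT)
        return (BidiClass)ascii_table[c];
    if (c >= RTL_START && c <= RTL_END)
        return (BidiClass)rtl_table[c - RTL_START];
    return slow_class(c);
}
```

Characters outside both ranges pay only for the failed range checks before
reaching the existing lookup.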
A side benefit (which I have not quantified) is that the same tables may
also shorten the time spent in u_charMirror() which is used to find
possible mirror-image characters for characters in right-to-left
directional runs. This benefit is of the same order of magnitude as for
ubidi_getClass but only applies to right-to-left directional runs, so it
depends on the percentage of right-to-left runs in the total text.
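
The mirroring side could look roughly like this. Again this is only a
sketch with hypothetical names: slow_mirror() stands in for the property
lookup that u_charMirror() performs today and knows just a handful of
pairs, and only the 0x0 to 0x7f table is shown; the second range would be
handled the same way.

```c
#include <stdint.h>

#define MIRROR_ASCII_LIMIT 0x80

/* Hypothetical table: mirror image of each code point in 0x0..0x7f. */
static uint8_t ascii_mirror[MIRROR_ASCII_LIMIT];

/* Placeholder for the existing lookup; returns c when there is no mirror. */
static uint32_t slow_mirror(uint32_t c) {
    switch (c) {
    case '(': return ')';
    case ')': return '(';
    case '<': return '>';
    case '>': return '<';
    case '[': return ']';
    case ']': return '[';
    case '{': return '}';
    case '}': return '{';
    case 0x00AB: return 0x00BB;  /* guillemets, outside the table */
    case 0x00BB: return 0x00AB;
    default: return c;
    }
}

/* Fill the table from the slow path (a build tool would do this offline). */
static void init_mirror_table(void) {
    uint32_t c;
    for (c = 0; c < MIRROR_ASCII_LIMIT; ++c)
        ascii_mirror[c] = (uint8_t)slow_mirror(c);
}

/* Fast path: one range check and one indexed access, else fall back. */
static uint32_t fast_mirror(uint32_t c) {
    if (c < MIRROR_ASCII_LIMIT)
        return ascii_mirror[c];
    return slow_mirror(c);
}
```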
Costs:
a) 400 bytes of data for tables
b) Very few added lines of code in ubidi_getClass (3 "if" statements, 2
indexed accesses into simple tables) and u_charMirror.
c) The added processing time for characters not in the preferred ranges is
negligible (so small that it does not exceed the variations in
measurements of my not-so-precise test bed).
d) The main work is to modify tools\genbidi to generate the new tables in
addition to the current trie.
My questions to the list:
1) Do my assumptions (in the sentence starting with "My idea" above) seem
reasonable?
2) Is the added table footprint (400 bytes) acceptable?
3) Is it worth bothering at all?
Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160