This patch makes splitting a string into single characters use only as many objects as there are distinct characters in the input string. It boosts performance in the short string case, which is nice, but turns the long string case from intractable into quite feasable. :^)
All tests in the current CVS HEAD are passed with this patch applied.
None
Please review; I think this one is OK (it certainly seems to work for me) but it is nice to have someone else give it the once over before applying it...
It applies, builds, and passes 'make test' for me on
a 64-bit Alpha.
I'm a bit uneasy by all the typecasting
Tcl_UniChar -> int -> (char *)
What are they for and are you confident they will be OK
always and everywhere?
Other than that, the variable names hent and intch don't
seem to fit with the core traditions. Use of hPtr or
hashEntry is more common.
Tcl_UniChar -> int -> (char *)
The first is safe, AFAIK, because IIRC Tcl_UniChar represents UTF16 coding of characters which is 2 bytes at most. And int is 2 bytes at least. The (char*) should be read is (void*). Tcl_CreateHashTable has a very generic interface to keys, because of the various possible key-types. As the hash is created with one-word-keys an (int*) is ok, even in the guise of a (char*).
I agree that we should change the variable names to match the core style.
The (char *) is correct as that because that is how hash tables configured with TCL_ONE_WORD_KEYS are used (as documented on the manual page.) The assumption that UNICODE characters are sensibly castable to int is pretty good; it certainly works for UCS-16 and will actually be good in practise (even on 64-bit architectures as deployed) up to UCS-32 which is, incidentally, not supported by the rest of the core anyway. :^)
Variable names in patch have been modified to match core naming scheme. My fault for naming them according to the scheme which I use in my own programs instead...
But why the intermediate type of int ? Why not a direct
cast:
Tcl_UniChar -> (char *)
Note some more followups posted directly to the
tcl-bugs mailing list:
http://www.geocrawler.com/lists/3/SourceForge/7374/0/5149633/
http://www.geocrawler.com/lists/3/SourceForge/7374/0/5151315/
Hopefully those strange 64-bit beasties will like this version better.
I modified the date in the ChangeLog entry, and made
a change to one of the comments ("integral type" rather
than "numeric type").
Looks like it's ready to go now. Make it so. Nice work.