Menu

#50 Improve speed of [split $string ""] (bug #131523)

closed
None
5
2001-02-16
2001-02-09
No

This patch makes splitting a string into single characters use only as many objects as there are distinct characters in the input string. It boosts performance in the short string case, which is nice, but turns the long string case from intractable into quite feasable. :^)

All tests in the current CVS HEAD are passed with this patch applied.

Discussion

  • Donal K. Fellows

    None

     
  • Donal K. Fellows

    Please review; I think this one is OK (it certainly seems to work for me) but it is nice to have someone else give it the once over before applying it...

     
  • Don Porter

    Don Porter - 2001-02-09

    It applies, builds, and passes 'make test' for me on
    a 64-bit Alpha.

    I'm a bit uneasy by all the typecasting

    Tcl_UniChar -> int -> (char *)

    What are they for and are you confident they will be OK
    always and everywhere?

    Other than that, the variable names hent and intch don't
    seem to fit with the core traditions. Use of hPtr or
    hashEntry is more common.

     
  • Andreas Kupries

    Andreas Kupries - 2001-02-12

    Tcl_UniChar -> int -> (char *)

    The first is safe, AFAIK, because IIRC Tcl_UniChar represents UTF16 coding of characters which is 2 bytes at most. And int is 2 bytes at least. The (char*) should be read is (void*). Tcl_CreateHashTable has a very generic interface to keys, because of the various possible key-types. As the hash is created with one-word-keys an (int*) is ok, even in the guise of a (char*).

    I agree that we should change the variable names to match the core style.

     
  • Donal K. Fellows

    The (char *) is correct as that because that is how hash tables configured with TCL_ONE_WORD_KEYS are used (as documented on the manual page.) The assumption that UNICODE characters are sensibly castable to int is pretty good; it certainly works for UCS-16 and will actually be good in practise (even on 64-bit architectures as deployed) up to UCS-32 which is, incidentally, not supported by the rest of the core anyway. :^)

    Variable names in patch have been modified to match core naming scheme. My fault for naming them according to the scheme which I use in my own programs instead...

     
  • Don Porter

    Don Porter - 2001-02-12

    But why the intermediate type of int ? Why not a direct
    cast:

    Tcl_UniChar -> (char *)

     
  • Donal K. Fellows

    Hopefully those strange 64-bit beasties will like this version better.

     
  • Donal K. Fellows

    • assigned_to: nobody --> dkf
     
  • Don Porter

    Don Porter - 2001-02-15

    I modified the date in the ChangeLog entry, and made
    a change to one of the comments ("integral type" rather
    than "numeric type").

    Looks like it's ready to go now. Make it so. Nice work.

     
  • Donal K. Fellows

    • status: open --> Error - status not found
     
  • Donal K. Fellows

    • status: Error - status not found --> closed