From: Hoehle, Joerg-C. <Joe...@t-...> - 2003-08-07 14:43:32
Sam wrote:
>> Really, I consider the 8/16/32 bit character arrays in CLISP a
>> problematic issue when interfacing to the foreign world.
> why?
> these strings can be converted to 32-bit string at any time you want.

Please explain. I know that CLISP will upgrade the strings (and fix all
pointers at the next GC, I guess), but I don't know the details. If my
hypothetical --with-unicode module gets passed an 8- or 16-bit string,
what should it do (and especially how)?
a- create (and later forget) a 32-bit version of that string?
b- create the 32-bit version and have the rest of CLISP forget the
   8/16-bit version, including the caller?

Also, I'm still unsure whether there wouldn't be an advantage in letting
programmers create strings of known width, e.g. for better interfacing
with UTF-16 (cf. (sys::string-info (make-string 1)) vs.
(make-array 1 :element-type 'character)).

In particular, I wonder if the following would be a useful thing to
have:
  (= 16 (sys::string-info (make-array x :element-type charset:utf-16)))
[An encoding is an acceptable argument to the array element type
according to the CLHS, and it works in CLISP.]

Once that is clear, creating the regexp module is trivial:
- get one of the newer regexp codes known to work not only on 8-bit
  characters;
- compile it with chartype=[unsigned]long (to hopefully get 32 bits);
- write a few adequate wrappers (a sketch follows my signature);
- verify that the module indeed works with 32-bit characters;
- measure and report the speed improvement (or decrease: a newer regexp
  may be slower than the current five-year-old one that has fewer
  features);
- compare against CL-PPCRE (speed and test cases).

Thanks for enlightenment,
Jorg Hohle.
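
P.S. To make option a- concrete, here is a minimal sketch of the
copy-in approach: build a temporary 32-bit buffer for the foreign call
and leave the original string (and its narrow representation) alone.
STRING-TO-UINT32 is a name I invented for illustration, not an existing
CLISP function:

  ;; Option a-, sketched: copy the characters into a fresh 32-bit
  ;; buffer and later just let that copy be garbage-collected.
  ;; CHAR-CODE yields the full Unicode code point no matter how
  ;; narrowly CLISP happens to store the string internally.
  (defun string-to-uint32 (s)
    (map '(vector (unsigned-byte 32)) #'char-code s))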
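
P.P.S. And a rough sketch of what the "adequate wrappers" could look
like with CLISP's FFI package, assuming the regexp library was indeed
recompiled with a 32-bit chartype and exports an entry point named
wregcomp (an invented name; the real one depends on which regexp code
we pick):

  ;; Hypothetical wrapper: the pattern is passed as a zero-terminated
  ;; array of 32-bit units; CLISP's FFI copies the Lisp vector out for
  ;; the call, i.e. exactly option a- above.
  (ffi:def-call-out wregcomp
    (:name "wregcomp")
    (:language :stdc)
    (:arguments (pattern (ffi:c-array-ptr ffi:uint32))
                (flags ffi:int))
    (:return-type ffi:c-pointer))  ; opaque compiled-pattern handle

  ;; usage sketch, reusing the helper from the P.S.:
  ;; (wregcomp (string-to-uint32 "a.*b") 0)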