When the subject is a UTF-8 NSString, RKEnumerator performance drops dramatically. The problem lies in RKStringBufferWithString() and its heavy use throughout the enumerator methods. RKStringBufferWithString() calls CFStringGetFastestEncoding(), which returns UTF-16 when called on a UTF-8 NSString. As a result, the string is needlessly converted on every call, which is terribly slow.
To make matters worse, the resulting string buffer is not cached for the lifetime of the RKEnumerator, so it is recreated over and over. We tweaked the code to cache this buffer (and changed calls to RKutf8to16 to RKConvertUTF8ToUTF16RangeForStringBuffer, etc.) and saw a speed increase of more than 500%.
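To illustrate the caching idea, here is a minimal sketch in plain C. It is not our actual patch and not RegexKit code: SketchEnumerator, sketch_convert() and the other sketch_* names are hypothetical, and the "conversion" is only a copy standing in for the real encoding conversion. The point is just that the buffer is built on first use and reused afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an enumerator; NOT RegexKit's real layout. */
typedef struct {
    const char *source;       /* subject string, not owned by the enumerator */
    char       *cachedBuffer; /* converted copy, created at most once */
    size_t      cachedLength; /* length of cachedBuffer in bytes */
    unsigned    conversions;  /* how many conversions were performed */
} SketchEnumerator;

/* Stand-in for the expensive encoding conversion (here only a copy). */
char *sketch_convert(const char *source, size_t *outLength) {
    size_t length = strlen(source);
    char *copy = malloc(length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, source, length + 1);
    *outLength = length;
    return copy;
}

void sketch_enumerator_init(SketchEnumerator *e, const char *source) {
    assert(e != NULL && source != NULL);
    e->source = source;
    e->cachedBuffer = NULL;
    e->cachedLength = 0;
    e->conversions = 0;
}

/* Returns the converted buffer, converting only on the first call. */
const char *sketch_enumerator_buffer(SketchEnumerator *e) {
    if (e->cachedBuffer == NULL) {
        e->cachedBuffer = sketch_convert(e->source, &e->cachedLength);
        if (e->cachedBuffer != NULL) e->conversions++;
    }
    return e->cachedBuffer;
}

/* Frees the cached buffer when the enumerator goes away. */
void sketch_enumerator_destroy(SketchEnumerator *e) {
    free(e->cachedBuffer);
    e->cachedBuffer = NULL;
    e->cachedLength = 0;
}
```

With this shape, every enumerator method would call sketch_enumerator_buffer() instead of rebuilding the buffer, so the conversion cost is paid once per enumerator rather than once per call.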
Furthermore, it may be useful to let callers explicitly pass the string encoding to the various RegexKit methods, so that the very expensive conversion to UTF-8 is skipped when it isn't needed.
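A rough sketch of what such an entry point could look like, again in plain C with made-up names (SketchEncoding, SketchSubject and sketch_subject_make() are hypothetical, not an existing RegexKit API). It assumes the regex engine wants UTF-8: when the caller declares the data is already UTF-8 the bytes are borrowed as-is, and only UTF-16 input is converted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding tag supplied by the caller. */
typedef enum { SKETCH_ENCODING_UTF8, SKETCH_ENCODING_UTF16 } SketchEncoding;

/* UTF-8 view of the subject; ownsMemory is set only if we had to convert. */
typedef struct {
    const char *utf8;
    size_t      length;
    int         ownsMemory;
} SketchSubject;

/* Converts UTF-16 code units to UTF-8. Pass out == NULL to only measure.
   Unpaired surrogates are replaced with U+FFFD. Returns bytes needed. */
size_t sketch_utf16_to_utf8(const uint16_t *in, size_t count, char *out) {
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        unsigned char bytes[4];
        size_t n;
        if (cp < 0x80) {
            bytes[0] = (unsigned char)cp; n = 1;
        } else if (cp < 0x800) {
            bytes[0] = (unsigned char)(0xC0 | (cp >> 6));
            bytes[1] = (unsigned char)(0x80 | (cp & 0x3F)); n = 2;
        } else if (cp < 0x10000) {
            bytes[0] = (unsigned char)(0xE0 | (cp >> 12));
            bytes[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            bytes[2] = (unsigned char)(0x80 | (cp & 0x3F)); n = 3;
        } else {
            bytes[0] = (unsigned char)(0xF0 | (cp >> 18));
            bytes[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            bytes[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            bytes[3] = (unsigned char)(0x80 | (cp & 0x3F)); n = 4;
        }
        if (out != NULL) memcpy(out + written, bytes, n);
        written += n;
    }
    return written;
}

/* units = byte count for UTF-8 input, 16-bit unit count for UTF-16 input. */
SketchSubject sketch_subject_make(const void *data, size_t units,
                                  SketchEncoding encoding) {
    SketchSubject s = { NULL, 0, 0 };
    if (encoding == SKETCH_ENCODING_UTF8) {
        s.utf8 = data;      /* fast path: borrow the caller's bytes */
        s.length = units;
        return s;
    }
    const uint16_t *in = data;
    size_t needed = sketch_utf16_to_utf8(in, units, NULL);
    char *buffer = malloc(needed + 1);
    if (buffer == NULL) return s;
    sketch_utf16_to_utf8(in, units, buffer);
    buffer[needed] = '\0';
    s.utf8 = buffer;
    s.length = needed;
    s.ownsMemory = 1;
    return s;
}

/* Frees the buffer only if this subject allocated it. */
void sketch_subject_release(SketchSubject *s) {
    if (s->ownsMemory) free((void *)s->utf8);
    s->utf8 = NULL;
    s->length = 0;
    s->ownsMemory = 0;
}
```

The design choice here is that the caller, who already knows how the string is stored, makes the decision up front, rather than the library guessing via a fastest-encoding query on each call.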
Thanks,
Will
will@panic.com