Some of the metrics (for example BlockDistance) fail if one of the strings contains Unicode character 160 (U+00A0, the non-breaking space).
What do you mean by fail: an error, or just not the expected results? The tokenisation is currently not very Unicode aware, since development has focused primarily on the lower-bit Unicode range. This should be easy to address in the tokenisation code, however.
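For anyone hitting this in the meantime, here is a rough sketch of what a Unicode-aware whitespace split could look like in Java. It is not the library's actual tokeniser; the class `NbspAwareTokenizer` and its methods are hypothetical names made up for illustration. The one fact it relies on is that `Character.isWhitespace` returns false for U+00A0, while `Character.isSpaceChar` returns true for it, so checking both treats the non-breaking space as a token separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library discussed in this thread.
final class NbspAwareTokenizer {

    // A code point separates tokens if Java regards it as whitespace or as a
    // Unicode space character. Character.isWhitespace alone is false for
    // U+00A0, so the second check is the one that catches it.
    static boolean isSeparator(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }

    // Split the input into tokens on any separator code point,
    // dropping empty tokens produced by runs of separators.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The first gap is a non-breaking space, the second an ordinary space.
        System.out.println(tokenize("block\u00A0distance metric")); // prints [block, distance, metric]
    }
}
```

A simpler workaround on the caller's side, assuming you control the input, would be to replace U+00A0 with an ordinary space before passing the strings to the metric.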