#3 Bug with character encoding

Status: open-later
Owner: nobody
Labels: None
Priority: 5
Updated: 2011-05-18
Created: 2008-07-07
Creator: Anonymous
Private: No

Some of the metrics (for example BlockDistance) fail if one of the strings contains Unicode character 160 (U+00A0, non-breaking space).

Discussion

  • ReverendSam

    ReverendSam - 2011-05-18

    What do you mean by "fail": that the metrics do not return the expected results? The tokenisation is currently not very Unicode-aware, as development is focused primarily on lower-bit Unicode characters. This should be easy to address in the tokenisation code, however.

     
  • ReverendSam

    ReverendSam - 2011-05-18
    • status: open --> open-later
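
To make the tokenisation point above concrete, here is a minimal sketch in Python of how a tokeniser that only splits on the ASCII space can change a block distance when one string contains U+00A0. It is not the library's code: the function names (tokenize_ascii, tokenize_unicode, block_distance) are hypothetical, and it assumes block distance is computed as the L1 distance between token counts, which may differ from the actual implementation.

```python
from collections import Counter

NBSP = "\u00a0"  # Unicode code point 160, non-breaking space


def tokenize_ascii(text):
    """Hypothetical tokeniser: splits on the ASCII space only,
    so U+00A0 stays inside a token."""
    return [tok for tok in text.split(" ") if tok]


def tokenize_unicode(text):
    """Hypothetical tokeniser: str.split() with no argument splits on
    any Unicode whitespace, which includes U+00A0."""
    return text.split()


def block_distance(a, b, tokenize):
    """Assumed definition: L1 distance between the token-count vectors."""
    counts_a = Counter(tokenize(a))
    counts_b = Counter(tokenize(b))
    return sum(abs(counts_a[t] - counts_b[t]) for t in set(counts_a) | set(counts_b))


plain = "hello world"
with_nbsp = "hello" + NBSP + "world"

# ASCII-only splitting sees one token in with_nbsp, so the strings look different.
ascii_result = block_distance(plain, with_nbsp, tokenize_ascii)

# Unicode-aware splitting sees the same two tokens in both strings.
unicode_result = block_distance(plain, with_nbsp, tokenize_unicode)
```

Under these assumptions, the two strings compare as identical once the tokeniser treats U+00A0 as a separator. Whether that is the desired behaviour depends on what "fail" means in the original report, which the thread leaves open.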
     
