From: Peter L. <pet...@te...> - 2010-06-05 20:06:30
On Saturday 05 June 2010 21.35.03, Gerald Britton wrote:
> so does that mean you SOL or is there a way to handle it?

Yes, there is a way. Convert the str to unicode before calling split(), or call str.decode('utf-8').split() directly. In this case it is best to convert to unicode anyway, because the string is shortened later and you want to avoid cutting it in the middle of a UTF-8 sequence.

/Peter

> On Sat, Jun 5, 2010 at 1:32 PM, Peter Landgren <pet...@te...> wrote:
> > Hi,
> > when working with http://www.gramps-project.org/bugs/view.php?id=3935
> >
> > I have found out why this happens.
> >
> > For certain characters, at least for any of "àĠŠƠǠȠɠʠΠР", the
> > str.split() function gives different results on Linux and Windows.
> >
> > str.split() interprets the supplied string using the current
> > encoding. If this is UTF-8, as is normal on Linux, str.split() works
> > OK. If the encoding is cp1252, as is normal on Windows, the second
> > byte of the UTF-8 encoding of each of these characters has the hex
> > value \xa0, which is interpreted as whitespace, so the character is
> > split inside its UTF-8 sequence. This generates errors further down.
> >
> > It is similar to slicing a string in the middle of a UTF-8 sequence.
> >
> > /Peter
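
The advice above (decode first, then split or shorten) can be illustrated with a small sketch. It is written for Python 3, which is an assumption, since the thread concerns Python 2 str objects. Python 3 bytes.split() does not depend on the locale, so the Windows behaviour is only simulated here by turning the 0xA0 byte into a space by hand. The sample text "Voilà Paris" is made up for illustration.

```python
# Sample text (hypothetical); "à" is encoded in UTF-8 as the bytes C3 A0.
text = "Voilà Paris"
raw = text.encode("utf-8")  # b'Voil\xc3\xa0 Paris'

# Simulation only: treat byte 0xA0 as whitespace, as reported for
# str.split() on Windows with cp1252. This cuts "à" in half.
simulated = raw.replace(b"\xa0", b" ").split()  # [b'Voil\xc3', b'Paris']

# The first piece now ends with a lone C3 byte and is no longer valid UTF-8.
try:
    simulated[0].decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Safe variant: decode to unicode first, then split on whole characters.
words = raw.decode("utf-8").split()  # ['Voilà', 'Paris']

# The same applies to shortening: slice the decoded text, not the bytes.
short_text = text[:5]   # 'Voilà'
short_bytes = raw[:5]   # b'Voil\xc3', cut in the middle of the sequence
```

Working on decoded text means both split() and slicing operate on whole characters, so a multi-byte sequence can never be broken.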