From: John R. <jr...@ce...> - 2014-04-06 22:27:56
On Apr 5, 2014, at 6:18 AM, Vassilii Khachaturov <vas...@ta...> wrote:

> On 05.04.2014 06:08, John Ralls wrote:
>> This is part of the work Josip and I have been doing to fix bugs 7519 and 7258. We've found that if you feed python path functions unicodes it almost always does the right thing; the exception is bsddb, which in Python2 needs to have paths encoded in the filesystem encoding. To get there we've been cleaning out a lot of the foo.encode(sys.getfilesystemencoding()) stuff and also changing gramps files to be consistently utf8. There's a lot of affected code and mistakes are inevitable, so please everyone be vigilant for encoding issues, particularly concerning paths and database names.
>>
>> Regards,
>> John Ralls
>
> Thank you both for the extensive investigation and bug fixing work on
> 7258 & friends, a lot of users will be happier now!
>
> Could you please summarize your work in a wiki guide page for us all to
> peruse, so as to minimize these mistakes?
>
> Also, maybe you could lay out some guidance for (platform-independent)
> unit testing new/modified code to prevent these problems in the future?

Vassilii,

I’ve extended https://gramps-project.org/wiki/index.php?title=Coding_for_translation#Encoding to explain my current understanding of how to handle encoding.

Unit testing encoding is a bit problematic for anything more than trivial tests because it’s difficult to tease apart the encoding effects from other differences between OSes. For example, a lot of the problems arise because the OS takes a string that’s encoded in UTF8 and treats it like it’s encoded in cp850. That doesn’t have any effect on the actual bytes in the string, but a human looking at the result sees garbage.

Regards,
John Ralls
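
P.S. The UTF8-read-as-cp850 effect described above can be reproduced in a few lines. This is only a rough sketch, written for Python 3 rather than taken from the Gramps code; the helper name misread_as and the sample name "Müller" are made up for illustration.

```python
# Illustrative sketch only; misread_as() and the sample text are
# hypothetical and do not come from the Gramps sources.

def misread_as(text, wrong_encoding="cp850"):
    """Encode text as UTF-8, then decode the same bytes with the wrong codec.

    Returns the raw UTF-8 bytes and the garbled string a human would see.
    """
    utf8_bytes = text.encode("utf-8")
    garbled = utf8_bytes.decode(wrong_encoding)
    return utf8_bytes, garbled


name = "Müller"
utf8_bytes, garbled = misread_as(name)

# The bytes are untouched: re-encoding the garbled text with the same
# wrong codec gives back exactly the original UTF-8 bytes.
assert garbled.encode("cp850") == utf8_bytes

# Only the interpretation differs, so the displayed text no longer matches.
assert garbled != name

print(repr(name), "->", repr(garbled))
```

A check like this runs the same way on every OS because it never touches the real filesystem or console, which is also why it only covers the trivial case.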