From: Peter L. <pet...@te...> - 2009-07-01 14:52:13
> 2009/6/30 Peter Landgren <pet...@te...>:
> > Upgraded my Win XP gramps to svn 12740 today in order to check sorting.
> >
> > It does not work with non-latin, at least for Swedish locale. The coding
> > reported by Windows XP says cp1250. Python 2.5.1.
> >
> > Linux reports UTF8.
> >
> > The sorting is very strange, If I take the first character it sorts like:
> > Ä,Å,Ö,A,Ö,A,Ö,A,B,C,..X,Y,Z (Switches between A and Ö a number of times.)
>
> This is strange indeed. So that is what you see in eg the placeview
> where the new sort is in effect?

Yes, I see it in all views where I have strings beginning with Å or Ä or Ö.

> You should add to your testprogram the sorting as used in flatbasemodel.py:
>
> print
> print "= = List b sorted as in treeviews ==="
> d = sorted((locale.strxfrm(x),x) for x in b)
> d = [x[1] for x in d]
> print c
> for x in range(len(a1)): print a[x], d[x], a[x] == d[x]
> print "Lists equal:", a == d
> print

If I do this and do NOT encode, I get an error, both in Linux and WinXP:

Traceback (most recent call last):
  File "SortTestA2.py", line 53, in <module>
    d = sorted((locale.strxfrm(x),x) for x in b)
  File "SortTestA2.py", line 53, in <genexpr>
    d = sorted((locale.strxfrm(x),x) for x in b)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)

strxfrm seems to believe that the string is ascii???
If I encode, all is OK.

> So strxfrm converts your string to something that is _not_ a string
> but sorts just as strcoll.
>
> > while it should be
> > A,B,C...X,Y,Z,Ä,Ö
> >
> > I have tried to get similar result with a little test program without
> > success. However, with this test program I can get correct results with
> > both strcoll AND strxfrm.
>
> Then it should work ...
> Note however that what you get in gramps is UTF-8 code that is sorted,
> while your program on the other hand uses unicode strings and converts
> that to the encoding of the OS then sorts.
> This is not the same as
> what you see in GRAMPS. UTF-8 is what is in your database, that is
> obtained and sorted with strxfrm.
> To mimic this, you should take your unicode list, convert to UTF-8,
> then output that to a file, and on windows read that file and do the
> sorting.

I will try this later.

> > I do an string.encode(<code from locale.getlocale()) before performing
> > the sorting. I don't know if this is a method that can be used in gramps.
>
> If you have unicode, you can encode, but we already have UTF-8, so you
> need to decode, then encode.
> If you take the present code of GRAMPS trunk for sorting (in
> flatbasemodel.py):
>
> def sort_keys(self):
>     with self.gen_cursor() as cursor:
>         #loop over database and store the sort field, and the handle
>         return sorted( (locale.strxfrm(self.sort_func(data)), key)
>                        for key, data in cursor )
>
> and change it to the following on windows ;
>
> def sort_keys(self):
>     with self.gen_cursor() as cursor:
>         #loop over database and store the sort field, and the handle
>         return sorted(
>             (locale.strxfrm(unicode(self.sort_func(data), "utf-8").encode()),
>              key)
>             for key, data in cursor )
>
> Does it then work? So this takes the UTF-8 bytes from the database,
> makes it unicode, then encodes it with the OS encoding, then applies
> strxfrm.
No:

34671: ERROR: gramps.py: line 128: Unhandled exception
Traceback (most recent call last):
  File "C:\Program\grampstrunk\gui\viewmanager.py", line 1009, in change_page
    self.__do_change_page(num)
  File "C:\Program\grampstrunk\gui\viewmanager.py", line 1028, in __do_change_page
    self.active_page.set_active()
  File "C:\Program\grampstrunk\PageView.py", line 405, in set_active
    PageView.set_active(self)
  File "C:\Program\grampstrunk\PageView.py", line 118, in set_active
    self.build_tree()
  File "C:\Program\grampstrunk\PageView.py", line 953, in build_tree
    sort_map=self.column_order())
  File "C:\Program\grampstrunk\DisplayModels\_PlaceModel.py", line 93, in __init__
    search=search, skip=skip, sort_map=sort_map)
  File "C:\Program\grampstrunk\gui\views\treemodels\flatbasemodel.py", line 397, in __init__
    self.rebuild_data()
  File "C:\Program\grampstrunk\gui\views\treemodels\flatbasemodel.py", line 523, in _rebuild_filter
    allkeys = self.sort_keys()
  File "C:\Program\grampstrunk\gui\views\treemodels\flatbasemodel.py", line 482, in sort_keys
    sort_data.append((locale.strxfrm(unicode(self.sort_func(data), "utf-8").encode()), key))
TypeError: decoding Unicode is not supported

/Peter

> We could change what is used easily based on the platform that is
> being used. In linux going to unicode and back will just result in the
> same thing.
>
> Benny
>
> > I attached the test script.
> > /Peter
> >
> >> Peter and windows interested people
> >>
> >> Gerald has started to move code again to the key attribute with
> >> strxfrm as the cmp method of sort is deprecated in python 3.0, and
> >> next version of GRAMPS tries to be as compatible as possible.
> >> Also, this sort is often faster.
> >> For my listview changes, I also need to use strxfrm as I need to be
> >> able to insert via the bisect module, so strings should sort correctly
> >> with the < operator only to enable this.
> >>
> >> So, if this problem in windows is important to you, you should take it
> >> upstream.
> >> I suggest first a bug submission with python, and if they
> >> say it is a C lib issue on windows, then with Microsoft.
> >>
> >> Original bug that will be present in Trunk again:
> >> http://www.gramps-project.org/bugs/view.php?id=2504
> >> Perhaps best to reopen this bug and link to an upstream bug submission
> >>
> >> Benny
> >>
> >> 2009/6/25 Benny Malengier <ben...@gm...>:
> >> > 2009/6/25 Peter Landgren <pet...@te...>:
> >> >> I have tried to get my little Python script to work in WinXP, but
> >> >> somehow Python under win reports it uses encoding 'cp1252' (Latin 1 I
> >> >> think). My test strings are in UTF-8, so the collating will be wrong.
> >> >> I have tried to change window coding to UTF-8, but without success.
> >> >> It's strange that the sort works in Gramps, and not in my little
> >> >> script?
> >> >
> >> > python works with str and utf internally, and can convert to the
> >> > system encoding as needed (in and out). So that will be an extra layer
> >> > you have on win. You can probably work around that with the encoding
> >> > stuff in python, so convert the utf to binary string, and then to the
> >> > encoding you have....
> >> > Encodings are complicated though, I try to stay away from it :-)
> >> >
> >> > Benny
> >> >
> >> >> It works OK in Linux.
> >> >>
> >> >> I also installed Python 3 on my Win box, and after some conversion of
> >> >> the program I have a similar problem with 'cp1252'.
> >> >>
> >> >> I'm stuck for now.
> >> >>
> >> >> /Peter
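
A note on the TypeError in the traceback: in Python 2, unicode(x, "utf-8")
raises "decoding Unicode is not supported" when x is already a unicode
object, which suggests that sort_func returned unicode rather than UTF-8
bytes in that view. Below is a minimal sketch of a sort key that decodes
only when it is handed bytes. It is written for Python 3, where
locale.strxfrm accepts str directly, and it is not the GRAMPS code: the
names to_text, collation_key, sort_keys (as a plain function) and
insert_sorted are hypothetical, and the rows argument only stands in for
the database cursor.

```python
import bisect
import locale


def to_text(value, encoding="utf-8"):
    """Return text, decoding only when the value is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def collation_key(value):
    """Build a key that sorts with plain < under the current LC_COLLATE."""
    return locale.strxfrm(to_text(value))


def sort_keys(rows, sort_func):
    """Return a sorted list of (collation key, handle) pairs.

    rows yields (handle, data) pairs, standing in for the cursor
    used in the quoted flatbasemodel.py code (an assumption).
    """
    return sorted((collation_key(sort_func(data)), handle)
                  for handle, data in rows)


def insert_sorted(keys, handle, value):
    """Insert one new entry into an already sorted key list via bisect."""
    bisect.insort(keys, (collation_key(value), handle))
```

The sketch does not call locale.setlocale itself. A program would normally
call locale.setlocale(locale.LC_ALL, "") once at startup. Without that call
the default "C" locale stays in effect and Swedish ordering of Å, Ä and Ö
should not be expected.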