From: Doug B. <dou...@gm...> - 2013-10-24 13:03:41
|
Devs,

I know that there is a log.warning that if you don't have PyICU installed, localization will be impaired, but I find that sorting with regular ASCII names is wrong without PyICU. For example, in the flat person view, I get the following order:

"Blank, Edward"
"Blankenship, Laura"
"Blank, Frank"

That is, it seems that the ", " are effectively ignored. I think that this is because of the grampslocale sort_key, Python 2.7, and no PyICU. The error goes away if PyICU is installed.

So, one of two things:

1) we should change the warning so that it is more dire (eg, "Sorting operations will be incorrect")

or

2) there is a bug in the sort_key function when no PyICU

I don't know enough about the locale to see if it is a bug that can be fixed.

-Doug

PS - Yes, I really have a "Frank Blank" in my tree. Frank apparently had a cruel father, or perhaps Hank was just getting back at his father. |
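A minimal sketch of how the reported order can arise, assuming the collation in use skips punctuation and spaces when it first compares two strings. The helper `ignore_punctuation_key` is made up for illustration and is not Gramps code.

```python
import string

NAMES = ["Blank, Frank", "Blankenship, Laura", "Blank, Edward"]

# Assumption for this sketch: the collation ignores ASCII punctuation and whitespace.
_IGNORED = str.maketrans("", "", string.punctuation + string.whitespace)


def ignore_punctuation_key(name):
    """Hypothetical sort key: drop punctuation and whitespace, ignore case."""
    return name.translate(_IGNORED).lower()


# Plain code point order: "," sorts before any letter, so both Blanks come first.
codepoint_order = sorted(NAMES)

# Punctuation-blind order: "blankedward" < "blankenshiplaura" < "blankfrank".
punctuation_blind_order = sorted(NAMES, key=ignore_punctuation_key)

print(codepoint_order)
print(punctuation_blind_order)
```

The second list reproduces the order seen in the flat person view, which is consistent with the ", " being ignored, though it does not by itself show where the ignoring happens.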
From: Vassilii K. <vas...@ta...> - 2013-10-24 13:46:27
|
On 24.10.2013 16:03, Doug Blank wrote:
> we should change the warning so that it is more dire (eg, "Sorting
> operations will be incorrect")

Right now it gives a "PyICU not available: sorting may be incorrect" w/o PyICU already! See src/Utils.py

This is a console warning, though, so it's not reflected in the GUI at all. Whoever launches gramps with an icon will miss it.

V |
From: Doug B. <dou...@gm...> - 2013-10-24 14:44:15
|
On Thu, Oct 24, 2013 at 9:46 AM, Vassilii Khachaturov <vas...@ta...> wrote:
> On 24.10.2013 16:03, Doug Blank wrote:
>> we should change the warning so that it is more dire (eg, "Sorting
>> operations will be incorrect")
> Right now it gives a "PyICU not available: sorting may be incorrect" w/o
> PyICU already! See src/Utils.py
> This is a console warning, though, so it's not reflected in the GUI at
> all. Whoever launches gramps with an icon will miss it.

Actually, I was referring to trunk:

LOG.warning("ICU not loaded because %s. Localization will be impaired. "
            "Use your package manager to install PyICU", str(err))

But, my point, even with the 3.4 message, is that it isn't strong enough. If I am getting bad sorting with ASCII data, then I suspect that sorting *will always be incorrect* in the flatview. However, perhaps this is just a bug that can be fixed, and then one might not need PyICU as much as is needed now.

-Doug

>
> V
>
> _______________________________________________
> Gramps-devel mailing list
> Gra...@li...
> https://lists.sourceforge.net/lists/listinfo/gramps-devel |
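A rough sketch of the import-guard pattern behind such a warning, assuming an optional module is loaded at import time and a flag is kept for later use. The logger name, the `load_optional` helper and the `HAVE_ICU` flag are hypothetical, and the warning text is only modelled on the messages quoted above.

```python
import importlib
import logging

LOG = logging.getLogger("sortdemo")  # hypothetical logger name


def load_optional(module_name):
    """Return the imported module, or None after logging a warning if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        LOG.warning("%s not loaded because %s. Sorting may be incorrect.",
                    module_name, err)
        return None


# Assumption: PyICU is importable under the name "icu" when it is installed.
icu = load_optional("icu")

# A flag like this could be checked by GUI code, since a console-only
# warning is missed by anyone who starts the program from an icon.
HAVE_ICU = icu is not None
```

Keeping the result in a module-level flag is one way to let a GUI repeat the warning somewhere visible; whether Gramps should do that is the open question in this thread.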
From: Vassilii K. <vas...@ta...> - 2013-10-24 13:50:04
|
On 24.10.2013 16:46, Vassilii Khachaturov wrote:
> On 24.10.2013 16:03, Doug Blank wrote:
>> we should change the warning so that it is more dire (eg, "Sorting
>> operations will be incorrect")
> Right now it gives a "PyICU not available: sorting may be incorrect"
> w/o PyICU already! See src/Utils.py
> This is a console warning, though, so it's not reflected in the GUI at
> all. Whoever launches gramps with an icon will miss it.
>
> V

I stand corrected --- src/Utils.py seems to wrap this in a function "encodingdefs" which I don't see called from anywhere?!... |
From: Vassilii K. <vas...@ta...> - 2013-10-24 13:51:23
|
On 24.10.2013 16:49, Vassilii Khachaturov wrote:
> On 24.10.2013 16:46, Vassilii Khachaturov wrote:
>> On 24.10.2013 16:03, Doug Blank wrote:
>>> we should change the warning so that it is more dire (eg, "Sorting
>>> operations will be incorrect")
>> Right now it gives a "PyICU not available: sorting may be incorrect"
>> w/o PyICU already! See src/Utils.py
>> This is a console warning, though, so it's not reflected in the GUI
>> at all. Whoever launches gramps with an icon will miss it.
>>
>> V
> I stand corrected --- src/Utils.py seems to wrap this in a function
> "encodingdefs" which I don't see called from anywhere?!...

I see -- the function seems to be there for doc purposes only; it is followed by a sneaky try: which begins a global code section :-) |
From: John R. <jr...@ce...> - 2013-10-24 18:04:41
|
On Oct 24, 2013, at 6:03 AM, Doug Blank <dou...@gm...> wrote:

> Devs,
>
> I know that there is a log.warning that if you don't have PyICU
> installed, localization will be impaired, but I find that sorting with
> regular ASCII names is wrong without PyICU. For example, in the flat
> person view, I get the following order:
>
> "Blank, Edward"
> "Blankenship, Laura"
> "Blank, Frank"
>
> That is, it seems that the ", " are effectively ignored. I think that
> this is because of the grampslocale sort_key, Python 2.7, and no
> PyICU. The error goes away if PyICU is installed.
>
> So, one of two things:
>
> 1) we should change the warning so that it is more dire (eg, "Sorting
> operations will be incorrect")
>
> or
>
> 2) there is a bug in the sort_key function when no PyICU
>
> I don't know enough about the locale to see if it is a bug that can be fixed.
>
> -Doug
>
> PS - Yes, I really have a "Frank Blank" in my tree. Frank apparently
> had a cruel father, or perhaps Hank was just getting back at his
> father.

Doug,

I'd go with a bug, but I'm not ready to say where it is.
Do you see this on the 3.4 SVN branch as well?

Regards,
John Ralls |
From: Doug B. <dou...@gm...> - 2013-10-24 18:35:11
|
On Thu, Oct 24, 2013 at 2:04 PM, John Ralls <jr...@ce...> wrote:

> Doug,
>
> I'd go with a bug, but I'm not ready to say where it is.
> Do you see this on the 3.4 SVN branch as well?

Yes, I see it on gramps34 and gramps40 without PyICU and Python 2.7 (various flavors of Ubuntu).

-Doug |
From: John R. <jr...@ce...> - 2013-10-25 00:11:12
|
On Oct 24, 2013, at 11:35 AM, Doug Blank <dou...@gm...> wrote:

> On Thu, Oct 24, 2013 at 2:04 PM, John Ralls <jr...@ce...> wrote:
>>
>> I'd go with a bug, but I'm not ready to say where it is.
>> Do you see this on the 3.4 SVN branch as well?
>
> Yes, I see it on gramps34 and gramps40 without PyICU and Python 2.7
> (various flavors of Ubuntu).

OK, I see the same thing on Debian Jessie and Fedora-18, with both Python-2.7 and 3.3. I don't think that the problem is in grampslocale.sort_key though. When I try those names by hand I get the expected result:

john@DebianJessie:~$ python3
Python 3.3.2+ (default, Sep 18 2013, 11:58:01)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
False
>>>

Perhaps it has to do with gen.display.name.raw_sorted_name()? It's implemented as a collection lookup that's a bit opaque without more study than I have time for ATM.

Regards,
John Ralls |
From: Doug B. <dou...@gm...> - 2013-10-25 02:03:59
|
On Thu, Oct 24, 2013 at 8:10 PM, John Ralls <jr...@ce...> wrote:
>
> On Oct 24, 2013, at 11:35 AM, Doug Blank <dou...@gm...> wrote:
>
>> On Thu, Oct 24, 2013 at 2:04 PM, John Ralls <jr...@ce...> wrote:
>>>
>>> I'd go with a bug, but I'm not ready to say where it is.
>>> Do you see this on the 3.4 SVN branch as well?
>>
>> Yes, I see it on gramps34 and gramps40 without PyICU and Python 2.7
>> (various flavors of Ubuntu).
>
> OK, I see the same thing on Debian Jessie and Fedora-18, with both Python-2.7 and 3.3.
> I don't think that the problem is in grampslocale.sort_key though. When I try those
> names by hand I get the expected result:
> john@DebianJessie:~$ python3
> Python 3.3.2+ (default, Sep 18 2013, 11:58:01)
> [GCC 4.8.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import locale
> >>> locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
> False
> >>>

Something interesting: I get the same outside of Gramps:

$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
False

But I get the opposite inside of Gramps, stopping in grampslocale:

(Pdb) locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
True

I'm not sure what to look at here, but some info:

(Pdb) locale.getlocale()
('en_US', 'UTF-8')

Back outside of Gramps:

>>> locale.getlocale()
(None, None)

Hope that helps,

-Doug

> Perhaps it has to do with gen.display.name.raw_sorted_name()? It's implemented as a
> collection lookup that's a bit opaque without more study than I have time for ATM.
>
> Regards,
> John Ralls |
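The difference between the two sessions above can be probed with a small sketch like the following. It assumes the only relevant change is the process-wide LC_COLLATE setting: a bare interpreter collates in the "C" locale, while an application may have called setlocale() with the user's locale. The helper name `laura_sorts_first` is made up for this example.

```python
import locale

FRANK = "Blank, Frank"
LAURA = "Blankenship, Laura"


def laura_sorts_first(collate_locale):
    """Return True if LAURA collates before FRANK under the given LC_COLLATE locale.

    Returns None if that locale cannot be set on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
    except locale.Error:
        return None
    try:
        return locale.strxfrm(LAURA) < locale.strxfrm(FRANK)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore process-wide state


# "C" gives plain code point order, where "," sorts before "e".
in_c_locale = laura_sorts_first("C")

# "" picks the locale from the environment (LANG, LC_ALL, ...); the answer
# depends on the system's collation data, so no result is claimed here.
in_user_locale = laura_sorts_first("")

print("C locale:", in_c_locale)
print("user locale:", in_user_locale)
```

If the two printed values differ on a given machine, the disagreement comes from the locale's collation rules rather than from Python or Gramps code, which is what the pdb session suggests.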
From: John R. <jr...@ce...> - 2013-10-25 05:34:27
|
On Oct 24, 2013, at 7:03 PM, Doug Blank <dou...@gm...> wrote:

> Something interesting: I get the same outside of Gramps:
>
> $ python
> Python 2.7.4 (default, Sep 26 2013, 03:20:26)
> [GCC 4.7.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import locale
> >>> locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
> False
>
> But I get the opposite inside of Gramps, stopping in grampslocale:
>
> (Pdb) locale.strxfrm('Blankenship, Laura') < locale.strxfrm('Blank, Frank')
> True
>
> I'm not sure what to look at here, but some info:
>
> (Pdb) locale.getlocale()
> ('en_US', 'UTF-8')
>
> Back outside of Gramps:
>
> >>> locale.getlocale()
> (None, None)
>
> Hope that helps,

Hmmm. I'll have to test that in C, but it's starting to smell like a glibc bug -- which explains why it works on OSX (which uses an Apple-modified BSD libc) and not on Linux.

If that bears out I'll have to come up with a workaround, because it will take way too long for a corrected glibc to filter down to the distros.

Regards,
John Ralls |
From: John R. <jr...@ce...> - 2013-10-25 19:34:20
|
On Oct 24, 2013, at 10:34 PM, John Ralls <jr...@ce...> wrote:

> Hmmm. I'll have to test that in C, but it's starting to smell like a glibc bug -- which explains why it works on OSX (which uses an Apple-modified BSD libc) and not on Linux.
> If that bears out I'll have to come up with a workaround, because it will take way too long for a corrected glibc to filter down to the distros.

Yup. Or at least not Python's fault. The test program:

#include <string.h>
#include <stdio.h>
#include <locale.h>

#define BUFSIZE 128

int
main (int argc, char **argv)
{
    const char *frank = "Blank, Frank";
    const char *laura = "Blankenship, Laura";
    char sf1[BUFSIZE], sl1[BUFSIZE], sf2[BUFSIZE], sl2[BUFSIZE];
    size_t len = BUFSIZE, olen;

    /* Sort forms in the default "C" locale. */
    olen = strxfrm (sf1, frank, len);
    olen = strxfrm (sl1, laura, len);
    printf ("C: %s\n", strcmp (sf1, sl1) < 0 ? "OK" : "Bad");

    /* Switch to the locale from the environment and transform again. */
    setlocale (LC_ALL, "");
    olen = strxfrm (sf2, frank, len);
    olen = strxfrm (sl2, laura, len);
    printf ("Locale: %s\n", strcmp (sf2, sl2) < 0 ? "OK" : "Bad");

    printf ("Sort Forms:\nC:\n\t%s\n\t%s\nLocale:\n\t%s\n\t%s\n", sf1, sl1, sf2, sl2);
    return 0;
}

Compiled on Debian Jessie, the results are:

john@DebianJessie:~$ ./strxfrmtest
C: OK
Locale: Bad
Sort Forms:
C:
	Blank, Frank
	Blankenship, Laura
Locale:
	0xd, 0x17, 0xc, 0x19, 0x16, 0x11, 0x1d, 0xc, 0x19, 0x16, 0x1, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x1, 0x10, 0x9, 0x9, 0x9, 0x9, 0x10, 0x9, 0x9, 0x9, 0x9, 0x1, 0x6, 0x3b, 0x1, 0x35
	0xd, 0x17, 0xc, 0x19, 0x16, 0x10, 0x19, 0x1e, 0x13, 0x14, 0x1b, 0x17, 0xc, 0x20, 0x1d, 0xc, 0x1, 0x9 <repeats 16 times>, 0x1, 0x10, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x9, 0x10, 0x9, 0x9, 0x9, 0x9, 0x1, 0xc, 0x3b, 0x1, 0x35

(The terminal output of sf2 and sl2 is actually undisplayable; the hex codes are from inspecting the variables in the debugger.)
Compiled on OSX:

C: OK
Locale: OK
Sort Forms:
C:
	Blank, Frank
	Blankenship, Laura
Locale:
	0014001^001S001`001]000^000R0018001d001S001`001]00000014001^001S001`001]000^000R0018001d001S001`001]
	0014001^001S001`001]001W001`001e001Z001[001b000^000R001>001S001g001d001S00000014001^001S001`001]001W001`001e001Z001[001b000^000

I played with some other locales and found that C.utf8 works too, so it's probably a problem with the collation files in /usr/share/i18n/locales, in particular iso14651_t1_common.

Anyway, we need to work around it. One approach would be to tokenize the strings and compare the tokens in succession until there's a difference. Another, possibly better, approach would be to make ICU a mandatory dependency and get rid of all of the libc localization.

I've opened http://www.gramps-project.org/bugs/view.php?id=7161 for further discussion.

Regards,
John Ralls |
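The first workaround, comparing tokens in succession, could look roughly like the sketch below. It assumes display names have the form "Surname, Given" and that splitting on the comma is an acceptable tokenization. `tokenized_sort_key` is a hypothetical helper, not the fix that went into Gramps.

```python
import locale


def tokenized_sort_key(name, transform=locale.strxfrm):
    """Hypothetical key: transform each comma-separated field on its own.

    Tuples compare field by field, so the surname decides first and the
    given name only breaks ties between identical surnames.
    """
    return tuple(transform(field.strip()) for field in name.split(","))


NAMES = ["Blank, Frank", "Blankenship, Laura", "Blank, Edward"]

# Python compares the key tuples element by element and stops at the first
# difference, which is the "compare tokens in succession" idea.
tokenized_order = sorted(NAMES, key=tokenized_sort_key)
print(tokenized_order)
```

Because "Blank" is compared against "Blankenship" before any given name is looked at, the separator can no longer be skipped over. Punctuation inside a single field (hyphens, apostrophes) is still left to the libc collation, so this narrows the problem rather than removing it.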