## Re: [Gramps-devel] Sort mystery

From: Peter Landgren - 2013-02-01 14:48:01

```
On Friday 01 February 2013 12.07.14, Tim Lyons wrote:
> A challenge for searching for an answer!!
>
> Benny Malengier wrote
> > 2009/4/24 Peter Landgren <peter.talken@>
> >
> > That indicates that the procedure that adds these first letters must be
> > made more clever:
> >
> > 1/ If symbols are different but equal in the sort of the locale, consider
> > them as one group. I guess doing sort of va and wb then vb and wa would
> > indicate that v and w are one group, so the logic for a small function is
> > not difficult.
>
> You expand on this algorithm in
> http://www.gramps-project.org/bugs/view.php?id=2933#c9317. See
> http://www.unicode.org/charts/uca/ for the main collation chart with
> primary differences marked.
>
> I entirely agree, and plan to implement this in NarWeb.
>
> However, I have one problem.
>
> I can't find out how to determine the letter that has a primary difference
> from the current letter (sorry, that's not quite the right wording, but I
> am not sure how to express it).
>
> For example, I do a sort, and the first few names are "Ándre, Arnot". The
> algorithm shows that these should be grouped together. But which letter
> should be used for the index header? In this case, it should be "a" (or "A"
> if I upper case everything), as this is the letter from which "Á" and "A"
> have secondary or tertiary differences.
>
> In another language, "Á" might have a primary difference from "Z", and then
> the sort order would be "Andrew, Arnot, Zulu, Ándre". In this case the
> index header should be "Á". So I can't just normalise the character to
> remove accents etc.
>
> I have studied Unicode, CLDR and ICU and Googled extensively, but I can't
> find out how to determine the preceding primary character!
>
> Can anyone help?

Tim,

The example you give uses A with an acute accent (Á), and I think that
should be sorted as A.
The letter after Z should be Å, which originally (in the Middle Ages) was
written as AA. It later became A with a small "o" above it. It is similar
with Ä and Ö: the two dots were originally an "e". So the sort order after Z
is Å, Ä, Ö in Swedish.

Å in Unicode is U+00C5 and å is U+00E5.
Å is used in Swedish, Danish and Norwegian with very similar pronunciation.

I'm not sure this helps, but note the difference between Á and Å, which can
be hard to see with some fonts.

/Peter
```
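
The crossed-pair test quoted above (sort "va" and "wb", then "vb" and "wa") can be sketched in Python roughly as follows. This is only an illustration, not the NarWeb code: `same_group` and `toy_key` are hypothetical names, and `toy_key` is a made-up stand-in that treats "w" as primary-equal to "v". In real use one would presumably pass a locale-aware key such as `locale.strxfrm` after setting the locale. The sketch also assumes single letters, and that the suffixes "a" and "b" have a primary difference with "a" sorting first.

```python
from typing import Callable


def toy_key(s: str) -> tuple:
    """Hypothetical two-level sort key in which 'w' is primary-equal to 'v'."""
    primary = tuple("v" if ch == "w" else ch for ch in s)
    # The raw string only breaks ties between primary-equal strings.
    return (primary, s)


def same_group(x: str, y: str, key: Callable[[str], object] = toy_key) -> bool:
    """Return True if letters x and y appear to differ only below primary level.

    If x and y have a primary difference, the same letter leads both pairs.
    If they are primary-equal, the suffix decides, so the leader swaps.
    """
    if x == y:
        return True
    first = min([x + "a", y + "b"], key=key)   # e.g. "va" vs "wb"
    second = min([x + "b", y + "a"], key=key)  # e.g. "vb" vs "wa"
    return first.startswith(x) and second.startswith(y)


if __name__ == "__main__":
    print(same_group("v", "w"))  # one group under toy_key
    print(same_group("a", "b"))  # separate groups
```

Note that this only says whether two letters belong to the same group. It does not answer Tim's question of which letter to show as the index header for that group.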