On Friday 01 February 2013 12.07.14, Tim Lyons wrote:
> A challenge for searching for an answer!!
> Benny Malengier wrote
> > 2009/4/24 Peter Landgren <
> > peter.talken@
> > >
> > That indicates that the procedure that adds these first letters must be
> > made
> > more clever:
> > 1/ If symbols are different but equal in the sort of the locale, consider
> > them as one group. I guess doing sort of va and wb then vb and wa would
> > indicate that v and w are one group, so the logic for a small function is
> > not difficult.
> You expand on this algorithm in
> http://www.gramps-project.org/bugs/view.php?id=2933#c9317. See
> http://www.unicode.org/charts/uca/ for the main collation chart with
> primary differences marked.
> I entirely agree, and plan to implement this in NarWeb.
> However, I have one problem.
> I can't find out how to determine the letter that has a primary difference
> from the current letter (sorry, that's not quite the right wording, but I
> am not sure how to express it).
> For example, I do a sort, and the first few names are "Ándre, Arnot". The
> algorithm shows that these should be grouped together. But which letter
> should be used for the index header? In this case, it should be "a" (or "A"
> if I upper case everything) as this is the letter from which "Á" and "A"
> have secondary or tertiary differences.
> In another language, "Á" might have a primary difference from "Z", and then
> the sort order would be "Andrew, Arnot, Zulu, Ándre". In this case the
> index header should be "Á". So I can't just normalise the character to
> remove accents etc.
> I have studied Unicode, CLDR and ICU and Googled extensively, but I can't
> find out how to determine the preceding primary character!
> Can anyone help?
The example you give uses A with an accent, and I think that should be sorted as A.
The letter after Z should be Å, which originally (in the Middle Ages) was written as AA.
It later became A with a small "o" above it.
Similarly with Ä and Ö: the two dots were originally an "e".
So the sort order is ÅÄÖ in Swedish.
Å in Unicode is U+00C5 and å is U+00E5.
Å is used in Swedish, Danish and Norwegian with very similar pronunciation.
I'm not sure this helps, but note the difference between Á and Å, which can be hard
to see with some fonts.
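
For what it is worth, the grouping test Benny describes in the quoted text
(sort "va" and "wb", then "vb" and "wa") could be sketched roughly as below.
This is only an illustration, not what NarWeb or Gramps does: the names
same_primary_group and toy_swedish_key are made up, and the toy key is a
hand-written stand-in for a real locale collation key (for instance
locale.strxfrm after setting a suitable locale). It folds w into v and strips
accents except on å, ä and ö, which is an assumption about the Swedish rules
rather than a complete implementation of them.

```python
import unicodedata

# Assumption: a toy Swedish-like alphabet in which "w" is folded into "v"
# and å, ä, ö are separate letters sorted after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvxyzåäö"


def toy_swedish_key(s):
    """Hypothetical stand-in for a locale collation key.

    Returns (primary weights, raw string); the raw string only breaks ties,
    playing the role of secondary/tertiary differences.
    """
    primary = []
    for ch in s.lower():
        if ch == "w":
            ch = "v"
        elif ch not in "åäö":
            # Drop accents: decompose and keep only the base letter.
            ch = unicodedata.normalize("NFD", ch)[0]
        primary.append(SWEDISH_ALPHABET.find(ch))
    return (primary, s)


def same_primary_group(x, y, key):
    """True if single letters x and y differ only below the primary level.

    If x and y are primary-equal, the second letter decides the order, so
    the leading letter of the first sorted item flips between the two sorts.
    If they have a primary difference, the same letter leads both times.
    """
    first = sorted([x + "a", y + "b"], key=key)[0][0]
    second = sorted([x + "b", y + "a"], key=key)[0][0]
    return first != second


for pair in [("v", "w"), ("Á", "A"), ("Å", "A")]:
    print(pair, same_primary_group(pair[0], pair[1], toy_swedish_key))
```

With a real collation key in place of the toy one, the same test should give
different answers per locale, which is the point Tim makes about not being
able to simply strip the accents. It still does not say which letter to use
as the index header for a group.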