On Friday 01 February 2013 12.07.14, Tim Lyons wrote:

> A challenge for searching for an answer!!

>

>

>

> Benny Malengier wrote

>

> > 2009/4/24 Peter Landgren <

> >

> > peter.talken@

> >

> > >

> > That indicates that the procedure that adds these first letters must be

> > made

> > more clever:

> >

> > 1/ If symbols are different but equal in the sort of the locale, consider

> > them as one group. I guess doing sort of va and wb then vb and wa would

> > indicate that v and w are one group, so the logic for a small function is

> > not difficult.

>

> You expand on this algorithm in

> http://www.gramps-project.org/bugs/view.php?id=2933#c9317. See

> http://www.unicode.org/charts/uca/ for the main collation chart with

> primary differences marked.

>

> I entirely agree, and plan to implement this in NarWeb.

>

> However, I have one problem.

>

> I can't find out how to determine the letter that has a primary difference

> from the current letter (sorry, that's not quite the right wording, but I

> am not sure how to express it).

>

> For example, I do a sort, and the first few names are "Ándre, Arnot". The

> algorithm shows that these should be grouped together. But which letter

> should be used for the index header? In this case, it should be "a" (or "A"

> if I upper case everything) as this is the letter from which "Á" and "A"

> have secondary or tertiary differences.

>

> In another language, "Á" might have a primary difference from "Z", and then

> the sort order would be "Andrew, Arnot, Zulu, Ándre". In this case the

> index header should be "Á". So I can't just normalise the character to

> remove accents etc.

>

> I have studied Unicode, CLDR and ICU and Googled extensively, but I can't

> find out how to determine the preceding primary character!

>

> Can anyone help?

Tim,

The example you give uses A with an acute accent (Á), and I think that should be sorted as A.

The letter after Z should be Å, which originally (in the Middle Ages) was written as AA.

It later became A with a small "o" above it.

Similarly with Ä and Ö: the two dots were originally an "e".

So the sort order after Z is Å, Ä, Ö in Swedish.
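
For what it is worth, here is a minimal sketch of the va/wb test Benny described, so you can see how Á ends up in the A group while Å does not. The key function toy_key, the ALPHABET string and the FOLD table are my own hand-made stand-ins for a Swedish collation key (an assumption for illustration, not the real locale data and not anything from Gramps), and same_primary_group is a hypothetical helper name.

```python
# Toy Swedish-like collation: ALPHABET and FOLD are illustrative assumptions,
# not real CLDR data. "w" is folded into "v" and "á" into "a" at primary level.
ALPHABET = "abcdefghijklmnopqrstuvxyzåäö"
FOLD = {"w": "v", "á": "a"}


def toy_key(s):
    """Hypothetical sort key: primary weights first, the folded-case text as tie-breaker."""
    s = s.casefold()
    # Raises ValueError for characters outside the toy alphabet.
    primary = tuple(ALPHABET.index(FOLD.get(ch, ch)) for ch in s)
    return (primary, s)


def same_primary_group(x, y, key):
    """Return True if single letters x and y differ only below the primary level.

    Sort (x+"a", y+"b") and (x+"b", y+"a"). If the second letter decides the
    order in both pairs, the first letters are equal at the primary level.
    Assumes "a" sorts before "b" with a primary difference under this key.
    """
    if x == y:
        return True
    first = sorted([x + "a", y + "b"], key=key)[0][0]
    second = sorted([x + "b", y + "a"], key=key)[0][0]
    return first != second


names = ["Zulu", "Ándre", "Arnot", "Åberg"]
print(sorted(names, key=toy_key))
print(same_primary_group("a", "á", toy_key))  # same group under the toy key
print(same_primary_group("a", "å", toy_key))  # separate letter under the toy key
```

In a real implementation the key would come from the collation of the locale in use instead of toy_key, and the test itself should stay the same. It only tells you which letters belong together, not which of them to print as the index header.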

Å in Unicode is U+00C5 and å is U+00E5.

Å is used in Swedish, Danish and Norwegian with very similar pronunciation.

I'm not sure this helps, but note the difference between Á and Å, which can be hard

to see with some fonts.
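
If the font makes them hard to tell apart, a few lines of Python will do it for you. This is only a small sketch using the standard unicodedata module; the helper name describe is my own. It prints the code point, the Unicode name and the parts of the canonical decomposition, which shows that both letters decompose to a plain A plus a combining mark, so stripping the mark would wrongly turn the Swedish Å into A.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, names of the NFD parts) for one character."""
    parts = [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]
    return f"U+{ord(ch):04X}", unicodedata.name(ch), parts


for ch in "ÁÅåÄÖ":
    print(ch, *describe(ch))
```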

/Peter