On Friday 01 February 2013 12.07.14, Tim Lyons wrote:

> A challenge for searching for an answer!!




> Benny Malengier wrote


> > 2009/4/24 Peter Landgren <

> >

> > peter.talken@

> >

> > >

> > That indicates that the procedure that adds these first letters must be

> > made

> > more clever:

> >

> > 1/ If symbols are different but equal in the sort of the locale, consider

> > them as one group. I guess doing sort of va and wb then vb and wa would

> > indicate that v and w are one group, so the logic for a small function is

> > not difficult.


> You expand on this algorithm in

> See

> for the main collation chart with

> primary differences marked.


> I entirely agree, and plan to implement this in NarWeb.


> However, I have one problem.


> I can't find out how to determine the letter that has a primary difference

> from the current letter (sorry, that's not quite the right wording, but I

> am not sure how to express it).


> For example, I do a sort, and the first few names are "Ándre, Arnot". The

> algorithm shows that these should be grouped together. But which letter

> should be used for the index header? In this case, it should be "a" (or "A"

> if I upper case everything) as this is the letter from which "Á" and "A"

> have secondary or tertiary differences.


> In another language, "Á" might have a primary difference from "Z", and then

> the sort order would be "Andrew, Arnot, Zulu, Ándre". In this case the

> index header should be "Á". So I can't just normalise the character to

> remove accents etc.


> I have studied Unicode, CLDR and ICU and Googled extensively, but I can't

> find out how to determine the preceding primary character!


> Can anyone help?


The example you give uses A with an accent, and I think that should be sorted as A.

The letter after Z should be Å, which originally (in the Middle Ages) was written as AA.

It later became A with a small "o" above it.

Similarly with Ä and Ö: the two dots were originally an "e".

So the sort order after Z is Å, Ä, Ö in Swedish.

Å in Unicode is U+00C5 and å is U+00E5.
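
The code points are easy to check from Python with the standard unicodedata module. This is only a small sketch; the helper name describe() is made up for the illustration, and A with acute is included just for comparison:

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A with ring above (capital and small), then A with acute for comparison.
for ch in "\u00C5\u00E5\u00C1":
    print(describe(ch))
```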

Å is used in Swedish, Danish and Norwegian with very similar pronunciation.

I'm not sure this helps, but note the difference between Á and Å, which can be hard

to see in some fonts.
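
One possible way to pick the index header, as a rough sketch only: keep a per-language list of the extra letters that count as letters in their own right, and fold everything else to its base letter. The table EXTRA_LETTERS and the function index_header() are assumptions made up for this illustration, not anything in Gramps or NarWeb, and a real implementation would presumably take the letters from CLDR/ICU instead of hard-coding them.

```python
import unicodedata

# Assumption: hand-written table of the letters each language treats as
# separate letters beyond A-Z. Only two languages are filled in here.
EXTRA_LETTERS = {
    "sv": "\u00C5\u00C4\u00D6",  # Swedish: A-ring, A-diaeresis, O-diaeresis
    "en": "",                    # English: accents never make a new letter
}


def index_header(name, lang):
    """Return the index heading for name in the given language (sketch)."""
    # Compose first, so "A" + combining ring is seen as one character.
    first = unicodedata.normalize("NFC", name)[:1].upper()
    if not first:
        return ""
    if first in EXTRA_LETTERS.get(lang, ""):
        return first  # a letter in its own right: keep it as the header
    # Otherwise decompose and keep only the base character (drops accents).
    return unicodedata.normalize("NFD", first)[:1]


for name in ("\u00C1ndre", "Arnot", "\u00C5berg"):
    print(name, "->", index_header(name, "sv"))
```

With the Swedish table, a name starting with A-acute goes under "A" while a name starting with A-ring gets its own header; with the English table both fall under "A".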