I suppose that if you seed a dictionary with our alphabet, mapping each letter to the header letter of the group it belongs to, you can maintain an ordered list of those headers at the same time.
When a first letter is encountered that is not in the dict, the sorted list can be used to see where it should be added. Then insert it in the list, and seed the dict further.
This way, the first symbol encountered will be used as the group's header (it becomes that group's entry in the dict), which might not be what a user from that culture would expect, but I cannot think of a way to avoid that.
One could, though, count which letter occurs most often in a group and use that letter as the group's header.
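A minimal Python sketch of the dict-plus-ordered-list idea, assuming a seed table built from the locale's alphabet. `LetterIndex`, `header_for` and `most_common_header` are hypothetical names, and `sort_key` stands in for a real collation key (for example `locale.strxfrm` under the right locale); plain `str` is used in the usage note only to keep the example self-contained.

```python
import bisect
from collections import Counter
from typing import Callable, Dict, Iterable, List


class LetterIndex:
    """Hypothetical helper: map first letters of names to index headers."""

    def __init__(self, seed: Dict[str, str], sort_key: Callable[[str], object]):
        # letter -> header letter of the group it belongs to (seeded from the alphabet)
        self.group_of: Dict[str, str] = dict(seed)
        # assumed collation key; swap in locale.strxfrm or an ICU key in practice
        self.sort_key = sort_key
        # distinct headers, kept in collation order
        self.headers: List[str] = sorted(set(seed.values()), key=sort_key)

    def header_for(self, name: str) -> str:
        if not name:
            raise ValueError("empty name has no first letter")
        first = name[:1].upper()
        if first not in self.group_of:
            # unseen letter: find its place among the ordered headers and insert it
            keys = [self.sort_key(h) for h in self.headers]
            pos = bisect.bisect_left(keys, self.sort_key(first))
            self.headers.insert(pos, first)
            # the first symbol encountered becomes the header of its own group
            self.group_of[first] = first
        return self.group_of[first]


def most_common_header(members: Iterable[str]) -> str:
    """Alternative label: the first letter seen most often in a (non-empty) group."""
    return Counter(m[:1].upper() for m in members).most_common(1)[0][0]
```

For example, with `LetterIndex({"A": "A", "Á": "A", "B": "B", "Z": "Z"}, sort_key=str)`, `header_for("Ándre")` returns `"A"`, while `header_for("Cole")` adds `"C"` between `"B"` and `"Z"` in `headers`. `most_common_header` is the frequency-based alternative for labelling a group.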


2013/2/1 Peter Landgren <peter.talken@telia.com>

On Friday 01 February 2013 12.07.14, Tim Lyons wrote:

> A challenge for searching for an answer!!




> Benny Malengier wrote:

> > 2009/4/24 Peter Landgren <peter.talken@>
> >
> > That indicates that the procedure that adds these first letters must be
> > made more clever:
> >
> > 1/ If symbols are different but equal in the sort of the locale, consider
> > them as one group. I guess doing sort of va and wb then vb and wa would
> > indicate that v and w are one group, so the logic for a small function is
> > not difficult.
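The two-string test described above could look like this in Python. `same_primary_group` and `toy_root_key` are hypothetical; the toy key only imitates a collator by comparing accent-stripped, lower-cased text first and the original string second, so a real implementation would pass a locale-aware key (for example from `locale.strxfrm` or ICU) instead.

```python
import unicodedata
from typing import Callable, Tuple


def toy_root_key(s: str) -> Tuple[str, str]:
    """Hypothetical stand-in for a collation key.

    Primary level: text with combining accents removed, lower-cased.
    Tie-breaker: the original string (roughly the secondary/tertiary levels).
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.lower(), s)


def same_primary_group(x: str, y: str, sort_key: Callable[[str], object]) -> bool:
    """True if letters x and y differ only below the primary level.

    If x and y are primary-equal, the second letter decides both comparisons:
    x+"a" sorts before y+"b" AND y+"a" sorts before x+"b".
    If they differ at the primary level, the first letter decides, so at
    most one of the two comparisons can hold.
    """
    return (sort_key(x + "a") < sort_key(y + "b")
            and sort_key(y + "a") < sort_key(x + "b"))
```

With the toy key, `same_primary_group("Á", "A", toy_root_key)` is true and `same_primary_group("A", "B", toy_root_key)` is false. The same toy key also puts "Å" with "A", which a Swedish collation must not do, so the answer is only as good as the key passed in.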


> You expand on this algorithm in
> http://www.gramps-project.org/bugs/view.php?id=2933#c9317. See
> http://www.unicode.org/charts/uca/ for the main collation chart with
> primary differences marked.

> I entirely agree, and plan to implement this in NarWeb.

> However, I have one problem.

> I can't find out how to determine the letter that has a primary difference
> from the current letter (sorry, that's not quite the right wording, but I
> am not sure how to express it).

> For example, I do a sort, and the first few names are "Ándre, Arnot"; the
> algorithm shows that these should be grouped together. But which letter
> should be used for the index header? In this case, it should be "a" (or "A"
> if I upper-case everything), as this is the letter from which "Á" and "A"
> have only secondary or tertiary differences.

> In another language, "Á" might have a primary difference from "Z", and then
> the sort order would be "Andrew, Arnot, Zulu, Ándre". In this case the
> index header should be "Á". So I can't just normalise the character to
> remove accents etc.

> I have studied Unicode, CLDR and ICU and Googled extensively, but I can't
> find out how to determine the preceding primary character!

> Can anyone help?
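One possible way to pick the header, sketched under assumptions: the language supplies an ordered list of its index letters (`index_letters`, an assumed input), and the header for a first letter is whichever index letter passes the two-string primary-equality test; otherwise the letter heads its own group. `root_key` and `tailored_key` are hypothetical toy collators that reproduce the two cases in the question, not real locale data.

```python
import unicodedata
from typing import Callable, Sequence, Tuple

SortKey = Callable[[str], object]


def root_key(s: str) -> Tuple[str, str]:
    """Hypothetical toy collator: accents and case ignored at the primary level."""
    base = "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))
    return (base.lower(), s)


def tailored_key(s: str) -> Tuple[Tuple[int, ...], str]:
    """Hypothetical toy collator for a language where Á is a separate letter
    sorted after Z; other characters are weighted by their upper-case code point."""
    def weight(c: str) -> int:
        up = c.upper()
        if up == "Á":
            return 0x110000  # above every code point, so after Z
        return ord(up) if len(up) == 1 else ord(c)
    return (tuple(weight(c) for c in s), s)


def same_primary(x: str, y: str, key: SortKey) -> bool:
    """Two-string test: x and y are primary-equal only if the second
    letter decides both comparisons."""
    return key(x + "a") < key(y + "b") and key(y + "a") < key(x + "b")


def header_letter(first: str, index_letters: Sequence[str], key: SortKey) -> str:
    """Return the index letter primary-equal to `first`, else `first` itself.

    `index_letters` is an assumed, per-language list of header letters.
    """
    for letter in index_letters:
        if same_primary(first, letter, key):
            return letter
    return first
```

With `index_letters = ["A", "B", "Z"]`, `header_letter("Á", ..., root_key)` gives `"A"`, while `tailored_key` gives `"Á"` and sorts the names as "Andrew, Arnot, Zulu, Ándre", matching the second scenario. The linear scan is only for clarity.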


The example you give does use A with an accent, and I think that should be sorted as A.

The letter after Z should be Å, which originally (in the Middle Ages) was written as AA.

It later became an A with a small "o" above it.

Similarly with Ä and Ö: the two dots were originally an "e".

So the end of the Swedish alphabet is Z, Å, Ä, Ö.

Å in Unicode is U+00C5 and å is U+00E5.
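To illustrate why plain code-point order is not enough for Swedish: Ä (U+00C4) precedes Å (U+00C5) in Unicode, while Swedish puts Å before Ä, both after Z. The sketch below uses a hypothetical hard-coded alphabet as a toy key; a real program would rely on the locale's collation (e.g. `locale.strxfrm` with a Swedish locale, if one is installed) instead.

```python
import unicodedata
from typing import Tuple

# Hypothetical toy alphabet: Swedish letter order, with Å, Ä, Ö after Z.
SWEDISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"


def swedish_toy_key(s: str) -> Tuple[Tuple[int, ...], str]:
    """Toy sort key: case-insensitive weights taken from SWEDISH_ALPHABET.

    Input is NFC-normalised first, so a decomposed "A" + combining ring
    still counts as the single letter Å. Characters not in the alphabet
    sort after it by code point; the original string breaks ties.
    """
    def weight(c: str) -> int:
        pos = SWEDISH_ALPHABET.find(c.upper())
        return pos if pos >= 0 else len(SWEDISH_ALPHABET) + ord(c)

    normalised = unicodedata.normalize("NFC", s)
    return (tuple(weight(c) for c in normalised), s)


if __name__ == "__main__":
    names = ["Öberg", "Åström", "Ängel", "Zetterlund"]
    print(sorted(names))                         # ['Zetterlund', 'Ängel', 'Åström', 'Öberg']
    print(sorted(names, key=swedish_toy_key))    # ['Zetterlund', 'Åström', 'Ängel', 'Öberg']
    print(f"U+{ord('Å'):04X} U+{ord('å'):04X}")  # U+00C5 U+00E5
```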

Å is used in Swedish, Danish and Norwegian with very similar pronunciation.

I'm not sure this helps, but note the difference between Á and Å, which can be hard
to see with some fonts.


Gramps-devel mailing list