2013/2/3 Tim Lyons <guy.linton@gmail.com>

On 1 Feb 2013, at 18:59, Benny Malengier wrote:




2013/2/1 Tim Lyons <guy.linton@gmail.com>

On 1 Feb 2013, at 15:50, Benny Malengier wrote:
I suppose if you seed a dictionary with our alphabet, with the letters to group under each key as the value, you can maintain a list with the order at the same time.

I am not sure what the point is of pre-seeding the dict with any particular alphabet. (The NarWeb does not show groupings if there is nothing in them.)


When a first letter is encountered that is not in the dict, the sorted list can be used to see where it should be added. Then insert in the list, and seed the dict further.
Like this, the first encountered symbol will be used for the grouping (as it is the key in the dict), which might not be how a user of a culture would expect it, but I cannot think of a way to avoid that.

I was simply going to use the first encountered symbol as the heading for the grouping, just as you suggest here.
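
A minimal sketch of the dict-plus-sorted-list idea described above, in Python. The names group_by_first_letter and primary_key are hypothetical, and primary_key is only a stand-in: it assumes that letters differing by case or diacritics belong together, which is exactly the locale-dependent decision still under discussion in this thread.

```python
import bisect
import unicodedata


def primary_key(letter):
    """Hypothetical stand-in for a locale-aware grouping key.

    Assumption: letters that differ only by case or combining marks
    (A, a, Å) fall into the same group. A real collator may disagree.
    """
    decomposed = unicodedata.normalize("NFD", letter)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def group_by_first_letter(names, key=primary_key):
    """Group names under the first symbol encountered for each group."""
    groups = {}  # heading symbol -> names in encounter order
    order = []   # sorted list of (grouping key, heading symbol)
    for name in names:
        if not name:
            continue  # nothing to group an empty name under
        first = unicodedata.normalize("NFC", name)[0]
        k = key(first)
        # The sorted list tells us where this key is, or where it belongs.
        pos = bisect.bisect_left(order, (k,))
        if pos < len(order) and order[pos][0] == k:
            heading = order[pos][1]          # group already exists
        else:
            heading = first                  # first encountered symbol
            order.insert(pos, (k, heading))  # keep the list sorted
            groups[heading] = []             # seed the dict further
        groups[heading].append(name)
    return [(heading, groups[heading]) for _, heading in order]
```

With this stand-in key, a list that starts with Ångström and then Adams puts both under one group headed by Å, because that symbol was seen first. That is the behaviour a user of a given culture might not expect.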


One could count the letter that is found most though, and use that letter as the indication of the group.

I don't think it is worth the added complexity of doing this for the possible marginal gain in some edge cases (and the possible loss in some other edge cases).



I was hoping someone would know an algorithm or interface in the ICU/Unicode/CLDR for doing what is needed. After all, this is not a unique requirement: it must occur whenever a dictionary or similar index is automatically constructed!

My guess is that things are either grouped with the ASCII characters or not grouped at all.
Which is why I suggested pre-seeding with our ASCII characters.

If the above is true, there is no need for a difficult algorithm.

Well, no, I don't see how that works.


I have one person in my tree called Ångström. When I look this word up in my Collins English Dictionary with the thumb index, it is under the thumb index letter A. When the program encounters Å, it is not in the dict, so I add it. I now have A and Å in my dict. But actually in English, both should come under the same letter.

I was under the impression you had an algorithm to determine whether they needed to be grouped. Since I assumed that, the grouping would be known and Å would never be added to the dict. My comments were for other new letters which are encountered for the first time.

Benny


Perhaps Peter can tell us what happens in a Swedish dictionary. Do dictionaries really have a separate heading at the end for Å? As I understand it, Å comes as a separate index entry after Z. So again, when the program encounters Å, it will be added to the dict, but in this case that would be correct.

I have attached the Unicode Common Locale Data Repository data for Swedish which should include all the extra letters after Z.
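
To make the English/Swedish contrast concrete, here is a rough Python sketch. The EXTRA_LETTERS table and the heading_key function are hypothetical and hand-written for illustration; they are not taken from the attached CLDR data, and a real implementation would presumably read the per-locale letters from there.

```python
import unicodedata

# Assumption: illustrative table only. Swedish treats å, ä, ö as separate
# letters sorted after z, in that order; English has no such extras.
EXTRA_LETTERS = {
    "en": [],
    "sv": ["å", "ä", "ö"],
}


def heading_key(letter, locale):
    """Return a sortable grouping key for a first letter in a locale."""
    folded = unicodedata.normalize("NFC", letter).casefold()
    extras = EXTRA_LETTERS.get(locale, [])
    if folded in extras:
        # Separate letter in this locale: sorts after the basic alphabet.
        return (1, extras.index(folded), folded)
    # Otherwise strip diacritics so the letter joins its base letter.
    decomposed = unicodedata.normalize("NFD", folded)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (0, 0, base)
```

Under these assumptions, heading_key("Å", "en") equals heading_key("A", "en"), so Ångström lands under A as in the Collins dictionary, while heading_key("Å", "sv") sorts after the key for Z and gets its own heading.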

regards,
Tim.