Thread: [Refdb-users] citestyle's BIBLIOSEQUENCE sorts upper and lower alpha separately
|
From: Jeremy M. <Je...@Ma...> - 2007-04-17 04:38:27
|
Is there any way for RefDB not to sort bibliography entries in this way?

Zippay, A ...
Zittrain, Jonathan ...
Zumbansen, P ...
de Montesquieu, Charles ...
de la Chapelle, Bertrand ...
van Dijk, Jan ...
von Hippel, Eric ...

TIA

--
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
|
|
From: Markus H. <mar...@mh...> - 2007-04-17 08:37:53
|
Hi Jeremy,

Quoting Jeremy Malcolm <Je...@Ma...>:
> Is there any way for RefDB not to sort bibliography entries in this way?
>
> Zippay, A ...
> Zittrain, Jonathan ...
> Zumbansen, P ...
> de Montesquieu, Charles ...
> de la Chapelle, Bertrand ...
> van Dijk, Jan ...
> von Hippel, Eric ...
>

There is no such way out of the box, but I see that we need one. Would
you be kind enough to create a feature request on SourceForge?

I see two solutions to this problem. The more limited approach is to
make the ORDER BY case-insensitive (SQLite and newer versions of MySQL
allow this via collations; PostgreSQL apparently requires uppercasing
or something). I guess this could be part of the next prerelease,
unless I miss some basic problem with this approach.

However, this may be insufficient to solve sorting problems with
special characters like umlauts or accented characters, as these are
often sorted outside a-z. Therefore, some bibliographic data formats
like MODS carry extra fields that describe where a name or a title
should appear in a sorted list. Just like titles are often sorted by
the first non-article word instead of just lexically (ain't no fun to
scan through half a million titles starting with "The"), the compound
name "van Dijk" in your example can appear either close to
"Dijkstra,A." or close to "Vandijk,A.". I'm sure this distinction
depends on the bibliography style as well. In any case, we'd have to
add a field that records a normalized name string (case insensitive,
with umlauts and accents replaced in an intelligent fashion) to
provide a real solution to the sorting problem.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
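For illustration, here is a minimal sketch of the case-insensitive ORDER BY idea, using SQLite's built-in NOCASE collation through Python's sqlite3 module. The table and column names are hypothetical and are not taken from RefDB's actual schema. Note that NOCASE folds ASCII letters only, so it would not help with umlauts or accents.

```python
import sqlite3

# Hypothetical table and column names; RefDB's real schema is not shown in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (name TEXT)")

names = [
    "von Hippel, Eric",
    "Zumbansen, P",
    "de la Chapelle, Bertrand",
    "Zippay, A",
    "van Dijk, Jan",
    "Zittrain, Jonathan",
    "de Montesquieu, Charles",
]
conn.executemany("INSERT INTO author (name) VALUES (?)", [(n,) for n in names])


def sorted_names(order_clause):
    """Return all names ordered by the given ORDER BY expression."""
    rows = conn.execute(f"SELECT name FROM author ORDER BY {order_clause}")
    return [row[0] for row in rows]


# Default BINARY collation: all uppercase letters sort before lowercase ones.
binary_order = sorted_names("name")

# NOCASE collation: ASCII letters are compared without regard to case.
nocase_order = sorted_names("name COLLATE NOCASE")

for name in nocase_order:
    print(name)
```

With the default collation the list comes out in the order reported at the top of this thread; with NOCASE the lowercase particles are interleaved with the rest.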
|
From: <Jus...@Pi...> - 2007-04-17 10:52:02
|
Markus Hoenicka <mar...@mh...> wrote on Tue, 17 Apr
2007 10:37:49 +0200:
> we'd have to add a field that records a normalized name string (case
> insensitive, with umlauts and accents replaced in an intelligent
> fashion) to provide a real solution to the sorting problem.
Not sure about this... As you said, the following issues arise in
sorting:
1. multi-part keys and their respective influence on sorting order
("van Dijk", "The Title of This Article"),
2. character encoding,
3. localization.
The first and third issues depend on the context (language,
bibliography style, ...) and thus cannot be resolved by sort-key
normalization.
The second and third issues are addressed simultaneously by the POSIX
localization mechanism.
Thus, I think bibliography sorting should, if possible, be done by
POSIX-compliant sorting (LC_COLLATE, sort(1)) of keys normalized by
the bibliography style.
Justus
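A rough sketch of this two-stage idea (a style-specific key normalization followed by locale-aware collation) in Python. The function names and the particle list are assumptions made for illustration; the "C" locale is used as the default only because it is always available, and a real style definition would name its own locale.

```python
import locale


def strip_leading_particle(name, particles=("de la ", "de ", "van ", "von ")):
    """Hypothetical style rule: ignore a leading lowercase particle when sorting."""
    for particle in particles:
        if name.startswith(particle):
            return name[len(particle):]
    return name


def sort_names(names, style_key, collate_locale="C"):
    """Sort names by a style-defined key, collated according to LC_COLLATE."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        # strxfrm turns a string into a key that compares according to LC_COLLATE.
        return sorted(names, key=lambda n: locale.strxfrm(style_key(n)))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = [
    "Zippay, A",
    "Zittrain, Jonathan",
    "Zumbansen, P",
    "de Montesquieu, Charles",
    "de la Chapelle, Bertrand",
    "van Dijk, Jan",
    "von Hippel, Eric",
]

# Style A: particles are ignored, so "van Dijk" files under D.
by_family = sort_names(names, strip_leading_particle)

# Style B: particles count, but case does not.
by_full_name = sort_names(names, str.lower)
```

The locale is passed in rather than read from the environment, so that the bibliography style, not the user's shell, decides the collation.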
|
|
From: Markus H. <mar...@mh...> - 2007-04-17 13:16:32
|
Quoting Jus...@Pi...:
> Not sure about this... As you said, the following issues arise in
> sorting:
>
> 1. multi-part keys and their respective influence on sorting order
> ("van Dijk", "The Title of This Article"),
>
> 2. character encoding,
>
> 3. localization.
>
> The first and third issues depend on the context (language,
> bibliography style, ...) and thus cannot be resolved by sort-key
> normalization.
>
I see. An umlaut in an English document needs a normalization to sort
properly, whereas the same umlaut in a German document doesn't. OTOH
is it reasonable to expect, e.g., an XSLT-based bibliography
formatting system to correctly identify the first word or words
relevant for sorting names or titles? In any language? I somehow feel
that this should be done by a human.
> The second and third issues are addressed simultaneously by the POSIX
> localization mechanism.
>
> Thus, I think bibliography sorting should, if possible, be done by
> POSIX-compliant sorting (LC_COLLATE, sort(1)) of keys normalized by
> the bibliography style.
XSLT apparently has all provisions to do a POSIX-compliant sorting,
whereas DSSSL almost certainly doesn't (so I'm screwed again, for the
time being).
regards,
Markus
--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
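The kind of normalization an English-language bibliography might need for umlauts can be sketched with Python's standard unicodedata module. This is only one possible reading of "normalization": it strips combining marks and folds case. The sample names are made up, and letters without a Unicode decomposition (such as "ø") pass through unchanged.

```python
import unicodedata


def fold_for_sorting(text):
    """Strip diacritics and fold case, e.g. 'Müller' -> 'muller'."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Made-up sample names, not taken from any real bibliography.
names = ["Zimmer", "Müller", "Mueller", "Öztürk"]

# Plain code-point order pushes "Öztürk" behind "Zimmer".
codepoint_order = sorted(names)

# Folded order treats "Ö" like "O" and "ü" like "u".
folded_order = sorted(names, key=fold_for_sorting)
```

A German-language style would more likely rely on a German collation, or on a different replacement rule, instead of this folding, which is the point made in the message above.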
|
|
From: <Jus...@Pi...> - 2007-04-17 18:34:14
|
Markus Hoenicka <mar...@mh...> wrote on Tue, 17 Apr 2007 15:16:27 +0200:
> is it reasonable to expect e.g. from an XSLT-based bibliography
> formatting system to correctly identify the first word or words
> relevant for sorting names or titles?

For names, I think the proper way would be to represent their
different parts separately (as BibTeX does it implicitly through
obscure syntax conventions); they could then be accessed individually
by style sheets.

For titles, one would have to add markup to identify the sort-key
portion, as you said. (But who would want to sort by titles
anyway :-/.)

Justus
|
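To make the "separate name parts" idea concrete, here is a small sketch in Python. The PersonName class and the two key functions are hypothetical; they only show how a style could choose whether a particle such as "van" takes part in sorting once the parts are stored separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record keeping the parts of a name apart."""
    family: str
    given: str
    particle: str = ""  # e.g. "van", "von", "de la"


def key_ignore_particle(name):
    """Style A: 'van Dijk' files under D, next to 'Dijkstra'."""
    return (name.family.casefold(), name.given.casefold())


def key_with_particle(name):
    """Style B: 'van Dijk' files under V, next to 'Vandijk'."""
    joined = (name.particle + name.family).replace(" ", "")
    return (joined.casefold(), name.given.casefold())


people = [
    PersonName("Zittrain", "Jonathan"),
    PersonName("Vandijk", "A"),
    PersonName("Dijk", "Jan", particle="van"),
    PersonName("Dijkstra", "A"),
]

style_a = [p.family for p in sorted(people, key=key_ignore_particle)]
style_b = [p.family for p in sorted(people, key=key_with_particle)]
```

The data stays the same in both cases; only the key function supplied by the style changes.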
|
From: Jeremy M. <Je...@Ma...> - 2007-04-18 11:43:39
|
Jeremy Malcolm wrote:
> Is there any way for RefDB not to sort bibliography entries in this way?
>
> Zippay, A ...
> Zittrain, Jonathan ...
> Zumbansen, P ...
> de Montesquieu, Charles ...
> de la Chapelle, Bertrand ...
> van Dijk, Jan ...
> von Hippel, Eric ...

Another problem is that if no primary author is specified but a
secondary author is specified (generally because you are citing a book
that has editor/s rather than author/s), it lists all of those
references at the start of the bibliography, before the letter A. The
preferred behaviour would obviously be to interleave them with the other
references.

--
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
|
|
From: Markus H. <mar...@mh...> - 2007-04-18 13:04:23
|
Quoting Jeremy Malcolm <Je...@Ma...>:
> Another problem is that if no primary author is specified but a
> secondary author is specified (generally because you are citing a book
> that has editor/s rather than author/s), it lists all of those
> references at the start of the bibliography, before the letter A. The
> preferred behaviour would obviously be to interleave them with the other
> references.
>

This is probably due to my (or your?) misunderstanding of how a book
should be encoded in RIS. I've assumed that the names printed on the
cover go into A1. Compare with book chapter: The author of the chapter
goes into A1, the book title into T1, and the book editors into A2.
Now if the book title goes into T1 in a whole book reference, I
assumed it to be straightforward to have the names in A1, regardless
of whether they act as authors or editors. You apparently prefer to
put authors into A1 and editors into A2. This is the usual mess caused
by RIS mixing up the orthogonal concepts of bibliographic levels and a
person's responsibility.

I'm sure that a more sane data format will allow us to avoid this in
the future, but for the time being I'll try to have RefDB check for
missing primary authors and use secondary ones whenever this makes
sense.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
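The fallback described here (use secondary authors for sorting when no primary author exists) might look roughly like the following Python sketch. It is not RefDB's implementation; references are modelled as plain dictionaries keyed by RIS tags, and the sample entries are invented.

```python
def sort_author(ref, fallback=True):
    """Return the sort key for a reference: first A1 name, else first A2 name."""
    primary = ref.get("A1") or []
    secondary = ref.get("A2") or []
    names = primary or (secondary if fallback else [])
    # References without any usable name get an empty key and sort first.
    return names[0].casefold() if names else ""


# Invented sample data: the second book has editors (A2) but no authors (A1).
refs = [
    {"TY": "BOOK", "A1": ["Young, Y"], "T1": "Example monograph"},
    {"TY": "BOOK", "A2": ["Miller, M"], "T1": "Example edited volume"},
    {"TY": "BOOK", "A1": ["Adams, A"], "T1": "Another example monograph"},
]

# Without the fallback, the edited volume lands before the letter A.
without_fallback = [r["T1"] for r in sorted(refs, key=lambda r: sort_author(r, False))]

# With the fallback, it is interleaved under its editor's name.
with_fallback = [r["T1"] for r in sorted(refs, key=sort_author)]
```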
|
From: Markus H. <mar...@mh...> - 2007-04-18 13:07:52
|
Quoting Markus Hoenicka <mar...@mh...>:
> cover go into A1. Compare with book chapter: The author of the chapter
> goes into A1, the book title into T1, and the book editors into A2.

please read as:

> cover go into A1. Compare with book chapter: The author of the chapter
> goes into A1, the book title into *T2*, and the book editors into A2.

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
|
From: Jeremy M. <Je...@Ma...> - 2007-04-24 14:03:24
|
> Quoting Jeremy Malcolm <Jeremy <at> Malcolm.id.au>:
>
> > Another problem is that if no primary author is specified but a
> > secondary author is specified (generally because you are citing a
> > book that has editor/s rather than author/s), it lists all of
> > those references at the start of the bibliography, before the letter
> > A. The preferred behaviour would obviously be to interleave them
> > with the other references.
>
> This is probably due to my (or your?) misunderstanding of how a book
> should be encoded in RIS. I've assumed that the names printed on the
> cover go into A1.

Firstly, sorry for the delay in following up. I had disabled mail
delivery from this list for a while and thought I had re-enabled it,
but Sourceforge is a bit flaky that way.

Anyway, I'm sure it is my misunderstanding, but the reason why I do it
that way is because I need RefDB to print (ed) or (eds) after their
names if they are editors, and it won't do that unless I put them into
A2 rather than A1.

--
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
|
|
From: Markus H. <mar...@mh...> - 2007-04-24 14:13:51
|
Quoting Jeremy Malcolm <Je...@Ma...>:
> Anyway, I'm sure it is my misunderstanding, but the reason why I do it
> that way is because I need RefDB to print (ed) or (eds) after their
> names if they are editors, and it won't do that unless I put them into
> A2 rather than A1.
>

I see. This is a part of the RIS hell that needs to be fixed with an
extended data model.

I've started working on these issues. Sorting "van Beethoven" before
"ZZ Top" seems to work now, but I haven't got round to addressing the
missing A1 in BOOKs issue yet. This will all show up in the next
prerelease.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
|
From: Markus H. <mar...@mh...> - 2007-04-27 23:18:46
|
Jeremy Malcolm writes:
> Anyway, I'm sure it is my misunderstanding, but the reason why I do it
> that way is because I need RefDB to print (ed) or (eds) after their
> names if they are editors, and it won't do that unless I put them into
> A2 rather than A1.
>

Please have a look at 0.9.9-pre2 if you find some time. I'd like to
know whether the most recent changes to the bibliography sorting stuff
also fix your problem.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
|
|
From: Jeremy M. <Je...@Ma...> - 2007-04-29 11:19:58
|
Markus Hoenicka wrote:
> Jeremy Malcolm writes:
> > Anyway, I'm sure it is my misunderstanding, but the reason why I do it
> > that way is because I need RefDB to print (ed) or (eds) after their
> > names if they are editors, and it won't do that unless I put them into
> > A2 rather than A1.
>
> Please have a look at 0.9.9-pre2 if you find some time. I'd like to
> know whether the most recent changes to the bibliography sorting stuff
> also fix your problem.

Yes! Perfect now. Thank you!

--
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
|
|
From: Bruce D'A. <bda...@gm...> - 2007-04-23 14:34:18
|
On 4/17/07, Jus...@pi... <Jus...@pi...> wrote:
> Markus Hoenicka <mar...@mh...> wrote on Tue, 17 Apr
> 2007 15:16:27 +0200:
>
> > is it reasonable to expect e.g. from an XSLT-based bibliography
> > formatting system to correctly identify the first word or words
> > relevant for sorting names or titles?
>
> For names, I think the proper way would be to represent their
> different parts separately (as BibTeX does it implicitly through
> obscure syntax conventions); they could then be accessed individually
> by style sheets.

But sorting conventions vary by locale, so you can't rely on this. In
Asia (and indeed, in Western Europe when dealing with Asian names),
for example, you sort the same way you display. E.g. "Mao Zedong"
sorts like "Mao Zedong."

I've come around to believing that it's easier and more
straightforward for people records to have an explicit sort-string
property, as vCard does.

Bruce
|
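A minimal sketch of the explicit sort-string approach in Python. The Person class and its field names are hypothetical (loosely modelled on the vCard idea of a separate sort string); whoever enters the record decides how the name sorts, and the display form is used only when no sort string was given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical person record with an optional explicit sort string."""
    display_name: str
    sort_string: Optional[str] = None


def sort_key(person):
    """Prefer the explicit sort string; otherwise sort as displayed."""
    return (person.sort_string or person.display_name).casefold()


people = [
    Person("Jonathan Zittrain", sort_string="Zittrain, Jonathan"),
    Person("Mao Zedong"),  # sorts the same way it is displayed
    Person("Eric von Hippel", sort_string="Hippel, Eric von"),
    Person("Jan van Dijk", sort_string="Dijk, Jan van"),
]

ordered = [p.display_name for p in sorted(people, key=sort_key)]
```

The sort strings chosen here assume a style that ignores the particles; a record could just as well carry "van Dijk, Jan".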
|
From: <Jus...@Pi...> - 2007-04-23 17:13:29
|
"Bruce D'Arcus" <bda...@gm...> wrote on Mon, 23 Apr 2007 10:34:11 -0400: > But sorting conventions vary by locale, What I meant and neglected to say is that sorting conventions are usually imposed by the required bibliography style, and that the sorting locale should thus be set by the bibliography style definition. > I've come around to believing that it's easier and more > straightforward for people records to have an explicit sort-string > property, as vCard does. I would love this to be true, but is it? Does the entire world agree on how to sort names beginning with Van, von, ... (D'...)? Justus |
|
From: Bruce D'A. <bda...@gm...> - 2007-04-24 15:43:08
|
On 4/23/07, Jus...@pi... <Jus...@pi...> wrote:
> "Bruce D'Arcus" <bda...@gm...> wrote on Mon, 23 Apr 2007
> 10:34:11 -0400:
>
> > But sorting conventions vary by locale,
>
> What I meant and neglected to say is that sorting conventions are
> usually imposed by the required bibliography style, and that the
> sorting locale should thus be set by the bibliography style
> definition.

True, but sorting still depends on the name in question. If I have a
style that says sort on "last name" in English, "Mao Zedong" still
sorts on the first name, which is the family name.

> > I've come around to believing that it's easier and more
> > straightforward for people records to have an explicit sort-string
> > property, as vCard does.
>
> I would love this to be true, but is it? Does the entire world agree
> on how to sort names beginning with Van, von, ... (D'...)?

But here's the complication as I understand it. The sorting convention
of articulars is not just a question of the display locale, but of the
origin of the name itself. The last I looked into this, some European
languages would sort on the articulars, and others not. And I presume
that they might be mixed in a bibliography list.

Note: I'm not 100% sure of this and would like to clarify, but
certainly using simple culturally-specific name parts like first and
last is not at all international-friendly. And even family, given,
etc. still raises the issue of how you sort different kinds of names.

Bruce
|