#222 add @sortKey to <bibl>

AMBER
closed-accepted
nobody
3
2011-08-18
2010-04-09
No

An FR emerging from group work at the Brown U. WWP Advanced Seminar on TEI Encoding: add optional @sortKey to the content model for <bibl>, to allow sorting of bibliographic entries that are not already ordered. (One cannot always do an automated sort based on author or other components as not every <bibl> element will have the same subcomponents.)

Currently, <tem> is the only TEI element to take @sortKey

Discussion

1 2 > >> (Page 1 of 2)
  • David Sewell

    David Sewell - 2010-04-09

    Sorry, last sentence should read "Currently, <term> is the only TEI element to take @sortKey"

     
  • David Sewell

    David Sewell - 2010-04-09
    • priority: 5 --> 3
     
  • Lou Burnard

    Lou Burnard - 2010-05-09
    • milestone: --> AMBER
     
  • Lou Burnard

    Lou Burnard - 2010-05-09

    Apologies for overlooking this ticket when the last list was prepared for council. However, I don't think the proposal as it stands is very persuasive: sorting of <bibl> entries is an application-specific process -- you might want to produce a list ordered by title, or author, or subject, or any number of other keys, deriving them all from the same set of bibls, so why should the bibl itself dictate to you what its sort key should be? What is the use case? I can see why you might want to specify a normalised value on e.g. author or title, but that is already possible. Your sorting procedure needs to be sophisticated enough to deal with missing cases, I agree.

     
  • Martin Holmes

    Martin Holmes - 2010-08-13

    Actually I like this idea, especially if it could be extended to <biblStruct> and <biblFull>. In an ideal world, our XQuery or XSLT should be able to handle sorting, but in practice, bibliographies containing (for instance) many languages, with a large variety of name prefixes or demonstratives that need to be ignored during sorting make this remarkably hard. I would find this extremely useful, especially when writing tools for combining bibliographies from different sources.

     
  • Lou Burnard

    Lou Burnard - 2010-09-13

    Proposal is to define a class att.sortable, which supplies an attribute @sortKey, with members <term>, <bibl>, <biblStruct>.

     
  • Martin Holmes

    Martin Holmes - 2010-09-13

    Will people then ask for other elements to be added to this class? I can imagine <item> and its correlates also benefiting from @sortKey.

     
  • Nobody/Anonymous

    Using sort key to <author> is fine, but not this. Its abuse, when more info is needed

     
  • Kevin Hawkins

    Kevin Hawkins - 2010-09-13

    I would like to amend Lou's proposal to add <biblFull> to att.sortable as well. If <bibl> and <biblStruct> will be in it, <biblFull> should be as well.

    The sortKey= attribute on <term> lets you supply a string, which software would process for a set of <term>s in order to know how to sort them. However, you're still dependent on particular sorting algorithms to arrange these: while most of us agree "A" comes before "B", does "a" come before "B"? How about a with an umlaut over it? I don't think we can really express desired order in the encoding aside from the order in which <bibl> or other elements appear. If you're combining from various sources, you'll need to process the content of the <bibl> or other elements.

    In short, I think sortKey= on <term> is a mistake and am unsure I want to extend this to other elements.

     
  • Martin Holmes

    Martin Holmes - 2010-09-13

    A sort key is a useful way of avoiding some of the pitfalls that come from trying to write generic sort algorithms, so its availability actually mitigates the problems Kevin mentions in his second paragraph. I agree that <biblFull> should also be a member.

    The comment before Kevin's is completely inscrutable to me. Could the commenter explain in more detail?

     
  • Nobody/Anonymous

    The value of @sortKey is a string, not an integer. So it can contain "A", "B", "a", or "a with umlaut", just like the character data. I don't see how this helps the problem since sorting on this value will still depend on local sorting algorithms.

     
  • Kevin Hawkins

    Kevin Hawkins - 2010-09-14

    I wrote that last comment but had forgotten to log in.

     
  • James Cummings

    James Cummings - 2010-09-14

    Kevin: I think the point (though others might understand it better than I) is that in @sortKey you would not put any characters you thought your sorting/processing needs would have any problem with. So for most western languages this would be basic ASCII alphanumeric.

    So say we have the place name Jyväskylä the @sortKey value might be 'jyvaskyla' here conveying a couple pieces of metadata: a) that we're not sorting on capitalization and b) that ä should be treated as 'a'. It would, of course, be perhaps better if this was just declared somewhere generally and we say for sorting purposes all 'ä' should be treated as 'a'. However, then you'd still need a local mechanism to override this where it wasn't the case. @sortKey, I'm assuming, would be that mechanism.

    After reading this I'm in favour of an att.sorting class and it being able to be added to other things like bibl, biblStruct, term, item, etc.
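
    As a rough illustration of the Jyväskylä example, here is a minimal Python sketch of how a key such as 'jyvaskyla' could be derived mechanically. The function name ascii_sort_key is hypothetical and nothing here is part of TEI; it simply assumes that stripping combining marks and folding case is the desired local convention, which a hand-written @sortKey could then override.

```python
import unicodedata


def ascii_sort_key(text: str) -> str:
    """Hypothetical helper: fold case and drop diacritics to get a plain key."""
    # NFKD splits a precomposed letter such as 'ä' into 'a' plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() removes the capitalisation distinction.
    return stripped.casefold()


print(ascii_sort_key("Jyv\u00e4skyl\u00e4"))
```

    Treating 'ä' as 'a' is only one convention among several, which is exactly why a per-item override is being discussed.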

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2010-09-14

    I really think this is misguided. First choose the components by which you want to sort (year, author, place of pub, whatever), then identify the sort key for each of them and apply your algorithm. For the difficult ones like name, we supply a way to help with this already. So if att.sortable is created, it should be for the individual components, not the container.

    Sorting things like Jyväskylä really is pretty well understood now, I'd have thought. Going back to the bad old days of supplying effectively ASCII transcriptions to provide sort keys is ignoring our commitment to Unicode. You supply a sort key not to sort out non-ASCII characters, but to say that "Matilde Barroca Rahtz" should sort under Barroca, not Rahtz. I.e., it is informed by local cultural knowledge about that author.

     
  • James Cummings

    James Cummings - 2010-09-14

    Fair enough... but you're just suggesting my example was misguided and I could agree that it is...and that the proposal should still go ahead to allow sorting by Barroca.

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2010-09-14

    No. Making Matilde sort by Barroca not Rahtz is a property of the name, not the enclosing <bibl>. The @sort attribute on the components is a numeric ranker; deciding whether "Barroca" sorts before "braams" is up to local processing and conventions, but deciding whether to take Barroca before Rahtz is something that lives within the name component.

     
  • Martin Holmes

    Martin Holmes - 2010-09-14

    I thought the point here was that, while some items may need to be sorted by one element (say <author>), others might need to be sorted by something else (<title>, <editor> or whatever). It is possible to write rather tortuous code to handle this sort of thing, but it's painful and prone to error; a sort key gives us the option to specify in a simple manner how to sort elements.

    I'd use this regularly.

     
  • Laurent Romary

    Laurent Romary - 2010-09-14

    That's exactly the way I had understood the issue (I mean, like Martin). The key is there to provide a reference "key" wherever it comes from (authors in the simple case, publisher for catalogues, title for ISO standards, etc.).

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2010-09-14

    Deciding whether to sort by title or author is something the user decides, not the document. If you know the right sort order, make it so in the source.... But you don't know what order _I_ want it in.

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2010-09-14

    Laurent: a reference key is not a sort key. What you describe sounds like an xml:id to me.....

     
  • Martin Holmes

    Martin Holmes - 2010-09-14

    I think Sebastian is taking a rather narrow view of how people use (and want to use) TEI. With born-digital documents, or where TEI is being used as a way to store and manipulate information that's subject to regular changes and updates, there's definitely a need for something like this. For instance, imagine a large and growing bibliography, which has complex <biblFull>s or <biblStruct>s. You're adding new stuff to this all the time, and providing a feed on a web site somewhere which has the biblio sorted according to (say) some standard such as MLA. Sometimes items have authors and need to be sorted by them; sometimes editors; sometimes there are no such elements, and sorting needs to be done by title, or by website name, or by something else. You don't want to be trying to maintain complicated XSLT that needs to be regularly updated to try to figure out what element should be used as the sort key, nor do you want the burden of having to sort the items manually whenever adding new content or editing the list. A sort key is perfect for this scenario.

    If the user then wants to sort by some other principle, then either providing access to the XML or rendering it into a table with clickable column heads will of course allow this; but the default sort order specified by the author(s) of the bibliography can be done most simply and sensibly with a sort key, IMHO.

     
  • Kevin Hawkins

    Kevin Hawkins - 2010-09-14

    There are a few different types of sorting that we need to distinguish:

    A) Given strings of characters, how do you sort these? People sometimes rely on the codepoints in a character encoding, but even if we use Unicode codepoints, we must acknowledge that there is no universal way to sort characters in a given script. (Do you sort "a with an umlaut" with As or after Z?) Practice varies by language, by country, and possibly by publication or catalog (e.g., when producing a bibliography, you might choose to interfile Latin and Cyrillic names or have all Latin followed by all Cyrillic.) See http://www.unicode.org/reports/tr10/ . To me, @sortKey does not really help with this.

    B) Given certain lexical items, do you ignore the usual sorting rules? Do you ignore initial articles in titles and in surnames containing articles? Do you file "McDonnell" and "MacDonnell" the same and interfile "San", "Santo", and "Santa"? See http://en.wikipedia.org/wiki/Collation#Alphabetical_order . There is no clear division between this and (A) since there are borderline cases (like hyphens and spaces) which are not clearly cases of (A) or (B). To me, @sortKey has the potential to help here, though using it means you are imposing your desired sorting order on the encoded text.

    C) Given a bunch of personal names such as "Matilde Barroca Rahtz" where you want to produce an alphabetical list of names, do you file under "Rahtz" or "Barroca"? Practice here varies by language, by country (and within countries), and possibly by publication or catalog. The same name may be treated differently in different countries -- not just because of cultural ignorance but intentionally. See http://wikis.ala.org/professionaltips/index.php/Filing_rules . To impose your desired sorting order, you could use @sortKey on a bibl/biblStruct/biblFull (assuming that citations are only to be sorted by name) or on an <author> (if the name components have not been tagged but you want to allow sorting by name).

    D) Given a bunch of bibliographic citations, should you file them under the name of an author, an editor, or the title (when no author or editor is given)? This depends first on what components of the bibliographic citation are present and second on the practice of that publication or catalog. You could impose your desired order with @sortKey on bibl/biblStruct/biblFull.
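
    To make the difference between (A) and (B) concrete, here is a small Python sketch. The ARTICLES list and the filing_form helper are assumptions made up for illustration (English articles only); they stand in for whatever local filing rule a project might adopt. The sketch shows that a lexical rule of type (B) can be applied as a key, while the character-level question of type (A) is left untouched.

```python
# Assumption: an English-only list of initial articles to ignore when filing.
ARTICLES = ("the ", "an ", "a ")


def filing_form(text: str) -> str:
    """Hypothetical type-(B) rule: fold case and skip one initial article."""
    folded = text.casefold()
    for article in ARTICLES:
        if folded.startswith(article):
            return folded[len(article):]
    return folded


entries = ["The Waves", "apple", "Zebra", "An Essay", "\u00c4pple"]

# Plain sorted() compares Unicode code points: capitals first, 'Ä' after 'z'.
by_codepoint = sorted(entries)

# The filing rule fixes case and articles, but 'Ä' still sorts after 'z'.
by_filing_form = sorted(entries, key=filing_form)

print(by_codepoint)
print(by_filing_form)
```

    Where "Äpple" ought to file is a collation decision of type (A), which neither this helper nor a string-valued @sortKey settles by itself.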

    ***

    If you're encoding a source document, you are probably content to keep the order of citations or names given there. However, if you are using TEI to store names or bibliographic citations for output in various formats or to be sorted by the user in various ways, you might want to use something like @sortKey to help a sorting algorithm. By doing so, you would end up imposing your way of sorting, but maybe this is exactly what you want to do. If you were going to impose a single way to sort citations, you would put this attribute on bibl/biblStruct/biblFull.

    On the other hand, if you know citations will be sorted in various ways and you want to give instructions on how to sort by author and title (when these are present in a citation), you would want to be able to provide sort instructions on <author>, <editor>, and <title> (yet also have processing instructions that know what to do when an author or editor is absent). Maybe, instead of using author@key and editor@key (as some people do to handle this), we should add @sortKey to <author>, <editor>, and <title> in addition to <bibl>, <biblStruct>, and <biblFull>?

     
  • BODARD Gabriel

    BODARD Gabriel - 2010-09-16

    I like and approve of the proposal to create att.sortable. I think the valuable use-case for it is not where we're unsure how to sort a name or a Unicode character or an ignore-word, but where convention requires an item to be sorted in a way different from the rules normally applied to this set.

    As a bibliographical example (real use-cases), I want to sort <bibl><editor>Reynolds</editor><author>Goodchild</author></bibl> under "Goodchild", but <bibl><editor>Reynolds</editor><author>Pacho</author></bibl> under "Reynolds"; <bibl><editor>Biville</editor><title>Bilinguisme</title></bibl> under "Biville", but <bibl><editor>Chaniotis</editor><title>Supplementum Epigraphicum Graecum</title></bibl> under "SEG".
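
    A minimal Python sketch of how the four entries above might be processed if @sortKey were allowed on <bibl>. The sortKey attributes in the sample, the <listBibl> wrapper, the filing_key helper, and the author/editor/title fallback order are all assumptions for illustration; the TEI namespace is omitted to keep the sample short.

```python
import xml.etree.ElementTree as ET

# The four entries from the comment; sortKey marks the two exceptions.
SAMPLE = """
<listBibl>
  <bibl><editor>Reynolds</editor><author>Goodchild</author></bibl>
  <bibl sortKey="Reynolds"><editor>Reynolds</editor><author>Pacho</author></bibl>
  <bibl><editor>Biville</editor><title>Bilinguisme</title></bibl>
  <bibl sortKey="SEG"><editor>Chaniotis</editor><title>Supplementum Epigraphicum Graecum</title></bibl>
</listBibl>
"""


def filing_key(bibl: ET.Element) -> str:
    """Hypothetical rule: an explicit sortKey wins, else author, editor, title."""
    explicit = bibl.get("sortKey")
    if explicit:
        return explicit.casefold()
    for tag in ("author", "editor", "title"):
        text = bibl.findtext(tag)
        if text:
            return text.casefold()
    return ""


root = ET.fromstring(SAMPLE)
ordered = sorted(root.findall("bibl"), key=filing_key)
filed_under = [filing_key(bibl) for bibl in ordered]
print(filed_under)
```

    Without the two sortKey values, the same default rule would file the second entry under "Pacho" and the fourth under "Chaniotis", which is the mismatch described above.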

    (I suppose an @sortBy attribute might also be useful, simply to point to the element(s) to use in sorting?)

     
  • Martin Holmes

    Martin Holmes - 2011-08-18

    Should be closed because superseded by #3393989.

     