From: Michael S. <sm...@xm...> - 2005-10-29 11:02:08
Has anyone implemented code in XSLT (with or without EXSLT extension functions) for collating/sorting XML content using the CLDR collation data? (That is, the data in the files in the common/collation/ directory of the CLDR CVS source tree.) If not, is there any documentation available (for example, a functional specification or design specification) that describes the behavior of an application that uses that data? Or is there a reference implementation of some kind?

The context for my question is this: I am one of the developers involved with the DocBook Project[1], and I would like to find a way to use the CLDR collation data in the part of the DocBook XSL stylesheets that generates indexes.

The DocBook XSL stylesheets make use of locale data (we currently have locale files for about 60 locales/languages). Some of that locale data is DocBook-specific (for example, the localized equivalents of "Table of Contents", "Chapter", "Section", etc.), but some of it is general locale data, such as the data for generating localized date strings. For the date-string data, a while back I revised our build setup so that, instead of maintaining that data in the source for our locale files, the build now picks it up from the CLDR locale files. So we and our translators no longer need to maintain it separately.

But the bigger issue we have is with generating collated indexes. The DocBook XSL stylesheets automatically generate indexes from the indexterm markers in DocBook XML source content. However, XSLT 1.0 does not itself provide a means for locale-aware collation, so we needed to add a way to collate indexterms when generating indexes for non-English locales. One of the project developers, Jirka Kosek, came up with a method, which is described in a paper he presented at XML 2004[2].
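As a rough illustration of the kind of data Jirka's method relies on (a numbered list of the characters in a locale, where characters that belong in the same index group share a number), here is a minimal Python sketch of grouping index terms with such a table. The table contents, the function names, and the choice of group 0 for unlisted characters are all hypothetical assumptions made for illustration; this is not the actual DocBook implementation (which is XSLT+EXSLT), and it does not use the CLDR data.

```python
"""Minimal sketch of a numbered-list letter-grouping scheme.

Assumption: LETTER_GROUPS is a hypothetical, hand-made table, not DocBook's
actual locale data and not derived from the CLDR collation data.
"""

from itertools import groupby

# Hypothetical table: each character maps to a group number; characters that
# should appear under the same index heading share the same number.
LETTER_GROUPS = {
    "A": 1, "a": 1, "Á": 1, "á": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "Č": 4, "č": 4,
    "D": 5, "d": 5,
}

# Hypothetical group number for characters missing from the table (symbols, digits).
SYMBOLS = 0


def group_number(term: str) -> int:
    """Return the group number of the term's first character."""
    return LETTER_GROUPS.get(term[:1], SYMBOLS)


def group_index_terms(terms: list[str]) -> list[tuple[int, list[str]]]:
    """Sort terms by group number, then bucket them into index groups.

    Within a group this falls back to case-folded code-point order, which is
    NOT locale-aware; a real solution would need full collation rules.
    """
    ordered = sorted(terms, key=lambda t: (group_number(t), t.casefold()))
    return [(num, list(items)) for num, items in groupby(ordered, key=group_number)]
```

For example, `group_index_terms(["čas", "Apple", "ábel", "cesta", "Banana", "123"])` returns `[(0, ["123"]), (1, ["Apple", "ábel"]), (2, ["Banana"]), (3, ["cesta"]), (4, ["čas"])]`. The sketch also shows the maintenance cost described in this post: every locale needs its own hand-built table, and characters entered in decomposed form (base letter plus combining accent) would fall into the wrong group unless the input is normalized first.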
However, at the time he wrote it, he was not aware that the CLDR collation data was available, and his method uses data in a form quite different from the CLDR data: ours is basically just a numbered list of all the characters in the locale, in which characters that should be grouped together have the same number. So far that method is supported for fewer than 10 or so of the 60 locales we have data for. (The reason is that supporting it for a particular locale requires our translators to add the numbered-list data that Jirka's method needs, and so far we have not done that for many locales.) So I think our project and our users would be much better off if we could find a way to replace Jirka's method with one that relies on the CLDR collation data instead.

One big constraint is that the DocBook XSL stylesheets are meant to be a "pure XSLT" solution that lets users generate HTML and XSL-FO output with any XSLT engine they choose, whether a C-based one such as xsltproc/libxslt, a Java-based one such as Saxon 6 or Xalan, or an engine implemented in some other language. That said, we already use EXSLT extensions to XSLT 1.0 that are widely supported by the common XSLT engines (for example, the EXSLT node-set() function). In fact, Jirka's current index-collation method uses an EXSLT extension function. So, ideally, any replacement method we came up with would still be purely XSLT+EXSLT-based, except that it would use the CLDR data instead of our current ad hoc system.

--Mike

[Apologies for posting here if this is not the appropriate list for questions of this type. I couldn't find a specific CLDR mailing list.]

[1] http://sourceforge.net/projects/docbook
    The current focus of the DocBook Project is work on a set of XSLT stylesheets for transforming DocBook XML source content into HTML and XSL-FO output.
[2] http://www.idealliance.org/proceedings/xml04/papers/77/xslindex.html
    "Using XSLT for getting back-of-the-book indexes"

--
Michael Smith
http://sideshowbarker.net/