[icu-design] Using CLDR data with XSLT?

Has anyone implemented code in XSLT (with or without EXSLT
extension functions) for doing collation/sorting of XML content
using the CLDR collation data?  (That is, the data in the files in
the common/collation/ directory in the CLDR CVS source tree.)

If not, is there any documentation available (for example, a
functional specification or design specification) that describes
the behavior of an application which uses that data? Or is there a
reference implementation of some kind?

The context for my question is: I am one of the developers
involved with the DocBook Project[1], and I would like to find a
way to make use of the CLDR collation data in the part of the
DocBook XSL stylesheets code that deals with generating indexes.

The DocBook XSL stylesheets make use of locale data (we currently
have locale files for about 60 locales/languages). Some of that
locale data is sort of DocBook-specific -- for example, for
generating localized text for the equivalents of "Table of
Contents", "Chapter", "Section", etc. -- but some of it is just
general locale data; for example, data for generating localized
date strings.

In the case of the date-string data, a while back I revised our
build setup so that instead of having the date-string data
maintained in the source for our locale files, the build now picks
it up from the CLDR locale files. So we and our translators don't
need to maintain it separately any longer.

But the bigger issue we have is with generating collated indexes.
The DocBook XSL stylesheets automatically generate indexes based
on instances of indexterm markers in DocBook XML source content.
However, XSLT 1.0 does not itself provide a means for doing
locale-aware collation, so we needed to add a way to collate
index entries for non-English locales.

One of the project developers, Jirka Kosek, came up with a method.
It is described in a paper he presented at XML 2004[2].

However, at the time he wrote it, he was not aware of the
availability of the CLDR collation data, and the method he
developed uses data in a form that is quite different from the
CLDR data (our data is basically just a numbered list of all the
characters in the locale; characters that should be grouped
together have the same number).
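
To illustrate the shape of that data, here is a rough sketch. It is
written in Python rather than XSLT purely for brevity; the group
numbers, the sample terms, and the function name group_index_terms
are all made up for the example, and this is not the stylesheets'
actual code.

```python
# Hypothetical numbered list: every character in the locale gets a number,
# and characters that belong under the same index heading share a number.
GROUP_NUMBER = {
    "A": 1, "a": 1, "Á": 1, "á": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "Č": 4, "č": 4,
}


def group_index_terms(terms):
    """Bucket non-empty index terms by the number of their first character.

    Returns the buckets ordered by group number. Terms whose first
    character is not in the list end up in a final catch-all bucket.
    Within a bucket, terms keep their input order.
    """
    groups = {}
    for term in terms:
        number = GROUP_NUMBER.get(term[0])  # None if the character is unlisted
        groups.setdefault(number, []).append(term)
    ordered_keys = sorted(groups, key=lambda n: (n is None, n or 0))
    return [groups[key] for key in ordered_keys]
```

With this made-up data, "Árie" and "auto" land in the same group
because "Á" and "a" share the number 1, while "čaj" gets a group of
its own after "cesta".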

That method is so far supported for fewer than 10 of the 60
locales we have data for. (The reason is that to get it supported
in a particular locale, we need to ask our translators to add the
numbered-list data that Jirka's method requires, and so far we
have not done that for many locales.)

So, I think our project and our users would be much better off if
we could figure out a way to replace Jirka's method with one that
relies instead on the CLDR collation data.

One big limitation we have is that the DocBook XSL stylesheets are
meant to be a "pure XSLT" solution that allows users to generate
HTML and XSL-FO output just using any XSLT engine they choose
(whether that be a C-based one such as xsltproc/libxslt, or a
Java-based one such as Saxon 6 or Xalan, or an engine implemented
in any other language).

That said, we do already make use of EXSLT extensions to XSLT 1.0
that are widely supported in most common XSLT engines (for
example, the EXSLT node-set() function). In fact, Jirka's current
index-collation method makes use of an EXSLT extension function.

So, ideally, I would hope that any replacement method we came up
with would still be just XSLT+EXSLT-based, except that it would
use the CLDR data instead of our current ad-hoc system.

  --Mike

[Apologies for posting here if this is not the appropriate list
for questions of this type. I couldn't find a specific CLDR
mailing list.]

[1] http://sourceforge.net/projects/docbook
    The current focus of the DocBook Project is work on a set of
    XSLT stylesheets for transforming DocBook XML source content
    into HTML and XSL-FO output.

[2] "Using XSLT for getting back-of-the-book indexes"
    http://www.idealliance.org/proceedings/xml04/papers/77/xslindex.html

-- 
Michael Smith
http://sideshowbarker.net/
