From: Wolfgang M. <sb...@we...> - 2004-09-24 13:24:21
Hi,

eXist finally supports collations! Sorting and all string comparison operators
have been modified to use a default collation if specified in the XQuery.
Also, a specific collation can be defined for each order spec in an "order
by" clause. The optional collation parameter allowed by most of the string
functions is not implemented yet, but the default collation will be observed.

I would be happy if users with knowledge in other languages could help to test
the functionality. I guess the languages I speak have rather simple
rules ;-) Please have a look at the current CVS or today's snapshot.

The syntax to set the default collation is:

    declare default collation collation-uri;

eXist recognizes the following URIs:

1) http://www.w3.org/2004/07/xpath-functions/collation/codepoint

Selects the Unicode codepoint collation. This is the default if no collation
is specified. Basically, it means that only the standard Java implementations
of the comparison and string search functions are used.

2) http://exist-db.org/collation?lang=xxx&strength=xxx&decomposition=xxx

or just

    ?lang=xxx&strength=xxx&decomposition=xxx

lang selects a locale. The parameter should have the same form as in xml:lang,
for example: "de" or "de-DE" to select a German locale.

strength (optional): value should be one of "primary", "secondary", "tertiary"
or "identical".

decomposition (optional): one of "none", "full" or "standard".

I don't really know all the implications of these parameters. Please check the
Java documentation for java.text.Collator.

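To make the three URI parameters more concrete, here is a minimal Java sketch of how such a collation URI could be turned into a java.text.Collator. It is not eXist's actual code: the class name CollationUriSketch and the method fromUri are hypothetical, and mapping "standard" to Collator.CANONICAL_DECOMPOSITION is an assumption on my part, since the message does not say which constant it corresponds to.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from the eXist sources.
final class CollationUriSketch {

    // Accepts "http://exist-db.org/collation?lang=..." or the short "?lang=..." form.
    static Collator fromUri(String uri) {
        int q = uri.indexOf('?');
        if (q < 0) {
            throw new IllegalArgumentException("no parameters in collation URI: " + uri);
        }
        String lang = null;
        String strength = null;
        String decomposition = null;
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // ignore malformed fragments in this sketch
            }
            String key = pair.substring(0, eq);
            String value = pair.substring(eq + 1);
            switch (key) {
                case "lang": lang = value; break;
                case "strength": strength = value; break;
                case "decomposition": decomposition = value; break;
                default: break; // unknown parameters are ignored here
            }
        }
        if (lang == null) {
            throw new IllegalArgumentException("lang parameter is required: " + uri);
        }
        // xml:lang style tags such as "de" or "de-DE" are valid language tags.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(lang));
        if (strength != null) {
            collator.setStrength(parseStrength(strength));
        }
        if (decomposition != null) {
            collator.setDecomposition(parseDecomposition(decomposition));
        }
        return collator;
    }

    static int parseStrength(String s) {
        switch (s) {
            case "primary": return Collator.PRIMARY;
            case "secondary": return Collator.SECONDARY;
            case "tertiary": return Collator.TERTIARY;
            case "identical": return Collator.IDENTICAL;
            default: throw new IllegalArgumentException("unknown strength: " + s);
        }
    }

    static int parseDecomposition(String s) {
        switch (s) {
            case "none": return Collator.NO_DECOMPOSITION;
            case "full": return Collator.FULL_DECOMPOSITION;
            // Assumption: "standard" is read as canonical decomposition.
            case "standard": return Collator.CANONICAL_DECOMPOSITION;
            default: throw new IllegalArgumentException("unknown decomposition: " + s);
        }
    }

    public static void main(String[] args) {
        Collator c = fromUri("?lang=de-DE&strength=primary");
        System.out.println(c.getStrength() == Collator.PRIMARY);
    }
}
```

When strength or decomposition is omitted, the sketch leaves the Collator at whatever defaults the JDK supplies for that locale.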
Examples:

1) The collation can be specified for each sort expression in a FLWR
expression:

    for $w in
      ("das", "daß", "Buch", "Bücher", "Bauer", "Bäuerin", "Jagen", "Jäger")
    order by $w collation "?lang=de-DE"
    return $w

returns:

    Bauer, Bäuerin, Buch, Bücher, das, daß, Jagen, Jäger

Without specifying the collation, it returns:

    Bauer, Buch, Bäuerin, Bücher, Jagen, Jäger, das, daß

2) It also changes the behaviour of string comparisons:

    declare default collation "?lang=de-DE";
    "Bäuerin" < "Bier"

returns "true". If you just use the default codepoint collation, it returns
"false".

Wolfgang
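For anyone who wants to try the underlying comparison rules outside of eXist, the following Java sketch runs the same word list through java.text.Collator for a German locale and through plain String comparison. The class and method names (GermanCollationDemo, sortGerman, sortCodepoint, lessThanGerman) are made up for this illustration. Non-ASCII letters are written as \u escapes so the file compiles regardless of source encoding.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Illustrative only: compares locale-aware ordering with plain String ordering.
final class GermanCollationDemo {

    // "das", "daß", "Buch", "Bücher", "Bauer", "Bäuerin", "Jagen", "Jäger"
    static final String[] WORDS = {
        "das", "da\u00DF", "Buch", "B\u00FCcher",
        "Bauer", "B\u00E4uerin", "Jagen", "J\u00E4ger"
    };

    // Sorts a copy of WORDS with the JDK collator for de-DE.
    static String[] sortGerman() {
        String[] copy = WORDS.clone();
        Arrays.sort(copy, Collator.getInstance(Locale.forLanguageTag("de-DE")));
        return copy;
    }

    // Sorts a copy of WORDS with String.compareTo, which compares UTF-16 code
    // units; for these words that is the same as codepoint order.
    static String[] sortCodepoint() {
        String[] copy = WORDS.clone();
        Arrays.sort(copy);
        return copy;
    }

    // True if a sorts strictly before b under the de-DE collator.
    static boolean lessThanGerman(String a, String b) {
        return Collator.getInstance(Locale.forLanguageTag("de-DE")).compare(a, b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", sortGerman()));
        System.out.println(String.join(", ", sortCodepoint()));
        System.out.println(lessThanGerman("B\u00E4uerin", "Bier"));
        System.out.println("B\u00E4uerin".compareTo("Bier") < 0);
    }
}
```

On a standard JDK the two sorted lists are expected to match the two orderings quoted in the first example, and the last two lines correspond to the "true" and "false" results of the second example.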