I think there are too many unknowns to give a reasonable answer. I would say it's best to make the key as selective as possible, unless the time taken to build the index becomes excessive. One of the variables is how often the index is used after it has been built.

As always, the best way -- perhaps the only way -- to answer performance questions is by measurement.
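
One rough way to take such a measurement is to report how a candidate key function distributes the organizations over its key values, and then compare timings for different choices. The stylesheet below is only a sketch under assumptions: the vfx namespace URI, the prefix-based body of vfx:myfunc, and the parameter n are invented for illustration, and the input is assumed to consist of org elements that each have a single name child.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vfx="http://example.org/vfx"
    exclude-result-prefixes="xs vfx">

  <!-- Hypothetical tuning knob: number of leading characters used as the key -->
  <xsl:param name="n" as="xs:integer" select="3"/>

  <!-- Hypothetical key function: case-folded prefix of the normalized name -->
  <xsl:function name="vfx:myfunc" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:sequence select="lower-case(substring(normalize-space($s), 1, $n))"/>
  </xsl:function>

  <!-- Run with the organizations file as the source document -->
  <xsl:template match="/">
    <!-- One integer per distinct key value: the size of that bucket -->
    <xsl:variable name="sizes" as="xs:integer*">
      <xsl:for-each-group select="//org" group-by="vfx:myfunc(name)">
        <xsl:sequence select="count(current-group())"/>
      </xsl:for-each-group>
    </xsl:variable>
    <buckets orgs="{sum($sizes)}" keys="{count($sizes)}"
             largest="{max($sizes)}" mean="{avg($sizes)}"/>
  </xsl:template>
</xsl:stylesheet>
```

Running this for several values of n shows how selective each choice is. The elapsed time of the real transformation can then be compared for the same choices, for example using the timing information printed by Saxon's -t command-line option.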

Michael Kay

On 15/04/2011 18:11, Joseph R. McEnerney wrote:

I am using Saxon 9 EE on a semantic web project at Cornell University's Mann Library.

My work involves tens of thousands of organization names arranged in an XML file, e.g.


<?xml version="1.0" encoding="UTF-8"?>

     <name>Abraham House</name>

     <name>ABS Global. Inc</name>

I am using <xsl:key> and key() together with a function to cut down on the size of the sets on which I need to do isomorphic string matches.


My transform flow goes like this:


<!-- Read in the big file: -->


<xsl:variable name='extantOrgs'



<!-- Key Defn: -->


<xsl:key name='orgsKey' match='org' use='vfx:myfunc(name)'/>


<!-- Given a string to match (i.e. rdfs:label), construct a small list of possible candidates -->


<xsl:variable name='orglist'



<!-- Search the list: -->


<xsl:variable name='results'



<!-- do something interesting with $results -->




For example, if vfx:myfunc just returns the first n characters of an organization name, I typically have only a few dozen iso matches to do.

As I hoped, this technique produces small searches and good performance.
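
For readers who want to try the pattern, here is a minimal self-contained sketch of the flow described above. It is not the original stylesheet. The file name orgs.xml, the label parameter, the vfx namespace URI, the three-character prefix, and the vfx:same function (a simple case-insensitive comparison standing in for the real string match) are all assumptions made for illustration. The input is assumed to hold org elements, each with one name child.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vfx="http://example.org/vfx"
    exclude-result-prefixes="xs vfx">

  <!-- Hypothetical inputs: the big file (resolved relative to the stylesheet)
       and the string to look up -->
  <xsl:param name="orgsFile" as="xs:string" select="'orgs.xml'"/>
  <xsl:param name="label" as="xs:string" select="'Abraham House'"/>

  <!-- Hypothetical key function: first three characters, case-folded -->
  <xsl:function name="vfx:myfunc" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:sequence select="lower-case(substring(normalize-space($s), 1, 3))"/>
  </xsl:function>

  <!-- Stand-in for the real match: case-insensitive, whitespace-normalized equality -->
  <xsl:function name="vfx:same" as="xs:boolean">
    <xsl:param name="a" as="xs:string?"/>
    <xsl:param name="b" as="xs:string?"/>
    <xsl:sequence select="lower-case(normalize-space($a)) eq lower-case(normalize-space($b))"/>
  </xsl:function>

  <!-- Index every org under the value of the key function -->
  <xsl:key name="orgsKey" match="org" use="vfx:myfunc(name)"/>

  <!-- Read in the big file -->
  <xsl:variable name="extantOrgs" select="doc($orgsFile)"/>

  <!-- Indexed lookup: the small list of candidates sharing the label's key value -->
  <xsl:variable name="orglist"
      select="key('orgsKey', vfx:myfunc($label), $extantOrgs)"/>

  <!-- Sequential search restricted to the candidates -->
  <xsl:variable name="results" select="$orglist[vfx:same(name, $label)]"/>

  <xsl:template match="/">
    <matches candidates="{count($orglist)}">
      <xsl:copy-of select="$results"/>
    </matches>
  </xsl:template>
</xsl:stylesheet>
```

The three-argument form of key() is what lets the lookup run against the document held in $extantOrgs rather than the principal source document.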


I can see that there are (at least) two searches in this technique:


  1. the search of the key index and

  2. the search of $orglist


MY QUESTION concerns the design of vfx:myfunc:


Should vfx:myfunc tend to make the size of $orglist as small as possible, or just produce an approximately uniform 'set size' distribution, or is there a better way to proceed?


Thanks for your attention.




Joseph R. Mc Enerney


_______________________________________________ saxon-help mailing list archived at http://saxon.markmail.org/ saxon-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/saxon-help