I am using Saxon 9 EE on a semantic web project at Cornell University's Mann Library.

My work involves tens of thousands of organization names stored in an XML file, e.g.:


<?xml version="1.0" encoding="UTF-8"?>
     <name>Abraham House</name>
     <name>ABS Global. Inc</name>

I am using <xsl:key> and key(), together with a user-defined function, to cut down
the size of the sets on which I need to do isomorphic string matches.


My transform flow goes like this:


<!-- Read in the big file: -->
<xsl:variable name='extantOrgs'

<!-- Key Defn: -->
<xsl:key name='orgsKey' match='org' use='vfx:myfunc(name)'/>

<!-- Given a string to match (i.e. rdfs:label), construct a small list of possible candidates: -->
<xsl:variable name='orglist'

<!-- Search the list: -->
<xsl:variable name='results'

<!-- do something interesting with $results -->




For example, if vfx:myfunc just returns the first n characters of an
organization name, I typically have only a few dozen isomorphic matches to do.
As I hoped, this technique produces small searches and good performance.
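To make the "first n characters" idea concrete, here is a minimal sketch of such a key function in XSLT 2.0. The namespace URI bound to vfx, the prefixLength parameter, and the case and whitespace normalization are assumptions made for illustration; they are not taken from the actual transform. It also assumes each org has at most one name child.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vfx="http://example.org/vfx">

  <!-- Assumed tuning knob: how many leading characters form the key -->
  <xsl:param name="prefixLength" as="xs:integer" select="3"/>

  <!-- Hypothetical key function: the first $prefixLength characters of the
       name, after collapsing whitespace and folding to upper case -->
  <xsl:function name="vfx:myfunc" as="xs:string">
    <xsl:param name="name" as="xs:string?"/>
    <xsl:sequence
        select="substring(upper-case(normalize-space($name)), 1, $prefixLength)"/>
  </xsl:function>

  <!-- Same key declaration as in the flow above -->
  <xsl:key name="orgsKey" match="org" use="vfx:myfunc(name)"/>

</xsl:stylesheet>
```

With this sketch, a lookup has to apply the same function to the probe string, e.g. key('orgsKey', vfx:myfunc($label), $extantOrgs), where $label is a hypothetical variable holding the rdfs:label value.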


I can see that there are (at least) two searches in this technique:


  1. the search of the key index and

  2. the search of $orglist


MY QUESTION concerns the design of vfx:myfunc:


Should vfx:myfunc aim to make $orglist as small as possible,
should it just produce an approximately uniform distribution of set sizes,
or is there a better way to proceed?
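One way to compare candidate key functions might be to look at the set sizes each one produces. The following is only a sketch under assumptions: it is run directly against the organizations file as the source document, the vfx namespace URI and the three-character prefix are placeholders, and the buckets/bucket output elements are invented for the report.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vfx="http://example.org/vfx"
    exclude-result-prefixes="xs vfx">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical candidate key function under evaluation -->
  <xsl:function name="vfx:myfunc" as="xs:string">
    <xsl:param name="name" as="xs:string?"/>
    <xsl:sequence select="substring(normalize-space($name), 1, 3)"/>
  </xsl:function>

  <!-- Group every org by its key value and report each group's size,
       largest groups first -->
  <xsl:template match="/">
    <buckets>
      <xsl:for-each-group select="//org" group-by="vfx:myfunc(name)">
        <xsl:sort select="count(current-group())" order="descending"/>
        <bucket key="{current-grouping-key()}"
                size="{count(current-group())}"/>
      </xsl:for-each-group>
    </buckets>
  </xsl:template>

</xsl:stylesheet>
```

The largest buckets at the top of the report are the ones that would dominate the cost of the second search, so they show directly how far a given vfx:myfunc is from a uniform distribution.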


Thanks for your attention.




Joseph R. Mc Enerney