From: Christian W. <cwi...@gm...> - 2020-12-02 23:52:14
Dear Joe,

Thank you for your help. The bug was between my ears, eXist-db works as expected :-)

All the best,

Christian

On 03/12/2020 04.01, Joe Wicentowski wrote:
> Hi Christian,
>
> Could you please try:
>
> //tei:p[ngram:contains(., "CD.EF")]
>
> Instead of:
>
> //ngram:contains(., "CD.EF")
>
> Joe
>
> On Wed, Dec 2, 2020 at 4:41 AM Christian Wittern <cwi...@gm...> wrote:
>
> > Dear eXist users,
> >
> > I am trying to improve search recall for my application, which
> > contains premodern Chinese text. My documents are set up to have
> > <tei:seg> elements for every phrase, which are in turn contained by
> > tei:p elements. At the moment, I am simply defining an ngram index on
> > tei:seg, but that of course limits the matches to the contents of one
> > tei:seg element. To overcome this limitation, I am defining an ngram
> > index on tei:p as well, in the expectation that the ngrams will be
> > constructed by concatenating the tei:seg elements that make up a
> > paragraph. So for example:
> >
> > <tei:p><tei:seg>ABCD.</tei:seg><tei:seg>EFGH</tei:seg></tei:p>
> >
> > With such a text, I would expect to be able to search for "CD.EF" and
> > find one match. However, there is no match for
> >
> > //ngram:contains(., "CD.EF")
> >
> > nor for
> >
> > //ngram:wildcard-contains(., "CD.EF")
> >
> > The reason for this assumption is that the documentation for the
> > ngram module says:
> >
> > "Note: a ngram match on mixed content may span multiple nodes."
> > (this is in the documentation for the ngram:filter-matches function).
> >
> > Since there are no parameters when setting up an ngram index, I would
> > expect that elements with mixed content like the tei:p element would
> > be able to find a term across tei:seg elements.
> >
> > Is this a bug or am I missing something? Any help appreciated,
> >
> > Christian
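
For anyone landing on this thread later: the index setup described in the original question could look roughly like the collection.xconf fragment below. This is only a sketch under assumptions, not Christian's actual configuration. The TEI namespace URI is assumed to be the standard one, and the two entries simply mirror the "ngram index on tei:seg" and "on tei:p as well" mentioned above.

```xml
<!-- Hypothetical collection.xconf fragment; the real configuration was not posted -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <!-- Phrase-level index: matches are limited to a single tei:seg -->
        <ngram qname="tei:seg"/>
        <!-- Paragraph-level index: intended to allow matches spanning tei:seg boundaries -->
        <ngram qname="tei:p"/>
    </index>
</collection>
```

With an index on tei:p in place, the predicate form Joe suggested, //tei:p[ngram:contains(., "CD.EF")], is the query reported above to work as expected. Note that eXist-db generally needs the collection to be reindexed after its configuration changes.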