From: Stefan M. <ste...@un...> - 2009-01-21 13:05:01
Dear all,

we are currently seeing problems with near() when the search words span element boundaries. We have a fulltext index with content="mixed" defined for the collection. We know that the index as such works, because near() behaves as expected with single words, even when a word is split by element tags. However, when we search for a sequence of several words, the search fails if at least one of the words is split by an element.

Assume the following xql:

---
declare namespace tei = "http://www.tei-c.org/ns/1.0";
let $q := "mixed test"
return //tei:u[near(. , $q)]
---

and this sample document:

---
<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- snipped header -->
  <text>
    <body>
      <div>
        <u xml:id="u1"> this the first mi<seg type="overlap">xed test </seg> </u>
        <u xml:id="u2"> this the second mi<anchor/>xed test </u>
        <u xml:id="u3"> this is the third <seg type="overlap"> mixed </seg> test </u>
        <u xml:id="u4"> this is last <seg type="overlap"> mixed test </seg> </u>
      </div>
    </body>
  </text>
</TEI>
---

The following two searches yield very different results, even though IMHO they should return the same elements:

1) $q="mixed" returns the tei:u elements with ids u1, u2, u3, u4
2) $q="mixed test" returns only the tei:u elements with ids u3, u4

Does anybody see different behaviour? I might have misread something in the docs, so that my assumption that the second search should return the same four tei:u elements is wrong, or there may be a bug in near() or in the fulltext index that causes this. Either way, I would be very glad to get some hints on how to work around this issue, as I am currently implementing searches over highly segmented texts.

cheers,
Stefan
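
For comparison, here is a minimal sketch of a possible stopgap that avoids near() altogether. It is not a fix for the behaviour described above, and it rests on assumptions: it uses only the standard XPath functions contains() and normalize-space(), it does a case-sensitive substring match rather than a word match (so it would also hit inside longer words), and it does not use the fulltext index, so it may be slow on large collections.

---
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: Sketch only: bypasses near() and the fulltext index :)
let $q := "mixed test"
(: normalize-space(.) takes the string value of each tei:u, which joins
   the text on both sides of inline elements such as seg and anchor,
   and collapses runs of whitespace to single spaces :)
return //tei:u[contains(normalize-space(.), $q)]
---

For the sample document, the normalized string values of u1 to u4 all contain "mixed test", so this expression should select all four tei:u elements.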