From: AG <gru...@gm...> - 2020-12-07 15:23:00
Thanks to all for your help, and especially to Joe for his detailed explanations and suggestions!

OK, so the problem is that the original nodes lose their identity once they are wrapped in a new tag. I got it. I'm not yet very familiar with mapping techniques, so I will read up on them. Thanks for the suggestion!

"Defer output generation until really needed": I absolutely agree. It's my biggest problem. In my complex query I try to find a balance between generating the output too early and still having access to the original data so that I can extract what I need.

So the problem is solved (or rather, there is no solution as is).

Thanks once again to the whole eXist-db community!

P.S. Here is my related question on Stack Overflow:
https://stackoverflow.com/questions/65174232/fulltext-xquery-lucene-kwic-doesnt-work-on-tagged-result-exist-db-bug
I'm going to suggest closing it as well (with a link to the mailing list answer: https://sourceforge.net/p/exist/mailman/message/37170946/ ).

On Mon, Dec 7, 2020 at 3:39 PM Joe Wicentowski <jo...@gm...> wrote:

> Hi AG,
>
> When you wrap a node in an element, you are constructing a new node, in
> memory. As a result, the newly constructed element has no connection to the
> original one (i.e., the wrapped node loses its identity), and you are no
> longer able to query it using the full text index, since the full text
> index only queries nodes stored in the database and your newly constructed
> node is in memory, not stored in the database. Here are two alternatives:
>
> 1. Perform ft:query first, then wrap the results as needed. You still lose
>    the node's original identity, but you may no longer need it once you've
>    performed the full text index query.
>
>        collection("path_to_my_collection")//node[ft:query(., "KEYWORD")] !
>            <tag>{.}</tag>
>
>    Do keep in mind the advice to "Defer output generation until really
>    needed" (https://exist-db.org/exist/apps/doc/tuning#defer-output).
>
> 2. A slightly more advanced technique is to use a map instead of an element.
>    This preserves the node's original identity while letting you associate it
>    with arbitrary entries, for labeling.
>
>        collection("path_to_my_collection")//node[ft:query(., "KEYWORD")] !
>            map { "label": "my tag", "hit": . }
>
>    You could use these entries for grouping, etc.
>
> Joe
>
> On Mon, Dec 7, 2020 at 8:29 AM AG <gru...@gm...> wrote:
>
>> Thank you, Eduard, for all your suggestions.
>>
>>> - there is no index for <tag>, so ft:query on it yields no results; yes,
>>>   you tried adding index config for it, but your <tag> isn't there at the
>>>   time the index is built
>>
>> You are absolutely right. So the question is how to put an index on the
>> intermediate internal result. Is it even possible? Maybe someone has some
>> leads? The eXist-db documentation is not very helpful. The closest thing I
>> have found is:
>> https://exist-db.org/exist/apps/doc/lucene#constructed-fields
>>
>> P.S.
>>
>>> - I don't understand your need to wrap in <tag>
>>
>> I have two collections with quite similar data but different XML
>> schemas, so I have to query them separately (but I need a common result).
>> So for now I run one fulltext query on each collection and then combine
>> the results. My goal is optimization: to go from two fulltext queries
>> (slow) to only one (fast). For this I 1) select from each collection the
>> files that meet my criteria; 2) extract the data I need from the selected
>> files (from both collections); 3) construct a combined intermediate
>> internal result from these data (here I put <tag> around the part of this
>> result on which I want to run the fulltext query); 4) run the fulltext
>> query (only one) on this combined intermediate internal result.
>> Maybe I'm wrong and this approach is not the most optimised…
>>
>> On Mon, Dec 7, 2020 at 12:34 PM Eduard Drenth <ed...@fr...>
>> wrote:
>>
>>> Hi,
>>>
>>> A few things:
>>>
>>> - there is no index for <tag>, so ft:query on it yields no results; yes,
>>>   you tried adding index config for it, but your <tag> isn't there at the
>>>   time the index is built
>>> - use eXist 5.2.0, but that isn't your problem
>>> - your query can be simpler, but perhaps you know:
>>>
>>>       let $results := collection("path_to_my_collection")//node[ft:query(., "KEYWORD")]
>>>
>>> - I don't understand your need to wrap in <tag>
>>>
>>> Good luck, Eduard
>>>
>>> -----Original Message-----
>>> *From*: AG <gru...@gm...>
>>> *To*: exi...@li...
>>> *Subject*: [Exist-open] Bug? KWIC fulltext (ft:query) doesn't work if
>>> variable in path was "tagged" in preceding "return".
>>> *Date*: Mon, 07 Dec 2020 12:04:38 +0100
>>>
>>> First of all, hello everybody!
>>> I'm new here and very excited to be on this list!
>>> Here is my problem. After reading the XQuery and eXist-db
>>> documentation, I can't figure it out, so I decided to ask the question
>>> here.
>>>
>>> *In a nutshell:* The fulltext search with KWIC doesn't work if the
>>> variable in the path for the fulltext search was put in a tag in the
>>> preceding "return". It returns an empty result.
>>>
>>> *Explanations:*
>>>
>>> *XML file*
>>>
>>>     <root>
>>>         <node>blablabla</node>
>>>         <node>blab KEYWORD labla</node>
>>>         <node>blablabla</node>
>>>     </root>
>>>
>>> *Index configuration (collection.xconf)*
>>>
>>>     <collection xmlns="http://exist-db.org/collection-config/1.0">
>>>         <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>>             <lucene>
>>>                 <text qname="root"/>
>>>                 <text qname="node"/>
>>>             </lucene>
>>>         </index>
>>>     </collection>
>>>
>>> *XQuery without "tagged" result (it works)* *(look at "return $node")*
>>>
>>>     let $my_texts :=
>>>         for $node in collection("path_to_my_collection")//node
>>>         return
>>>             $node
>>>
>>>     for $my_hit in $my_texts[ft:query(., "KEYWORD")]
>>>     return
>>>         $my_hit
>>>
>>> The XQuery code above works and I get a result.
>>>
>>>     1   <node>blab KEYWORD labla</node>
>>>
>>> But it doesn't work when the first result on which fulltext search is
>>> launched was put in a tag. (My whole query is more complex and I need to
>>> put this result in the tag to use it in another place of my code.)
>>>
>>> *XQuery with "tagged" result (it doesn't work)* *(look at "return
>>> <tag>{$node}</tag>")*
>>>
>>>     let $my_texts :=
>>>         for $node in collection("path_to_my_collection")//node
>>>         return
>>>             <tag>{$node}</tag>
>>>
>>>     for $my_hit in $my_texts[ft:query(., "KEYWORD")]
>>>     return
>>>         $my_hit
>>>
>>> This query returns 0 results.
>>>
>>> When I debug like this:
>>>
>>> *XQuery for debugging*
>>>
>>>     let $my_texts :=
>>>         for $node in collection("path_to_my_collection")//node
>>>         return
>>>             <tag>{$node}</tag>
>>>
>>>     return
>>>         $my_texts
>>>
>>> I get this:
>>>
>>>     1   <tag>
>>>             <node>blablabla</node>
>>>         </tag>
>>>
>>>     2   <tag>
>>>             <node>blab KEYWORD labla</node>
>>>         </tag>
>>>
>>>     3   <tag>
>>>             <node>blablabla</node>
>>>         </tag>
>>>
>>> What I tried:
>>>
>>> - different path combinations: $my_texts/tag[ft:query(., "KEYWORD")],
>>>   $my_texts/tag/node[ft:query(., "KEYWORD")], $my_texts/*[ft:query(.,
>>>   "KEYWORD")], $my_texts/tag//*[ft:query(., "KEYWORD")],
>>>   $my_texts//*//*[ft:query(., "KEYWORD")] etc.
>>> - adding <tag> to the index configuration (<text qname="tag"/>)
>>>
>>> What did I miss? Or is it an eXist-db bug? (My eXist version: 4.7.0)
>>> Many thanks in advance for all your help!
>>> Sincerely, AG.
>>>
>>> _______________________________________________
>>> Exist-open mailing list
>>> Exi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/exist-open
>>>
>>> --
>>> Eduard Drenth, Software Architekt
>>> ed...@fr...
>>>
>>> Doelestrjitte 8
>>> 8911 DX Ljouwert
>>> +31 58 234 30 47
>>> +31 62 094 34 28 (private)
>>>
>>> skype: eduarddrenth
>>> https://github.com/eduarddrenth
>>> frisian.eu
>>> gpg: https://pgp.surfnet.nl/pks/lookup?search=eduarddrenth
>>>
>>> On Fridays I am at home / work less
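
Joe's second suggestion, applied to AG's goal of searching two collections in one pass, could look roughly like the sketch below. It is only an illustration under assumptions, not a tested solution: the collection paths /db/collection_a and /db/collection_b and the element name para are hypothetical stand-ins for AG's real data, both element names are assumed to have a Lucene index defined in their collection.xconf, and the import assumes eXist-db's standard KWIC module is available. The idea is that ft:query runs on stored nodes, and the labels are attached afterwards with maps instead of wrapper elements.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

(: Hypothetical paths and element names; replace them with your own. :)
let $candidates := (
    collection("/db/collection_a")//node,
    collection("/db/collection_b")//para
)
(: Query the stored nodes first, while they are still database nodes. :)
let $hits :=
    for $hit in $candidates[ft:query(., "KEYWORD")]
    return
        (: A map labels the hit without copying it into a new in-memory node. :)
        map { "label": local-name($hit), "hit": $hit }
(: Build output elements only at the very end. :)
for $entry in $hits
return
    <result label="{$entry?label}">{
        kwic:summarize($entry?hit, <config width="40"/>)
    }</result>
```

Whether a single predicate over both collections is actually faster than two separate queries would need to be measured on the real data.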