From: PIERRICK B. <Pie...@br...> - 2021-01-15 15:34:33
Hello,

> Perhaps, but the fields option enables you to define multiple indexes on one qname already, see Joe's answer:
>
> collection.xconf:
>
> <analyzer id="nostops" class="org.fryske_akademy.exist.lucene.InsensitiveWhitespaceAnalyzer"/>
>
> <analyzer id="nostopssensitive" class="org.fryske_akademy.exist.lucene.SensitiveWhitespaceAnalyzer"/>

Yep, you're right. I was actually talking about query terms and "serializers", as defined by Lucene standards... Sorry.

Here is an example: in German, "sprächt" could be indexed as "sprächt" or as "spracht", without the diacritic. It is good to be able to query with 2 indexes (diacritics or no_diacritics), depending on the indexing filter applied...

My Arabic indexer indexes it as "sprechen", i.e. the infinitive form (the Arabic root "SPRCH" in fact, but never mind...). This is how Arabic works, and how every Semitic language works as well. Arabic dictionaries are ordered by "root", not by derivation (nouns, adjectives, verbs, adverbs...)...

A simple (and funny?) example is:
- singular "el Qasr" (which is the Latin term "castrum", the modern "castle"!!!)
- plural "el-uqsur" (Arabic "internal" plural: specific vowels are distributed around the QSR root)!

Doh! In fact, everybody knows this term as... "Luxor"!

I mean, there is such a difference between the singular and plural forms that we cannot index terms as they are...

For instance, to mimic Arabic indexing in German, my analyzer would index "sprächt", "spreche", "sprachen", "gesprochen", "gesprächig" as "sprechen", i.e. the infinitive form...

At query time, I have to analyze the query term. "spreche", for instance, would in fact query "sprechen", that is, the root form. So, first point: I would like to analyze the query term with a specific analyzer. When I query "sprichst", I would in fact like it to query "sprechen"...

At serialization time, and that is the main point IMHO, I need to retrieve the original analyzed term most of the time... but not always.
I mean: index "6" and return "six", as a simple example...

In conclusion, we need some filters at query time and serialization time, IMHO...

Cheers,
p.b.
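
To make the three stages above concrete (analysis at index time, analysis of the query term, and a separate choice at serialization time), here is a minimal sketch in Python. It is only a toy: a hand-written lookup table stands in for a real morphological analyzer, a dict stands in for the Lucene index, and the names `ROOTS`, `DISPLAY`, `analyze` and `ToyIndex` are all hypothetical. It is not eXist-db or Lucene API.

```python
from collections import defaultdict

# Hypothetical root table (surface form -> indexed root form).
# A real analyzer would compute this; here it is hard-coded for illustration.
ROOTS = {
    "sprächt": "sprechen",
    "spreche": "sprechen",
    "sprichst": "sprechen",
    "sprachen": "sprechen",
    "gesprochen": "sprechen",
    "gesprächig": "sprechen",
}

# Hypothetical display table, consulted only at serialization time.
DISPLAY = {"6": "six"}


def analyze(term: str) -> str:
    """Map a surface form to its root; unknown terms pass through lowercased."""
    term = term.lower()
    return ROOTS.get(term, term)


class ToyIndex:
    def __init__(self) -> None:
        # root -> list of (doc_id, original surface form)
        self.postings = defaultdict(list)

    def add(self, doc_id: str, text: str) -> None:
        # Index time: store each token under its analyzed root,
        # but keep the original surface form alongside it.
        for token in text.split():
            self.postings[analyze(token)].append((doc_id, token))

    def search(self, query_term: str, original: bool = True) -> list:
        # Query time: the query term goes through the same analyzer.
        root = analyze(query_term)
        hits = []
        for doc_id, surface in self.postings.get(root, []):
            # Serialization time: return the original form, or a
            # display form derived from the indexed term.
            shown = surface if original else DISPLAY.get(root, root)
            hits.append((doc_id, shown))
        return hits
```

With documents "ich spreche", "wir sprachen" and "6" added, searching for "sprichst" matches the first two through the shared root and returns their original forms, while searching for "6" with `original=False` returns "six" instead of the stored term.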