If the scores calculated on a specific document are equal for two or more classifier values, how does it select which one to add? It looks like it's adding the value that comes first alphabetically. Is that right?
How exactly can I use the boost value? In one of my tests, where it picked one of two classifier values with equal scores, I couldn't get it to pick the other one by changing its boost value.
Lastly, I assume I am right in saying that classification is a content-load-time function, not a dynamic query-time function (I am reloading content after every classifier change). So why am I given the option (under Schema) of adding a query template attribute for each classifier value? What happens if I don't add one at all, or use a mixture of different ones?
Also, can you tell me exactly what can be used in the Query Keywords field for each classifier, and what the exact behaviour is? Are analysers involved?
As far as I can tell, multiple words separated by spaces or commas behave as though boolean OR is applied across the whole content, and multiple words inside quotes behave as though boolean AND is applied across the whole content.
However, the quote behaviour seems to be inconsistent. What exactly is available to define the matching and tokenisation? Which field is searched by default (content?), and can this be configured?
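The behaviour described above can be captured in a small toy model. This is only a sketch of what the poster observed, not OpenSearchServer's actual query parser: it assumes that spaces and commas separate terms, that double quotes group words into one unit, that any unit may match (OR), and that a quoted unit matches when all of its words appear anywhere in the content (AND). The function names parse_keywords and matches are hypothetical.

```python
import re

# Toy model of the observed keyword handling (an assumption, not the real
# OpenSearchServer parser): a quoted group stays together, while unquoted
# words are separated by spaces or commas.
TOKEN_RE = re.compile(r'"([^"]*)"|([^\s,"]+)')


def parse_keywords(raw: str) -> list[str]:
    """Split a "Query keywords" string into terms and quoted groups."""
    units = []
    for quoted, bare in TOKEN_RE.findall(raw):
        # findall returns one tuple per match; at most one group is non-empty
        # (an empty pair of quotes yields two empty strings and is skipped).
        unit = quoted.strip() if quoted else bare
        if unit:
            units.append(unit)
    return units


def matches(raw: str, content: str) -> bool:
    """Observed behaviour: any unit may match (OR); a quoted group matches
    only if all of its words occur somewhere in the content (AND)."""
    words = set(content.lower().split())
    return any(all(w in words for w in unit.lower().split())
               for unit in parse_keywords(raw))


print(parse_keywords('linux, kernel "open source" driver'))
# ['linux', 'kernel', 'open source', 'driver']
print(matches('"open source"', 'the source is open'))  # True
```

Note that in this model a quoted group is not a phrase query: the words need not be adjacent, which is exactly what "AND across the whole content" means.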
Great functionality, by the way...
In the "Query template" field you have to choose an existing query template (created in the "Query" tab). It will be used with the given "Query keywords" to check whether the document matches each line of the classifier.
For example, you may want to categorize web pages according to their host: if a page has the host "www.example.com", you may want to give it the value "category1" in the field "category". To do so, create a query template that searches only the field "host", and configure the classifier as shown in the attached screenshot.
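To make the host example concrete, here is a hedged toy model of the mechanism: a query template restricted to the "host" field, plus classifier lines that pair query keywords with a value. The classes QueryTemplate and ClassifierLine and the function classify are hypothetical illustrations, not OpenSearchServer's API, and taking the first matching line is a simplification (the real classifier scores lines).

```python
from dataclasses import dataclass


@dataclass
class QueryTemplate:
    """Hypothetical stand-in for a query template: only the fields it searches."""
    searched_fields: list[str]


@dataclass
class ClassifierLine:
    query_keywords: str  # e.g. "www.example.com"
    value: str           # value written to the target field, e.g. "category1"


def classify(doc: dict[str, str], template: QueryTemplate,
             lines: list[ClassifierLine], target_field: str) -> dict[str, str]:
    """Return a copy of doc with target_field set from the first matching line.

    A line matches when its keywords equal (case-insensitively) the value of
    one of the template's searched fields. Fields the template does not
    search are ignored.
    """
    out = dict(doc)
    for line in lines:
        if any(doc.get(f, "").lower() == line.query_keywords.lower()
               for f in template.searched_fields):
            out[target_field] = line.value
            break
    return out


host_only = QueryTemplate(["host"])
rules = [ClassifierLine("www.example.com", "category1")]
print(classify({"host": "www.example.com"}, host_only, rules, "category"))
# {'host': 'www.example.com', 'category': 'category1'}
```

A document whose content merely mentions www.example.com, but whose host is different, is left unclassified here, because the template searches only "host".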
Analyzers will be used if any are configured on the searched fields. How quotes are handled depends on what is configured in the query template.
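Why the analyzer on the searched field matters can be shown with a small sketch. It assumes a hypothetical analyzer that lowercases text and splits it on non-alphanumeric characters, applied to both the keywords and the field text; without an analyzer, raw whitespace-separated tokens are compared. None of these names come from OpenSearchServer.

```python
import re
from typing import Callable, Optional

Analyzer = Callable[[str], list[str]]


def simple_analyzer(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def term_matches(keyword: str, field_text: str,
                 analyzer: Optional[Analyzer]) -> bool:
    """Does the keyword match the field text under the given analyzer?"""
    if analyzer is None:
        # No analyzer configured: compare raw whitespace-separated tokens.
        return keyword in field_text.split()
    kw_tokens = analyzer(keyword)
    field_tokens = set(analyzer(field_text))
    return bool(kw_tokens) and all(t in field_tokens for t in kw_tokens)


print(term_matches("Linux", "linux kernel", None))             # False
print(term_matches("Linux", "linux kernel", simple_analyzer))  # True
```

The same keyword can therefore match or fail depending only on how the searched field is analysed.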
OK. I need to go away and think about your answer :-)
OK. I can see that you are using the query template for two different purposes: to define the client query (its primary use, involving all the query definition tabs), and to define the query parameters that control how a classifier search is performed at load time.
Question: which subset of the query definition parameters do I have to set to support this classifier operation? I can see that I could use the default operator (AND/OR), for instance.
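As an illustration of why the default operator is one of the relevant parameters, here is a hedged sketch of how it would change whether a single classifier line matches. The function line_matches and its parameters are hypothetical; they illustrate plain AND/OR semantics, not how OpenSearchServer implements them.

```python
from typing import Literal


def line_matches(terms: list[str], field_tokens: set[str],
                 default_operator: Literal["AND", "OR"] = "OR") -> bool:
    """Hypothetical: does one classifier line match, given a default operator?"""
    if not terms:
        return False  # an empty keyword list never matches in this toy model
    hits = [term.lower() in field_tokens for term in terms]
    # OR: any keyword is enough; AND: every keyword must be present.
    return all(hits) if default_operator == "AND" else any(hits)


tokens = {"linux", "desktop"}
print(line_matches(["linux", "kernel"], tokens, "OR"))   # True
print(line_matches(["linux", "kernel"], tokens, "AND"))  # False
```

With OR, a line with several keywords is easy to satisfy; with AND, the same line becomes much more selective.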
I cannot get it to work at all. If I don't set a query template in the classifier, it must be using a default set of fields, but when I try to control it with a query template, nothing happens!
Can you give me a minimal example showing how to define a query template that will be used to find the query keywords in the "content" field only? There must be something that I am not doing.
You can create a "Search (field)" query template, add the field "content" in the "Searched fields" tab, and add, for instance, "url" in the "Returned fields" tab.
This should work. You can check directly that the keywords configured in the classifier match documents by typing them into the "Enter the query" input.
However, you need to fully re-index your data to populate the fields with the classifier's values.
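The suggested setup can be modelled with a toy "Search (field)" template that searches only "content" and returns only "url". The class FieldSearchTemplate and its search method are hypothetical and only mimic the described configuration; they are useful for reasoning about which documents a set of keywords should hit when tested through the query input.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSearchTemplate:
    """Hypothetical model of a "Search (field)" query template."""
    searched_fields: list[str] = field(default_factory=lambda: ["content"])
    returned_fields: list[str] = field(default_factory=lambda: ["url"])

    def search(self, index: list[dict[str, str]],
               query: str) -> list[dict[str, str]]:
        """Return the returned fields of documents where any query term
        occurs in one of the searched fields (OR semantics assumed)."""
        terms = query.lower().split()
        results = []
        for doc in index:
            tokens: set[str] = set()
            for f in self.searched_fields:
                tokens.update(doc.get(f, "").lower().split())
            if terms and any(t in tokens for t in terms):
                results.append({f: doc[f] for f in self.returned_fields if f in doc})
        return results


index = [
    {"url": "http://a.example/1", "title": "linux", "content": "about bsd"},
    {"url": "http://a.example/2", "title": "misc", "content": "linux kernel notes"},
]
print(FieldSearchTemplate().search(index, "linux"))
# [{'url': 'http://a.example/2'}]
```

The first document is not returned even though its title contains "linux", because only "content" is searched.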
Thanks. I managed to get it to work, although I'm not sure what I did differently, as I was already doing what you suggested :-). I also managed to use a "Search (pattern)" query, so I can make more complex classifier choices.
Thanks very much for your valuable time.