From: Chris T. <ct...@mo...> - 2005-10-29 03:16:56
Hi,

All most excellent points. We are indeed using near() in the implementation to perform something like &= matches where the order in which the user has entered the tokens is significant; however, we're not advertising this to the user as a proximity match. In spelling Tibetan in standard transliteration there are various little bits that are often dropped when the user is entering a search, and we simply want to compensate for a variety of spelling techniques that users may employ.

We could certainly perform the unary test prior to using near(), but it's not clear to me that the unary behavior of near() can't reasonably be interpreted as in the cases of &= and |=.

ciao,
Chris

>[MB]
>>
>> >But why would you ever want to pass a single keyword
>> >to "near"? I don't see how that can be a sensible thing to
>> >do. I can see this would make sense if $some-term had
>> >the value "John Doe", but then that's not a single term. If
>> >eXist's near() did indeed use to return matches when
>> >near() was fed a single token as its second parameter
>> >instead of the string of tokens that its signature specifies,
>> >that would be a pretty bad bug; and code reliant on
>> >bugs is always doomed.
>
>[WM]
>> I'm not sure I understand you here, Michael. I would indeed
>> regard it as a bug if near(., "John") stopped returning matches
>> while near(., "John Doe") does. You often don't know in your
>> application if the user entered a single or multiple search terms.
>> Testing $some-term before passing it to near() would be quite
>> expensive: you don't know on the application level how it will
>> be tokenized by the database. I thus think it's better if
>> near() continues to behave as before, i.e. silently fall back
>> to a single term lookup if just one word is specified. At
>> least, my applications depend on this behaviour.
>
>I can see that this behaviour is convenient if user input that might indeed
>consist of a single term is going to be passed into near() unchecked. But it
>still makes no sense to me, given the name of the function. How can a single
>token be said to be "near" itself (or not)? Maybe my inability to attach
>any meaning to that is related to my long-standing doubt about this
>function's name in the first place. What eXist's function does is generally
>named in specialised freetext search applications "before" or "precedes",
>and is generally accompanied by the reverse-directional "after" or
>"follows", with near() being a third, non-directional mode of proximity
>search. Since, for that reason, I privately think of eXist's near() as
>before(), it's all the harder for me to picture what a single term being
>before itself means.
>
>I agree about the overhead, but it's something I have always taken as
>inevitable in my applications. I call near() in only two circumstances.
>
>First, from proximity search forms specifically requesting input of two or
>more terms and either confirmation of a default distance value or a
>user-supplied one. User input with only one term supplied triggers a prompt
>for completion, and isn't forwarded to eXist.
>
>Secondly, where freeform term entry allows more than one term to be entered
>into a single field. The entered text is tokenised by my application using
>the same rules as the eXist tokeniser (the latter having been for some
>languages or scripts modified by me). If the number of tokens is >1, indicating
>the user has entered a phrase, then the tokens are passed to near() with
>proximity 1. If there is only one token, the input is passed instead to
>match-any().
>
>I realise that under the hood, this probably calls much the same code (since
>of course match-any/match-all can take a list of tokens as well, though
>without the proximity+sequence constraints). But even if I'd found out by
>accident that near() would return a result from a single token, I don't
>think I would have modified my applications to exploit that fact, precisely
>because I would have anticipated that eventually near() would be fixed to
>behave as its name (to me at least) indicates. Which is why I thought that
>what the OP reported was a bug fix, not a regression.
>
>A further, though somewhat Blue Sky dimension: in IR retrieval languages, a
>near() function can be used to filter intermediate results in a
>query-processing pipeline (e.g. as part of relevance calculations applied to
>a candidate result set) even where the user has not explicitly stated
>proximity criteria. I wouldn't like to try doing that in eXist's XQuery
>implementation just yet, but I wouldn't even consider attempting it if
>near() could return a result based on a single term.
>
>I suppose in the end, it all boils down to the fact that I'm a linguist, not
>a programmer. I subscribe to the linguist's Lebenslüge, that meanings are
>determinate and determinable. Or, as my old French teacher used to say: "If
>it's daft, it's wrong". Monopolar proximity, I'm afraid, remains daft in my
>eyes. But, yes, I know the world is full of people who really can hear one
>hand clapping.
>
>Michael Beddow
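
For readers following the thread, the two positions can be put side by side in a small Python model. This is only a sketch under stated assumptions, not eXist code: `tokenize`, `choose_query` and `near_matches` are hypothetical names, the tokeniser is a crude stand-in for whatever rules the database actually applies, and "distance" is assumed to mean "at most this many token positions after the previous term". `choose_query` mirrors the application-level dispatch Michael describes (one token goes to match-any(), a phrase goes to near() with proximity 1), while `near_matches` shows how an ordered proximity test degenerates to a plain term lookup when given a single term, which is the behaviour Wolfgang and Chris would like to keep.

```python
import re


def tokenize(text):
    """Hypothetical stand-in for the database tokeniser: runs of word characters."""
    return re.findall(r"\w+", text)


def choose_query(user_input, distance=1):
    """Application-level dispatch: decide which function the input is sent to."""
    tokens = tokenize(user_input)
    if not tokens:
        return None                          # nothing to search for
    if len(tokens) == 1:
        return ("match-any", tokens)         # single term: plain lookup
    return ("near", tokens, distance)        # phrase: ordered proximity search


def near_matches(doc_tokens, terms, distance=1):
    """Ordered proximity test over a list of document tokens.

    Each term must occur after the previous one, at most `distance`
    positions later (an assumed reading of "proximity"). With a single
    term the loop body never runs, so the test reduces to "does the
    term occur at all".
    """
    if not terms:
        return False
    # Positions where the first term occurs.
    positions = [i for i, tok in enumerate(doc_tokens) if tok == terms[0]]
    for term in terms[1:]:
        # Keep only positions of `term` that follow a previous hit closely enough.
        positions = [
            j
            for i in positions
            for j in range(i + 1, min(i + distance, len(doc_tokens) - 1) + 1)
            if doc_tokens[j] == term
        ]
    return bool(positions)
```

In this model the single-term case needs no special handling inside `near_matches`, which is one way to read Chris's comparison with &= and |=; the explicit branch in `choose_query` is the extra application-side step (and the duplicated tokenisation) that Wolfgang regards as expensive.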