From: Demian K. <dem...@vi...> - 2012-06-26 12:53:28
Are you aware of the debugQuery feature in Solr? You might find that very helpful in tuning your relevance rankings. Just follow these steps:

1.) Turn on debug mode in VuFind.
2.) Do a search.
3.) Copy the Solr search URL from VuFind's debug output.
4.) Paste the copied URL into a browser. You may need to change "localhost" in the URL to the name of your Solr server.
5.) Add &debugQuery=true to the end of the query.
6.) Scroll down to the "explain" section of the resulting document.

This will tell you exactly how all of the various relevance factors were combined to produce document scores, so you can see the consequences of tuning the various numbers.

Relevance tuning relies on a certain amount of luck and magic. I think the trick here is to find the right level so that publication date breaks ties and pushes newer editions up but isn't such a big factor that you end up getting irrelevant new books ranked ahead of relevant older books. When testing this, you might want to try searches on classic books that haven't been updated in a while (assuming there are some) to make sure that you don't break those searches.

As for the Wikipedia issue, see http://vufind.org/jira/browse/VUFIND-169; I think a solution (or at least an improvement) is technically possible... just a matter of somebody taking the time to implement it. At this point, it's probably best to wait until the release of VuFind 2.0 to avoid doing redundant work. In the meantime, have fun with the inappropriate entries. ;-)

- Demian

From: Chris Keene [mailto:chr...@gm...]
Sent: Tuesday, June 26, 2012 5:52 AM
To: vuf...@li...
Subject: Re: [VuFind-Tech] Boosting the score of newer documents

Hi

The main users of our VuFind are at a medical school. Obviously, finding the latest edition of a book is an important issue for students, and I was tasked with improving this.

I found the email thread below between Oliver Goldschmidt and Demian Katz really helpful - thanks!
I first tried the solution they came to. It worked, but it wasn't enough to get the latest edition to the top in my key test (a search for 'kumar clark' had to come up with the latest edition of the key medical textbook). I tried a different approach. Here's what my searchspecs.yaml currently looks like:

AllFields:
  DismaxFields:
    - title_short^750
    - title_full_unstemmed^600
    - title_full^400
    - title^500
    - title_alt^200
    - title_new^100
    - series^50
    - series2^30
    - author^300
    - contents^10
    - topic_unstemmed^550
    - topic^500
    - geographic^300
    - genre^300
    - allfields
  DismaxParams:
    # - [bf, recip(rord(publishDate),1e-3,1,1)]
    - [bf, ord(publishDate)^0.5]

The key line is the last one above. If I've understood correctly (and I confess I don't really understand this at all, which makes me slightly nervous making these changes), this essentially uses the actual value of the published year field as a boost (2012 being a bigger boost than 1995, etc.).

So, I'm emailing this partly for info, and partly for comment, in case anyone can see any reason why this is not a good idea, or can think of a better approach.

As an aside, the more I think about it, the more I think all discovery/catalogue systems should be treating newer editions of the same book as more relevant. Testing a few systems here has shown that this isn't the case ( http://pinterest.com/chriskeene/catalogue-relevancy/ ).

Thanks
Chris

PS: Probably the most amusing page on our catalogue, and one of the most popular, and the reason why I will be disabling the Wikipedia functionality shortly: http://sabre.sussex.ac.uk/vufindsmu/Author/Home?author=Sloane,%20Sarah.

On 24 February 2012 14:53, Oliver Goldschmidt <o.g...@tu...> wrote:

Now I have found a small side effect of that boosting parameter: empty queries are no longer possible. If I enable this boosting parameter, an empty query returns no results. I don't think this is very bad, but one should know that...

- Oliver

On 24.02.2012 15:40, Demian Katz wrote:
> Glad I could help!
>
> The YAML format is very sensitive about whitespace, so it's easy to make small errors that prevent it from recognizing certain elements. Perhaps that's what happened the first time.
>
> - Demian
> ________________________________________
> From: Oliver Goldschmidt [o.g...@tu...]
> Sent: Friday, February 24, 2012 9:26 AM
> To: Demian Katz
> Cc: vuf...@li...
> Subject: Re: [VuFind-Tech] Boosting the score of newer documents
>
> I just tried it again - weird thing, now it works as you supposed.
> Putting this into searchspecs.yaml:
> DismaxParams:
>   - [bf, recip(rord(publishDate),1e-3,1,1)]
>
> has indeed the same effect as my query using the {!boost} parameter.
>
> The newer version using ms() still does not work, but that's not really a
> problem for me.
> So my patch won't be needed :-)
>
> Thank you for helping, Demian!
>
> - Oliver
>
> On 24.02.2012 15:17, Demian Katz wrote:
>> I thought the Dismax bf parameter was essentially the same as using {!boost}
>>
>> Can't you just add
>>
>> - [bf, recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)]
>>
>> in the DismaxParams section?
>>
>> Or am I missing something?
>>
>> - Demian
>> ________________________________________
>> From: Oliver Goldschmidt [o.g...@tu...]
>> Sent: Friday, February 24, 2012 9:13 AM
>> To: Demian Katz
>> Cc: vuf...@li...
>> Subject: Re: [VuFind-Tech] Boosting the score of newer documents
>>
>> Thanks for your fast reply, Demian. But I think this will not work for
>> my desired effect. I guess it's not that simple...
>>
>> On the Solr help page referenced in my email is this example:
>> http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
>> This does not work for me.
>> So I tried this in VuFind:
>> http://solrhost:8983/solr/biblio/select/?q={!boost%20b=recip%28rord%28publishDate%29,1e-3,1,1%29}bibliothek&rows=20&start=0&indent=yes&qf=title_short^750+title_full_unstemmed^600+title_full^400+title^500+title_alt^200+title_new^100+series^50+series2^30+author^300+author_fuller^150+contents^10+topic_unstemmed^550+topic^500+geographic^300+genre^300+allfiels_unstemmed^10+fulltext_unstemmed^10+allfields+fulltext+bklNAME&qt=dismax&fl=*%2Cscore&spellcheck=true&spellcheck.q=bibliothek&spellcheck.dictionary=basicSpell&hl=true&hl.fl=*&hl.simple.pre={{{{START_HILITE}}}}&hl.simple.post={{{{END_HILITE}}}}&wt=json&json.nl=arrarr
>> This works pretty well.
>> But I cannot set the {!boost...} parameter in searchspecs.yaml. If I put
>> it into a [DismaxParams] section, it did not work for me either. So this
>> is what my patch fixes.
>>
>> So my question again: is anyone using something like that yet?
>>
>> - Oliver
>>
>> On 24.02.2012 14:55, Demian Katz wrote:
>>> You can add boost queries to searchspecs.yaml -- use the Dismax bq (boost query) or bf (boost function) parameters in the DismaxParams section of each affected search handler.
>>>
>>> Here's an example from Villanova's local custom web search configuration:
>>>
>>> AllFields:
>>>   DismaxFields:
>>>     - title^750
>>>     - description_unstemmed^350
>>>     - description^300
>>>     - keywords_unstemmed^250
>>>     - keywords^200
>>>     - url_keywords^50
>>>     - fulltext_unstemmed^10
>>>     - fulltext
>>>   DismaxParams:
>>>     - [bq, category:"Guides"^10]
>>>     - [bf, use_count^0.5]
>>>   QueryFields:
>>>     - title:
>>>       - [onephrase, 1000]
>>>       - [and, 750]
>>>       - [or, 10]
>>>     - description_unstemmed:
>>>       - [onephrase, 400]
>>>       - [and, 350]
>>>       - [or, ~]
>>>     - description:
>>>       - [onephrase, 350]
>>>       - [and, 300]
>>>       - [or, ~]
>>>     - keywords_unstemmed:
>>>       - [onephrase, 300]
>>>       - [and, 250]
>>>       - [or, ~]
>>>     - keywords:
>>>       - [onephrase, 250]
>>>       - [and, 200]
>>>       - [or, ~]
>>>     - url_keywords:
>>>       - [onephrase, 100]
>>>       - [and, 50]
>>>       - [or, ~]
>>>     - fulltext_unstemmed:
>>>       - [onephrase, 50]
>>>       - [and, 10]
>>>       - [or, ~]
>>>     - fulltext:
>>>       - [onephrase, 25]
>>>       - [and, 5]
>>>       - [or, ~]
>>>
>>> Note that the boosts may not always be applied correctly since Dismax is not always used. However, as of VuFind 1.3, the code will try to apply boosts even to non-Dismax queries. Due to some limitations of Solr, this won't always work perfectly, but I think you should be able to get pretty close to the desired effect.
>>>
>>> - Demian
>>> ________________________________________
>>> From: Oliver Goldschmidt [o.g...@tu...]
>>> Sent: Friday, February 24, 2012 8:51 AM
>>> To: vuf...@li...
>>> Subject: [VuFind-Tech] Boosting the score of newer documents
>>>
>>> Hi,
>>>
>>> I was looking for a way to include the publishing date in the boosting of
>>> records (more recent ones should be ranked higher than older ones, but not
>>> by simply switching the sort order), but did not find a way to accomplish that
>>> by modifying searchspecs.yaml. Is it possible to do that by
>>> configuration? Is anybody doing that already?
>>>
>>> I have set up a patch introducing a new parameter for query; setting
>>> this parameter will do something like that:
>>> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>>>
>>> For me, only the first solution of this article (the one concerning Solr
>>> 1.3) worked effectively, even though I have Solr 1.4. I can open a new
>>> ticket and submit my patch there, if this might be helpful for anyone.
>>>
>>> - Oliver
>>>
>>> --
>>> Oliver Goldschmidt
>>> TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
>>> Denickestr. 22
>>> 21071 Hamburg - Harburg
>>> Tel. +49 (0)40 / 428 78 - 32 91
>>> eMail o.g...@tu...
>>> --
>>> GPG/PGP key:
>>> http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
>>> --
>>> Projekt DISCUS http://discus.tu-harburg.de
>>> Projekt TUBdok http://doku.b.tu-harburg.de
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Virtualization & Cloud Management Using Capacity Planning
>>> Cloud computing makes use of virtualization - but cloud computing
>>> also focuses on allowing computing to be delivered as a service.
>>> http://www.accelacomm.com/jaw/sfnl/114/51521223/
>>> _______________________________________________
>>> Vufind-tech mailing list
>>> Vuf...@li...
>>> https://lists.sourceforge.net/lists/listinfo/vufind-tech
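
For anyone comparing the two boost functions discussed in this thread, here is a minimal Python sketch of the arithmetic. It is not VuFind or Solr code. It assumes Solr's documented definition recip(x,m,a,b) = a / (m*x + b), and it treats ord() and rord() as the 1-based position of a value among the distinct indexed values (ascending and descending respectively), which is a simplification of how Solr derives them from the index. The sample years and the helper names ordinals and reverse_ordinals are made up for illustration.

```python
from typing import Dict, List


def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function as documented for Solr: a / (m * x + b)."""
    return a / (m * x + b)


def ordinals(values: List[str]) -> Dict[str, int]:
    """Approximate ord(): 1-based rank among distinct values, ascending."""
    distinct = sorted(set(values))
    return {value: rank + 1 for rank, value in enumerate(distinct)}


def reverse_ordinals(values: List[str]) -> Dict[str, int]:
    """Approximate rord(): 1-based rank among distinct values, descending."""
    distinct = sorted(set(values), reverse=True)
    return {value: rank + 1 for rank, value in enumerate(distinct)}


# Hypothetical publishDate values; a real index would hold many more.
years = ["1995", "2005", "2009", "2012"]
ord_of = ordinals(years)
rord_of = reverse_ordinals(years)

for year in years:
    # [bf, recip(rord(publishDate),1e-3,1,1)]
    recip_boost = recip(rord_of[year], 1e-3, 1, 1)
    # [bf, ord(publishDate)^0.5]: in a bf parameter, ^0.5 is a weight, not an exponent
    ord_boost = 0.5 * ord_of[year]
    print(year, round(recip_boost, 6), ord_boost)
```

With these four sample values, the recip(rord(...)) boost only varies from about 0.999 for the newest year to about 0.996 for the oldest, while the ord-based boost ranges from 0.5 to 2.0 and keeps growing with the number of distinct dates in the index. That may explain why the first approach was not strong enough in the 'kumar clark' test. It also suggests that ord() reflects the rank of the year among the indexed values rather than the year itself, so the size of the boost depends on what is in the index; the debugQuery output is the place to check the real numbers.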