From: Lachlan A. <lh...@us...> - 2003-09-14 11:00:49
Greetings Neal,

Thanks for your work in pushing 3.2 out. What is the URL of the ToDo
list? As an alternative, you could modify the STATUS file in CVS,
which is the basis of Geoff's weekly posts.

The only ToDo I would add concerns backlink weights. Below (and at
<http://www.mail-archive.com/htd...@li.../msg01881.html>)
is my original email from June, and Geoff's reply. The basic problem
is that the score based on a document is (sensibly) divided by the
number of words in a document, but the score for links *to* the
document isn't.

Before releasing 3.2, we should either
(1) remove the division by document size or
(2) change the weightings in defaults.cc to balance this.
The potential disadvantage with (2) is breaking compatibility with old
configurations.

Opinions?

Cheers,
Lachlan

> The base score of documents I search for is typically 0.0001, while
> the backlink factor is typically 2000. Since these are added, the
> weight given to the document itself is approximately zero!
>
> Does anyone know how this came about?

Well, that makes some sense. We haven't "recalibrated" the scoring,
though we trimmed out the whole "words in the front get higher score"
bit. And since I assumed that somewhere along the 3.2 development,
we'd add in some sort of "proximity weighting," I didn't really worry
about it.

As far as changing the weightings, I don't think anyone minds as long
as it's explained up-front in release documentation. In particular,
now that you don't have to reindex to change weightings, it's an easy
change to your config file.

-Geoff

On Wed, 27 Aug 2003 00:04, Neal Richter wrote:
> McGill University recently contacted one of the HtDig Board
> members to inquire about making some kind of financial arrangement
> with HtDig to get 3.2 finished, tested and working with Phrase
> Searching -- ie quoted strings.
>
> Please post your TODO list and I'll compile them and post them on
> a web-page prioritized for release. We can then have a short
> debate and get to work.
>
> My personal opinion is that we limit the TODOs to the absolutely
> necessary (ie satisfy Geoff's weekly status email) and get it
> working and call it 3.2. Everything else is a new release.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
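
To put numbers on the imbalance described in the quoted June email, here is a minimal sketch. It assumes only what that email states: a typical per-document score of 0.0001, a typical backlink contribution of 2000, and that the two are added. The variable names are illustrative and are not taken from the ht://Dig source.

```python
# Illustrative only: the two figures come from the quoted June email;
# this is not ht://Dig's actual scoring code.
base_score = 0.0001      # typical document score, already divided by word count
backlink_factor = 2000   # typical backlink contribution, not divided by anything

# The email says the two contributions are simply added.
total = base_score + backlink_factor

# Fraction of the final score that comes from the document's own content.
doc_share = base_score / total

print(f"document share of total score: {doc_share:.1e}")
```

With these figures the document's own content accounts for about five parts in a hundred million of the total, which is why either option (1) or option (2) is needed before the two terms are comparable.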