From: <jm...@bo...> - 2003-04-22 11:51:55
|
<?xml version="1.0" ?> <html> <head> <title></title> </head> <body> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">Hi Bill, Dave, Douglas,</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">Frankly, I am not sure about the right way to improve rank.</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">I have never worked seriously with it. The current way</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">to compute it is basically the same as it was in earlier swish </span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">versions (1.X), with one difference: in those days rank was computed</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">at index time; I changed it to be computed at search time.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">IMO, computing rank at index time has some drawbacks:</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">- Storing big numbers in worddata (compression does not work</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">well on them). We can live with it, though.</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">- Not good for an incremental index.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">Another thing that comes to mind is that perhaps we are focusing</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">too much on HTML files.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">I do not use rank.c at all in my indexes. 
I always index XML files.</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">I just add a field to them, similar to a rank, based on the type</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">of the document (which is also a field), the size, the date, etc.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">For example, the Spanish Constitution ranks higher than a Law. A Law</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">ranks higher than a Regulation. Also, more recent documents</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">rank higher than older ones, etc. It is simple but closely tied to</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">the nature of the document. So, no generalization is possible.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">BTW, do not forget "phrase search". The rank is not computed well for it.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">And what about patents? AltaVista and Google may have patented all</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">the imaginable ways to get a reasonable rank.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">Forgot to mention... I am now indexing more than 150,000 laws and regulations</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">(European and Spanish) in 40 minutes on a 1 GHz PIII. 
Searches are</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">made in less than 0.1 seconds.</span></font></div> <div align="left"><br/> </div> <div align="left"><font face="System"><span style="font-size:10pt">cu</span></font></div> <div align="left"><font face="System"><span style="font-size:10pt">Jose</span></font></div> <div align="left"></div> </body> </html> |
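The per-document rank field described in the message (document type first, then recency) could be sketched roughly as below. This is only an illustration, not the code used for the indexes mentioned above: the weights, the `static_rank` and `rank_field` helpers, and the `<rank>` element name are all assumptions made up for the example, and document size is left out to keep it short.

```python
from datetime import date

# Hypothetical weights. The message only says that a constitution ranks
# above a law, and a law above a regulation; the numbers are invented.
TYPE_WEIGHT = {"constitution": 3000, "law": 2000, "regulation": 1000}


def static_rank(doc_type: str, published: date, today: date) -> int:
    """Return a sortable rank: document type dominates, recency breaks ties."""
    age_years = max(0, today.year - published.year)
    # Newer documents score higher. The bonus stays below 1000 so it can
    # never outweigh the gap between two document types.
    recency = max(0, 999 - age_years)
    return TYPE_WEIGHT[doc_type] + recency


def rank_field(doc_type: str, published: date, today: date) -> str:
    """Render the rank as an XML field (the element name is an assumption)."""
    return f"<rank>{static_rank(doc_type, published, today)}</rank>"


if __name__ == "__main__":
    today = date(2003, 4, 22)
    print(rank_field("constitution", date(1978, 12, 29), today))
    print(rank_field("law", date(2003, 1, 15), today))
    print(rank_field("regulation", date(1995, 6, 1), today))
```

Because the value is computed once per document and stored as an ordinary field, results can be ordered by it at search time without touching rank.c. As the message notes, the weights only make sense for one kind of collection.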