From: stack <st...@ar...> - 2005-11-02 00:19:18
kau...@cs... wrote:

>Is it possible to rank/sort search results by relevance? Show first
>results where the search term is in the html title, or appears several
>times in the text (versus those where the search term appears once, late
>in the text, or in a link name).

It should be doing this for you, Kaisa. In general, are you not seeing the most significant links showing first in results?

I just added a little FAQ on ranking with some notes on how Nutch is doing it. I'll repeat the note here:

By default, at query time, the following fields are boosted as follows:

  query.url.boost, 4.0f
  query.anchor.boost, 2.0f
  query.title.boost, 1.5f
  query.host.boost, 2.0f
  query.phrase.boost, 1.0f

From the above, terms found in a URL score highest, with anchor text and host next (both 2.0), then title (1.5). You can change the above boosts by editing your nutch-site.xml, but in general the defaults seem to work well for most collections.

Anchor text can make a large contribution to a document's ranking score. You can see the anchor text for a page by browsing to its 'explain' page, then editing the URL to put 'anchors.jsp' in place of 'explain.jsp'.

>Does nutchwax index link names within html files? If there's a link
>http://www.something.net/storm.gif within html, could I search for
>'storm' and get this image into the result list?

This is an interesting question, Kaisa. I just took a look, and it doesn't look like it does (see below for how I figured this). Do you need this feature?

Here's how I looked at what was in a particular Nutch segment:

  % ./bin/nutch segread -fix -nocontent -dump nutch-data/segments/debord2005-11-01-155531/

This dumps out what Nutch has per resource: the text it parsed from the document, the list of outlinks found in the document, the page hash, etc. I compared what was in Nutch to what was in the indexed ARC (I zcat'd the ARC).

Yours,
St.Ack

>*Kaisa
>_______________________________________________
>Archive-access-discuss mailing list
>Arc...@li...
>https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
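For reference, here is a minimal sketch of a nutch-site.xml override that changes one of the query-time boosts listed above. It assumes the Nutch 0.7-era configuration format (root element `<nutch-conf>`; later Hadoop-based Nutch releases use `<configuration>` instead). The value 3.0 is purely illustrative, not a recommendation; any property not listed here keeps its default from nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of nutch-default.xml -->
<nutch-conf>
  <!-- Illustrative value only: weight title matches above the 1.5f default -->
  <property>
    <name>query.title.boost</name>
    <value>3.0</value>
  </property>
</nutch-conf>
```

Because these boosts are applied at query time, a change like this should not require re-indexing the collection.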