Thread: [Archive-access-cvs] archive-access/projects/nutch/xdocs faq.fml,1.16,1.17

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15920/xdocs

Modified Files:
	faq.fml 
Log Message:

* xdocs/faq.fml 
    Point to nutch FAQ.


Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/faq.fml,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** faq.fml	19 Nov 2005 01:21:58 -0000	1.16
--- faq.fml	21 Nov 2005 21:29:54 -0000	1.17
***************
*** 266,316 ****
          nutch/nutchwax (Or 'explain' the <code>explain</code> page)?</question>
          <answer>
!         <p>Nutch is built on Lucene.  To understand Nutch scoring, study
!         how Lucene does it.  The formula Lucene uses scoring can be found
!         at the head of the Lucene Similarity class in the
!         <a href="http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html">Lucene Similarity Javadoc</a>. 
!         Rougly, the score for a particular document in a set of query results,
!         <code>score(q,d)</code>, is the sum of the score for each term of a
!         query (<code>t in q</code>).  A terms' score in a document is itself the
!         sum of the term run against each field that comprises a document (e.g. 
!         <code>title</code> is one field, <code>url</code> is another. A 'document'
!         is a set of 'fields').  Per field, the terms' score is the product of
!         the following factors: Its <code>td</code> (term
!         freqency in the document), a score factor <code>idf</code> usually a factor
!         made up of frequency of term relative to amount of docs in index, an
!         index-time boost,
!         a normalization of count of terms found relative to size of document
!         (<code>lengthNorm</code>), a similar normalization is done for the term in
!         the query itself (<code>queryNorm</code>), and finally, a factor that
!         has a weight for how many instances of the total amount of terms a
!         particular document contains. Study the lucene javadoc to get more
!         detail on each of the equation components and how they effect
!         overall score.</p>
!         <p>The nutch <code>explain.jsp</code> page can be interpreted with the
!         Lucene scoring equation in mind.  First, notice how we move right as
!         we move from score total, to score per term, to score per field (Nothing
!         is shown if a term was not found in a particular field).
!         Next, studying a particular field scoring, it comprises a 
!         query component and then a field component (Score is product of
!         these two components).  The query component includes
!         query time -- as opposed to index time -- boost, an idf (that is same
!         for the query and field components), and then a queryNorm.  Similar for
!         the field component (fieldNorm is an aggregation of certain of the
!         Lucene equation components).</p>
! 
!         <p>The easiest way to influence scoring is to change query time boost
!         (will require edit of nutch-site.xml and redeploy of the nutchwax.war
!         file).  Query-time boost by default looks like this:
!         <pre>query.url.boost, 4.0f
! query.anchor.boost, 2.0f
! query.title.boost, 1.5f
! query.host.boost, 2.0f
! query.phrase.boost, 1.0f</pre></p>
! <p>From the list above, you can see that terms found in a document URL get
! the highest boost with anchor text next, etc.</p>
! <p>Anchor text makes a large contribution to a document ranking score.
! You can see the anchor text for a page by browsing to the 'explain' then
! editing the URL to put in place 'anchors.jsp' instead of 'explain.jsp'.
! </p>
          </answer>
      </faq>
--- 266,272 ----
          nutch/nutchwax (Or 'explain' the <code>explain</code> page)?</question>
          <answer>
!         <p>See <i>How is scoring done in Nutch? (Or, explain the
!         "explain" page?)</i> and <i>How can I influence Nutch scoring?</i> over on
!         the <a href="http://wiki.apache.org/nutch/FAQ">Nutch FAQ</a> page.</p>
          </answer>
      </faq>
