Revision: 2657
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2657&view=rev
Author:   binzino
Date:     2008-12-10 05:01:14 +0000 (Wed, 10 Dec 2008)

Log Message:
-----------
Removed use of floor() in calculating the book multiplier.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/src/plugin/scoring-nutchwax/src/java/org/archive/nutchwax/scoring/PageRankScoringFilter.java

Modified: trunk/archive-access/projects/nutchwax/archive/src/plugin/scoring-nutchwax/src/java/org/archive/nutchwax/scoring/PageRankScoringFilter.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/plugin/scoring-nutchwax/src/java/org/archive/nutchwax/scoring/PageRankScoringFilter.java	2008-12-10 04:59:10 UTC (rev 2656)
+++ trunk/archive-access/projects/nutchwax/archive/src/plugin/scoring-nutchwax/src/java/org/archive/nutchwax/scoring/PageRankScoringFilter.java	2008-12-10 05:01:14 UTC (rev 2657)
@@ -56,17 +56,14 @@
   * </p><p>
   * Applies a simple log10 multipler to the document score based on the
   * base-10 log value of the number of inlinks. For example, a page with
-  * 13,032 inlinks will have a score/boost of 5. The actual formula is
+  * 13,032 inlinks will have a score/boost of 5.115. The actual formula is
   * </p>
   * <code>
-  * initialScore *= ( floor( log10( # inlinks ) ) + 1 )
+  * newScore = initialScore * ( log10( # inlinks ) + 1 )
   * </code>
   * <p>
-  * We use floor() to get an integer value from the log10() function
-  * since we're only interested in order of magnitude. We then add 1
-  * so that a page with < 10 inlins will have a multipler of 1, and
-  * thus stay the same, 10-100 gets a multipler of 2, 100-1000 is 3, and
-  * so forth.
+  * We add the extra 1 for pages with only 1 inlink since log10(1)=0 and we
+  * don't want a 0 multiplier.
   * </p>
   * <p>
   * The number of inlinks for a page is not taken from the <code>inlinks</code>
@@ -115,8 +112,6 @@
   public void setConf( Configuration conf )
   {
     this.conf = conf;
-
-    //this.ranks = getPageRanks( conf );
   }
 
   public void injectedScore(Text url, CrawlDatum datum)
@@ -181,7 +176,7 @@
         return initScore;
       }
 
-    String keyParts[] = key.toString( ).split( "\\s+" );
+    String keyParts[] = key.toString( ).split( "\\s+", 2 );
 
     if ( keyParts.length != 2 )
       {
@@ -201,7 +196,7 @@
         return initScore;
       }
 
-    float newScore = initScore * (float) ( Math.floor( Math.log( rank ) ) + 1 );
+    float newScore = initScore * (float) ( Math.log( rank ) + 1 );
 
     LOG.info( "PageRankScoringFilter: initScore = " + newScore + " ; key = " + key );
 
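
For reference, here is a minimal sketch of the two behaviours this revision touches. It is illustrative only: the class BoostSketch, its methods boost() and splitKey(), and the sample key are hypothetical and not part of PageRankScoringFilter. The boost() method follows the formula as written in the revised Javadoc, using Math.log10. In Java, Math.log is the natural logarithm, so the changed line in the diff, which calls Math.log, would yield a different multiplier than the Javadoc's 5.115 example for 13,032 inlinks.

```java
public class BoostSketch {

    // Hypothetical helper: the multiplier as documented in the Javadoc,
    // newScore = initialScore * ( log10( # inlinks ) + 1 ).
    static float boost(float initScore, int inlinks) {
        if (inlinks < 1) {
            // Assumption: with no usable inlink count, leave the score as-is.
            return initScore;
        }
        return initScore * (float) (Math.log10(inlinks) + 1);
    }

    // Hypothetical helper: split a key on the first run of whitespace only.
    // With a limit of 2, any later whitespace stays inside the second part.
    static String[] splitKey(String key) {
        return key.split("\\s+", 2);
    }

    public static void main(String[] args) {
        // One inlink: log10(1) = 0, so the multiplier is 1 and the score is unchanged.
        System.out.println(boost(1.0f, 1));

        // 13,032 inlinks: the multiplier is about 5.115, matching the Javadoc example.
        System.out.println(boost(1.0f, 13032));

        // The same expression with the natural log, as in the diff's changed line,
        // gives about 10.475 instead.
        System.out.println((float) (Math.log(13032) + 1));

        // Made-up sample key: without a limit this would split into three parts;
        // with limit 2 it splits into "a" and "b c".
        String[] parts = splitKey("a b c");
        System.out.println(parts.length + " parts, second = \"" + parts[1] + "\"");
    }
}
```

The limit argument matters for the keyParts.length != 2 check in the diff: a key whose second field contains whitespace still produces exactly two parts instead of being rejected.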