Revision: 2520
http://archive-access.svn.sourceforge.net/archive-access/?rev=2520&view=rev
Author: miklosh
Date: 2008-07-29 06:43:31 +0000 (Tue, 29 Jul 2008)
Log Message:
-----------
Update the README with info about DocIndexer, ImageProcessor and the web UI.
Modified Paths:
--------------
trunk/archive-access/projects/nutchwax/imagesearch/README.txt
Modified: trunk/archive-access/projects/nutchwax/imagesearch/README.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/imagesearch/README.txt 2008-07-29 02:18:10 UTC (rev 2519)
+++ trunk/archive-access/projects/nutchwax/imagesearch/README.txt 2008-07-29 06:43:31 UTC (rev 2520)
@@ -47,13 +47,58 @@
Then install the "nutch-1.0-dev.tar.gz" tarball as normal.
+Indexing
+--------
+After performing the usual steps to import or fetch the files and invert
+the links, indexing has to be done using the DocIndexer:
+
+ $ bin/nutch org.archive.nutchwax.imagesearch.DocIndexer <index> <crawldb> <linkdb> <segment> ...
+
+DocIndexer is based on Nutch's indexer and has to be parameterized the
+same way as Nutch's indexer. The difference between the two indexers is
+that DocIndexer does an extra MapReduce step to determine the exact
+image version to be used for image URLs embedded in HTML pages.
+
+
+Thumbnail generation
+--------------------
+Image metadata and thumbnails have to be created by the ImageProcessor:
+
+ $ bin/nutch org.archive.nutchwax.imagesearch.ImageProcessor <segmentDir>
+
+This tool processes one segment at a time, making thumbnails for any
+images found in the segment and recording some metadata about them. The
+results of this operation are stored in a directory named "image_data"
+in the segment's directory.
+
+The ImageProcessor can be configured by the following properties:
+ o imagesearcher.thumbnail.quality: specifies the JPEG quality of
+ thumbnails (specified by an integer between 0 and 100)
+ o imagesearcher.thumbnail.maxSize: specifies the maximum width and
+ height of a thumbnail
+
+In order to have thumbnails shown in the search results on the web UI,
+ImageProcessor has to be run for every indexed segment. However,
+thumbnail generation is not needed for command-line searching.
+
+
Searching
---------
-After performing the usual steps to import or fetch the files, invert
-the links and index the documents, you can search the resulting indexes
-for images by:
+After performing the steps needed for index generation, you can search
+the resulting indexes for images by:
- bin/nutch org.archive.nutchwax.imagesearch.ImageSearcherBean product
+ $ bin/nutch org.archive.nutchwax.imagesearch.ImageSearcherBean product
This calls the ImageSearcherBean to execute a simple keyword search for
"product".
+
+
+Web deployment
+--------------
+The web application for image searching can be built by invoking the
+following command in this contrib's directory:
+
+ $ ant imagesearch-war
+
+This will generate a WAR file named "imagesearch.war" in the "build"
+directory of Nutch, which can be deployed as usual.
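
Note on the "Thumbnail generation" section of the README above: since ImageProcessor handles one segment at a time and has to be run for every indexed segment, a small wrapper loop may be convenient. The following is only a sketch and is not part of the commit. The function name process_segments, the NUTCH variable and the example path crawl/segments are assumptions made for illustration; only the ImageProcessor class name and its single <segmentDir> argument come from the README.

```shell
#!/bin/sh
# Hypothetical helper (not part of NutchWAX): run ImageProcessor once per segment.
# NUTCH is an assumed variable pointing at the Nutch launcher script.
NUTCH="${NUTCH:-bin/nutch}"

process_segments() {
    segments_dir="$1"
    for seg in "$segments_dir"/*/; do
        # Skip the unexpanded pattern when the directory holds no segments.
        [ -d "$seg" ] || continue
        # Strip the trailing slash and process this one segment; stop on failure.
        "$NUTCH" org.archive.nutchwax.imagesearch.ImageProcessor "${seg%/}" || return 1
    done
}

# Example invocation (the path is illustrative only):
#   process_segments crawl/segments
```

Stopping at the first failing segment keeps a partial run easy to spot; whether ImageProcessor can safely be re-run on a segment that already has an "image_data" directory is not stated in the README, so that case is left to the reader.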