From: <mi...@us...> - 2008-07-29 06:43:22
Revision: 2520
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2520&view=rev
Author:   miklosh
Date:     2008-07-29 06:43:31 +0000 (Tue, 29 Jul 2008)

Log Message:
-----------
Update the README with info about DocIndexer, ImageProcessor and the web UI.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/imagesearch/README.txt

Modified: trunk/archive-access/projects/nutchwax/imagesearch/README.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/imagesearch/README.txt	2008-07-29 02:18:10 UTC (rev 2519)
+++ trunk/archive-access/projects/nutchwax/imagesearch/README.txt	2008-07-29 06:43:31 UTC (rev 2520)
@@ -47,13 +47,58 @@
 Then install the "nutch-1.0-dev.tar.gz" tarball as normal.
 
 
+Indexing
+--------
+After performing the usual steps to import or fetch the files, invert
+the links, indexing has to be done using the DocIndexer:
+
+  $ bin/nutch org.archive.nutchwax.imagesearch.DocIndexer <index> <crawldb> <linkdb> <segment> ...
+
+DocIndexer is based on Nutch's indexer and has to be parameterized the
+same way as Nutch's indexer. The difference between the two indexers is
+that DocIndexer does an extra MapReduce step to determine the exact
+image version to be used for image URLs embedded in HTML pages.
+
+
+Thumbnail generation
+--------------------
+Image metadata and thumbnails have to be created by the ImageProcessor:
+
+  $ bin/nutch org.archive.nutchwax.imagesearch.ImageProcessor <segmentDir>
+
+This tool processes one segment at a time, making thumbnails for any
+images found in the segment and recording some metadata about them. The
+results of this operation are stored in a directory named "image_data"
+in the segment's directory.
+
+The ImageProcessor can be configured by the following properties:
+  o imagesearcher.thumbnail.quality: specifies the JPEG quality of
+    thumbnails (specified by an integer between 0 and 100)
+  o imagesearcher.thumbnail.maxSize: specifies the maximum width and
+    height of a thumbnail
+
+In order to have thumbnails shown in the search results on the web UI,
+ImageProcessor has to be run for every indexed segment. However,
+thumbnail generation is not needed for command-line searching.
+
+
 Searching
 ---------
-After performing the usual steps to import or fetch the files, invert
-the links and index the documents, you can search the resulting indexes
-for images by:
+After performing the steps needed for index generation, you can search
+the resulting indexes for images by:
 
-  bin/nutch org.archive.nutchwax.imagesearch.ImageSearcherBean product
+  $ bin/nutch org.archive.nutchwax.imagesearch.ImageSearcherBean product
 
 This calls the ImageSearcherBean to execute a simple keyword search for
 "product".
+
+
+Web deployment
+--------------
+The web application for image searching can be built by invoking the
+following command in this contrib's directory:
+
+  $ ant imagesearch-war
+
+This will generate a WAR file named "imagesearch.war" in the "build"
+directory of Nutch, which can be deployed as usual.


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
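
The README text added in this revision describes three command-line steps: index with DocIndexer, run ImageProcessor once per segment, then search. A minimal dry-run sketch of that sequence is given here for illustration. The directory names (crawl/index, crawl/crawldb, crawl/linkdb and the two segment paths) are hypothetical placeholders that do not come from the commit, and the RUN variable is only a convenience of this sketch. By default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of the indexing / thumbnail / search sequence from the README.
# All paths below are hypothetical placeholders; adjust them to a real crawl.
NUTCH=bin/nutch
INDEX=crawl/index
CRAWLDB=crawl/crawldb
LINKDB=crawl/linkdb
SEGMENTS="crawl/segments/seg1 crawl/segments/seg2"

# RUN defaults to "echo", so commands are only printed.
# Export RUN as an empty string to actually execute them.
RUN=${RUN-echo}

# Step 1: build the index; arguments follow the README's
# <index> <crawldb> <linkdb> <segment> ... order.
build_index() {
    # $SEGMENTS is left unquoted on purpose so each segment is its own argument.
    $RUN "$NUTCH" org.archive.nutchwax.imagesearch.DocIndexer \
        "$INDEX" "$CRAWLDB" "$LINKDB" $SEGMENTS
}

# Step 2: ImageProcessor handles one segment at a time, so loop over them.
make_thumbnails() {
    for seg in $SEGMENTS; do
        $RUN "$NUTCH" org.archive.nutchwax.imagesearch.ImageProcessor "$seg"
    done
}

# Step 3: simple keyword search from the command line.
search_images() {
    $RUN "$NUTCH" org.archive.nutchwax.imagesearch.ImageSearcherBean "$1"
}

build_index
make_thumbnails
search_images product
```

The loop in make_thumbnails reflects the README's statement that ImageProcessor has to be run for every indexed segment if thumbnails are to appear in the web UI; for command-line searching alone, that step could be left out.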