|
From: Natalia T. <nt...@ce...> - 2006-06-22 10:02:28
|
Hello,

I have a problem indexing new jobs with Hadoop and NutchWAX. The forum archive of this list doesn't work, so I can't find information about it.

I indexed a couple of jobs crawled with Heritrix to try the NutchWAX search and it seems to work. When I search for a word, results are shown, but when I click the title or "other versions" the URL is wrong. It's something like http://www.myurl.com/null/*/http//www.urlcrawled.com. Looking at examples on the Internet Archive web, I think that "null" in the path may be the collection name used at index time. Am I right? Why null? Is there any way to list the collections used at indexing time?

After trying this, I decided to add new jobs. When I try to index new jobs using the same command, an error appears because the indexes directory in the output dir already exists. How can I add jobs to this index?

Thanks,
Natalia
|
From: Michael S. <st...@ar...> - 2006-06-22 21:50:06
|
Natalia Torres wrote:
> Hello
>
> I have a problem indexing new jobs with Hadoop and NutchWAX. The forum
> archive of this list doesn't work, so I can't find information about it.

I wrote SourceForge asking where our archive is!

> I indexed a couple of jobs crawled with Heritrix to try the NutchWAX search
> and it seems to work. When I search for a word, results are shown, but when
> I click the title or "other versions" the URL is wrong. It's something
> like http://www.myurl.com/null/*/http//www.urlcrawled.com.

Is the host 'myurl.com' a server that will return the content of ARCs?

> Looking at examples on the Internet Archive web, I think that "null" in the
> path may be the collection name used at index time. Am I right? Why null?

Looks like your collection name is 'null'. If you do an explain of your search result, is there a 'collection' field, and if so, is its value null?

Did you use NutchWAX 0.6.2? With that version it was not possible to do an indexing without supplying a collection name -- supposedly. You can edit search.jsp and add in a collection name.

> Is there any way to list the collections used at indexing time?

See the explain above. Otherwise, use the Nutch tools to read the content of the metadata in your segments -- let me know and I'll supply more detail -- or you can look at the index produced using a tool like Luke (http://www.getopt.org/luke/) or some quick Lucene code that iterates over each document printing out the content of the collection field (see the sketch at the end of this message; sounds like yours is null though).

> After trying this, I decided to add new jobs. When I try to index new jobs
> using the same command, an error appears because the indexes directory
> in the output dir already exists.

Is this at the merge indices step? Try moving aside the old merged index -- i.e. DATA_DIR/index -- and retry running the single merge step.

The 'all' command for NutchWAX is for running through a complete indexing -- from start to finish. Adding increments needs work in NutchWAX. Adding documentation on how to do it, from my experience running a few here, will be the focus of the next NutchWAX release.

St.Ack

> How can I add jobs to this index?
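Below is a minimal sketch of the kind of quick Lucene code mentioned above: it walks every document in a Lucene index and prints the stored 'collection' field. The class name and the default index path are hypothetical examples; point it at your merged index directory and put the Lucene jar that ships with Nutch/NutchWAX on the classpath.

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexReader;

  // Hypothetical utility: print the 'collection' field of every document in an index.
  public class DumpCollectionField {
    public static void main(String[] args) throws Exception {
      // Example default path -- pass your own index directory as the first argument.
      String indexDir = args.length > 0 ? args[0] : "/data/outputs/index";
      IndexReader reader = IndexReader.open(indexDir);
      try {
        for (int i = 0; i < reader.maxDoc(); i++) {
          if (reader.isDeleted(i)) {
            continue; // skip documents removed by deduplication
          }
          Document doc = reader.document(i);
          System.out.println(i + "\t" + doc.get("collection"));
        }
      } finally {
        reader.close();
      }
    }
  }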
|
From: Natalia T. <nt...@ce...> - 2006-06-23 11:56:27
|
When I index my jobs with NutchWAX 0.6.1 I use this command:

hadoop jar /usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar all /data/inputs/ /data/outputs ciencia

The help explains that input, output and collection are required. I put "ciencia" as the collection name, not null, but in the listed search results this name is not included in the path ...

If I edit the search.jsp page and add my collection name, then search doesn't work (it doesn't recognize this collection).

Natalia
|
From: Michael S. <st...@ar...> - 2006-06-23 15:33:46
|
Natalia Torres wrote:
> When I index my jobs with NutchWAX 0.6.1 I use this command:
>
> hadoop jar /usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar all
> /data/inputs/ /data/outputs ciencia
>
> The help explains that input, output and collection are required.
> I put "ciencia" as the collection name, not null, but in the listed search
> results this name is not included in the path ...
>
> If I edit the search.jsp page and add my collection name, then search
> doesn't work (it doesn't recognize this collection).

Just add it to the path that gets made as part of the unrolling of search results. Put 'ciencia' in place of the value of collection at that point -- around line 196, where we assign the archiveCollection value (see the sketch at the end of this message).

The collection name not being passed to the index is a bug. It looks like the fix is not in 0.6.1; it was fixed 2006/05/12. I'll make a 0.6.2 -- hopefully today.

St.Ack

P.S. Regarding why the archives are not in place, from SF support, per the site status page: ( 2006-06-20 12:41:07 - Mailing List Service ) On 2006-06-20 the Mailing List Archives were taken down for preventative maintenance that occurs about once every two years. We expect the duration of this downtime to last between 1 to 3 days.

> Natalia
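A sketch of the search.jsp workaround described above. The exact code around line 196 varies between versions, so the surrounding assignment shown here is an assumption; only the archiveCollection name and the 'ciencia' value come from this thread.

  <%
    // Original (hypothetical) assignment that picks the collection up from the
    // search hit; with the 0.6.1 bug this value comes back null.
    // String archiveCollection = detail.getValue("collection");

    // Workaround: hardcode the collection name used at index time.
    String archiveCollection = "ciencia";
  %>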
|
From: Michael S. <st...@ar...> - 2006-06-23 20:27:29
|
Looks like it will be a while before I can get to a release; I'm out next week. Meantime, I just took this build for a test run: http://crawltools.archive.org:8080/cruisecontrol/artifacts/HEAD-archive-access/20060623113807. It fixes at least your collection issue. It requires Hadoop 0.3.2.

Yours,
St.Ack
|
From: Natalia T. <nt...@ce...> - 2006-06-26 16:46:13
|
Hello,

Using this build it's now running and I can index!! I can read the explanation and the "more from the site" links, but I can't access the title link.

I have a doubt about the "collectionsHost" variable. It points to the server that will return the content of the ARCs. I put the ARC files (or arc.gz files) directly on this server, but the "title" and "other versions" links on the NutchWAX search results don't work. What information does this server have to offer?

Thanks,
Natalia
|
From: Natalia T. <nt...@ce...> - 2006-07-03 11:43:55
|
Hello,

I tried to add the new job by moving the indexes directory before starting the index process, and it works fine. Thanks!!

So, every time I want to index a new job I need to move the indexes directory? If I move this directory, does the NutchWAX search keep working? This process takes many hours ...

Natalia
|
From: Michael S. <st...@ar...> - 2006-07-07 00:41:45
|
Natalia Torres wrote:
> Hello
>
> I tried to add the new job by moving the indexes directory before starting
> the index process, and it works fine. Thanks!!
>
> So, every time I want to index a new job I need to move the indexes
> directory? If I move this directory, does the NutchWAX search keep working?

I presume you are using the 'all' command each time? It will complain if there are already indices in place from a previous run.

The 'all' command is a convenience. It assumes you want to do a single-pass indexing of a set of ARCs. Running the 'all' command to bring in a new set of ARCs will run through all steps and index all the new additions as well as reindex all ARCs added previously.

It sounds like you want to do incremental updates to your index. Experiment by calling the jobs that comprise the 'all' command individually. For example, run the import, passing it a directory that contains a file pointing to just the new ARCs you want to ingest. Then do 'update' and 'invert'. Next, run indexing on just the segments that were added by the ingest step, saving aside the indexes made previously first. Run your deduplication. Finally, merge the new indices with the old (see the sketch at the end of this message).

I'm currently working on tools and documentation to better support incremental updates to indices. They'll form the core of the next release (coming soon -- a month or so).

> This process takes many hours ...

Yes, it can. It depends on the number of ARCs you have. It also sounds like you are running in standalone mode. You might consider starting a small Hadoop cluster; that should improve your throughput.

Yours,
St.Ack

> Natalia
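A sketch of the incremental sequence described above, in the same style as the 'all' invocation earlier in the thread. The subcommand names and argument order here are assumptions drawn from the steps listed; check the usage the jar prints when run without arguments before relying on them.

  NUTCHWAX_JAR=/usr/local/nutchwax-0.6.1/nutchwax-0.6.1.jar
  OUTPUT=/data/outputs

  # Save aside the previously merged index so the merge step does not complain.
  mv $OUTPUT/index $OUTPUT/index.old

  # Import only the new ARCs (the inputs-new dir holds a file listing just them).
  hadoop jar $NUTCHWAX_JAR import /data/inputs-new $OUTPUT ciencia

  # Update the crawl db and invert links for the newly added segments.
  hadoop jar $NUTCHWAX_JAR update $OUTPUT
  hadoop jar $NUTCHWAX_JAR invert $OUTPUT

  # Index just the new segments, dedup, then merge the new indices with the old.
  hadoop jar $NUTCHWAX_JAR index $OUTPUT
  hadoop jar $NUTCHWAX_JAR dedup $OUTPUT
  hadoop jar $NUTCHWAX_JAR merge $OUTPUT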
|
From: Natalia T. <nt...@ce...> - 2006-07-07 11:45:11
|
Thanks Michael, I'll experiment with indexing jobs this way.

About the indexing process ... I'm testing how it works (Heritrix + Hadoop + NutchWAX + WERA) with our web, and I'm running it in standalone mode with one crawled job (about 7 ARCs, 700 MB).

I want to start a Hadoop cluster, but I don't know how many slaves to use or the hardware requirements for it. I'm looking for information about benchmarks, indexing performance and so on to learn more about the hardware needed, but I can't find anything.

Thanks,
Natalia
|
From: Michael S. <st...@ar...> - 2006-07-20 16:23:04
|
Natalia Torres wrote:
> Thanks Michael, I'll experiment with indexing jobs this way.
>
> About the indexing process ... I'm testing how it works (Heritrix + Hadoop +
> NutchWAX + WERA) with our web, and I'm running it in standalone mode with
> one crawled job (about 7 ARCs, 700 MB).

How long is it taking you to index your 7 ARCs?

> I want to start a Hadoop cluster, but I don't know how many slaves to use
> or the hardware requirements for it. I'm looking for information about
> benchmarks, indexing performance and so on to learn more about the hardware
> needed, but I can't find anything.

When the software settles more -- Hadoop, Nutch, and NutchWAX -- I'll put up some figures on our experience here at the Archive. Meantime, here are a few coarse stats:

+ A cluster should have at least 3, probably 4, machines to make distribution worth the bother.

+ Here at the Archive, we have a rack with between 16 and 30 machines that we've been running/debugging indexing jobs on over the last bunch of months (the number of slaves participating varies because the hardware we use is not of the best quality, and these indexing jobs, lasting days and checksumming everything read and written, are a good way of finding flaky RAM sticks and erroring motherboards). We find on this rack that total processing of an ARC, from ingest through indexing, takes about 3 minutes (machines are 4 GB, 2 GHz dual-core Athlons with 4x400 SATA disks).

Other things to consider:

+ Make all slave nodes exactly the same -- same RAM and disk configuration. It'll save you headaches down the road.

+ Set up rsync so you can pull ARCs into your cluster with it. Once done, you can feed NutchWAX lists of ARCs as rsync URLs. This way, you can leave your ARCs out on storage nodes and the indexing software will take care of making the ARCs local to the indexing cluster.

+ DFS cannot be trusted. It'll be fixed soon, but for now, as soon as an indexing job is completed, make a backup of the produced data -- segments and indices -- to local storage (see the sketch at the end of this message).

Yours,
St.Ack
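A sketch of the last two points above. The hostnames, paths and manifest file name are hypothetical examples, and the backup step assumes the 'hadoop dfs -copyToLocal' shell command of Hadoop of this vintage.

  # new-arcs.txt -- manifest of ARCs to ingest, one rsync URL per line; point the
  # import step at the directory containing it and the indexing software will
  # pull the ARCs local to the cluster itself, e.g.:
  #   rsync://storage1.example.org/arcs/EXAMPLE-20060601000000-00001.arc.gz
  #   rsync://storage1.example.org/arcs/EXAMPLE-20060601000000-00002.arc.gz

  # As soon as an indexing job completes, copy the produced data out of DFS to
  # local storage rather than trusting DFS to keep it.
  hadoop dfs -copyToLocal /data/outputs/index /backup/outputs/index
  hadoop dfs -copyToLocal /data/outputs/segments /backup/outputs/segments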