From: Natalia T. <nt...@ce...> - 2006-06-22 10:02:28
Hello,

I have a problem indexing new jobs with Hadoop and NutchWAX. The forum archive for this list doesn't work, so I can't search it for information.

I indexed a couple of jobs crawled with Heritrix to try out NutchWAX search, and it seems to work: when I search for a word in NutchWAX, results are shown. But when I click a result's title or "other versions", the URL is wrong. It looks something like http://www.myurl.com/null/*/http//www.urlcrawled.com. Judging from examples on the Internet Archive website, I think the "null" in the path should be the collection name used at index time. Am I right? Why is it null? Is there any way to list the collections used when indexing?

After trying that, I decided to add new jobs. When I try to index the new jobs using the same command, an error appears because the indexes directory already exists in the output directory. How can I add jobs to this index?

Thanks,
Natalia