From: thorstuff <tho...@be...> - 2003-08-17 01:23:44
Thanks for the reply, Jim. The process does not start with any of the files in the directory it is supposed to be indexing; instead, it just starts with a bunch of new servers.

Don Griffey
From: thorstuff <tho...@be...> - 2003-08-17 04:28:14
Follow-up to my previous post. By completely removing the rpms AND deleting the directory /var/lib/htdig, I can force the program to begin indexing the pages listed in the file created by using the procedure in FAQ 5.25. As nearly as I can tell, it indexes all of the pages in the file and then moves on to index a bunch of Adobe pages. After several minutes of Adobe files I pressed CTRL-C, and the program went on to htmerge, rejected all of the Adobe stuff and then, believe it or not, it worked from the website again. So, as far as that goes, the key to starting over seems to be dumping the db files along with a re-install.

Now, what do I do to get it to index just the files in the directory, and not all of the Adobe pages and whatever else it might try to index?

Thanks again.

Don Griffey
From: Jim C. <li...@yg...> - 2003-08-17 05:06:42
On Saturday, August 16, 2003, at 10:28 PM, thorstuff wrote:
> By completely removing the rpms AND deleting the directory
> /var/lib/htdig I can force the program to begin indexing the pages
> listed in the file created by using the procedure in FAQ 5.25. It
> indexes all of the pages in the file as nearly as I can tell then
> moves on to index a bunch of adobe pages. After several minutes of
> adobe files I pressed CTRL-C and the program went to htmerge, rejected
> all of the adobe stuff and then believe it or not it worked from the
> website again. So the key to starting over seems to be dumping the db
> files with a re-install as far as that goes.
>
> Now, what do I do to get it to just index the files in the directory
> and not all of the adobe and whatever else it might try to index?

Have you tried setting max_hop_count to 0? This should prevent htdig from following any of the links it finds in the documents you are providing. See the following for more information on the max_hop_count attribute:

http://www.htdig.org/attrs.html#max_hop_count

Jim
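To see why max_hop_count = 0 keeps the dig inside the listed files, here is a simplified model of a hop-limited crawl. It is a sketch, not htdig's actual code: it assumes every URL read from the start list counts as hop 0, and the function name `crawl`, the `links` map, and all URLs are hypothetical.

```python
from collections import deque


def crawl(start_urls, links, max_hop_count):
    """Breadth-first crawl that follows links at most max_hop_count hops
    away from the starting URLs (all of which are treated as hop 0)."""
    visited = set()
    order = []
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, hops = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        # Follow outgoing links only while still under the hop limit;
        # with max_hop_count = 0 no link is ever followed.
        if hops < max_hop_count:
            for target in links.get(url, []):
                if target not in visited:
                    queue.append((target, hops + 1))
    return order


# Hypothetical data: two local pages from the start list, one of which
# links out to an external Adobe page.
start_urls = ["http://localhost/docs/a.html", "http://localhost/docs/b.html"]
links = {"http://localhost/docs/a.html": ["http://www.adobe.com/"]}

print(crawl(start_urls, links, 0))  # only the two local pages
print(crawl(start_urls, links, 1))  # the external page is reached as well
```

In this model, raising the limit to 1 is enough for the crawl to wander off-site, which matches the behavior described in the thread before the attribute was set.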
From: thorstuff <tho...@be...> - 2003-08-17 14:36:32
Hello again, Jim,

> Have you tried setting max_hop_count to 0? This should prevent htdig from following any of the links it finds in
> the documents you are providing. See the following for more information on the max_hop_count attribute.

That worked just fine: htdig indexed the site and stopped. I really appreciate it.

Don Griffey
From: Jim C. <li...@yg...> - 2003-08-17 05:02:49
On Saturday, August 16, 2003, at 07:23 PM, thorstuff wrote:
> The process does not start with any files in the directory it is
> supposed to be indexing. Instead it just starts with a bunch of new
> servers.

I think the most likely cause of this behavior is that either htdig is not using the config file you intend, or you are running an update dig against databases that contain documents from a previous dig. The former can be tested by explicitly specifying the config file with the -c command-line option. The latter can be tested by removing the database files and trying another indexing run.

Jim
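The second explanation (an update dig starting from a previous dig's documents) can be pictured with a toy model of how a dig might seed its work queue. This is an illustrative assumption, not htdig's implementation: `seed_urls`, its parameters, and the URLs are hypothetical, and the only point it makes is that an update dig also revisits URLs already stored in the old database, while a fresh run starts from the configured start URLs alone.

```python
def seed_urls(start_urls, existing_db_urls, initial):
    """Return the URLs a dig would begin with in this simplified model."""
    if initial:
        # Fresh run (e.g. after the database files were removed):
        # only the configured start URLs are used.
        return list(start_urls)
    # Update dig: URLs already in the old database are revisited too, so
    # servers from an earlier dig can appear in the queue.
    seeds = list(existing_db_urls)
    for url in start_urls:
        if url not in seeds:
            seeds.append(url)
    return seeds


old_db = ["http://www.adobe.com/", "http://other.example.org/"]
start = ["http://localhost/docs/a.html"]

print(seed_urls(start, old_db, initial=False))  # old servers come first
print(seed_urls(start, old_db, initial=True))   # only the intended start page
```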