I'm attempting to use htdig to index a secure intranet site, and I am
running into a problem with indexing MS Word files. It seems the
program cannot handle any files other than HTML, PDF, PS and TXT when
running in local_urls_only mode. Document.cc checks for those
extensions; if none of them match, it returns Document_not_local to
Retriever.cc, which marks the file as not found.
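To make the failure mode concrete, here is a rough C++ sketch of the logic as I understand it. It is NOT the actual Document.cc/Retriever.cc code. The names (classifyLocalFile, LocalStatus, retrieveOutcome, Outcome) are hypothetical, the case-insensitive extension match is my own choice, and the assumption that htdig falls back to fetching over HTTP when local_urls_only is off is also mine.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical names for illustration; not the real ht://Dig 3.1.6 source.
enum class LocalStatus { Ok, NotLocal };

struct LocalResult {
    LocalStatus status;
    std::string contentType;  // empty when status == NotLocal
};

// Return the lowercased extension of `path` (without the dot), or "" if none.
std::string extensionOf(const std::string& path) {
    std::string::size_type slash = path.find_last_of('/');
    std::string::size_type dot = path.find_last_of('.');
    // A dot inside a directory name (e.g. "/a.b/c") is not an extension.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return "";
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Fixed whitelist of extensions; anything else (e.g. .doc) is NotLocal.
LocalResult classifyLocalFile(const std::string& path) {
    const std::string ext = extensionOf(path);
    if (ext == "html" || ext == "htm") return {LocalStatus::Ok, "text/html"};
    if (ext == "txt") return {LocalStatus::Ok, "text/plain"};
    if (ext == "pdf") return {LocalStatus::Ok, "application/pdf"};
    if (ext == "ps") return {LocalStatus::Ok, "application/postscript"};
    return {LocalStatus::NotLocal, ""};
}

enum class Outcome { ReadFromDisk, FetchedOverHttp, NotFound };

// Caller-side decision (assumed): with local_urls_only set there is no
// HTTP fallback, so a NotLocal result is recorded as "not found".
Outcome retrieveOutcome(const std::string& path, bool localUrlsOnly) {
    if (classifyLocalFile(path).status == LocalStatus::Ok)
        return Outcome::ReadFromDisk;
    return localUrlsOnly ? Outcome::NotFound : Outcome::FetchedOverHttp;
}
```

With local_urls_only set, a path like /intranet/minutes.doc ends up as NotFound even though the file exists on disk, which matches what I'm seeing.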
This appears to be fixed in 3.2.0, but that series is still in beta.
Could FAQ entry 4.8 on the ht://Dig site be updated with something
like: "If you are using 3.1.6 with local_urls_only, you will not be
able to index files other than HTML, PDF, PS or TXT. You must use the
3.2.0 series." That way someone else won't have to spend time tracking
down this bug again.

Sorry if this topic has been covered to death; the SF mailing list
search is currently down for me, so I couldn't check. I was also
impressed by how understandable this project's code is: it was
incredibly easy to find the parts that dealt with the errors I was
having.

Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro | Office 218.233.3757 EXT-139
LARL Network Administrator | Cell 218.790.2110