Ainhoa,
My first instinct now would be to check the parser output. Try adding another "v" to your config (and possibly restricting your indexing to just this one file), then check the log output. It may be that htdig does not like the output from your Perl script; www.htdig.org explains what the output means. I seem to recall you saying that you had already tested that the script runs on its own, but possibly something is not quite right there, or there is a typo in the config that neither of us can see.
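A minimal sketch of "checking the parser output" by hand. It assumes htdig calls an external converter with four arguments in the order infile, content-type, URL, configuration file (that is my reading of the external_parsers documentation, so please verify it). It also assumes that for a `->text/html` converter, htdig expects plain HTML on stdout. The checks in `check_html_output` are rough heuristics, not htdig's own validation.

```python
import subprocess


def run_converter(script, infile, content_type, url, config_file):
    """Run an external converter by hand, roughly as htdig would.

    Assumed argument order: infile, content-type, URL, configuration file.
    """
    result = subprocess.run(
        [script, infile, content_type, url, config_file],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout, result.stderr


def check_html_output(stdout):
    """Return a list of heuristic problems with a converter's text/html output."""
    problems = []
    if not stdout.strip():
        problems.append("converter produced no output")
        return problems
    lowered = stdout.lower()
    if "<html" not in lowered and "<body" not in lowered:
        problems.append("output does not look like HTML")
    # Raw MIME headers at the top usually mean the .mht was passed through undecoded.
    if lowered.lstrip().startswith(("mime-version:", "content-type:", "from:")):
        problems.append("output starts with MIME headers")
    return problems
```

For example, `run_converter("/opt/vin/mht2html.pl", "beepmacro.mht", "application/vnd.wap.xhtml+xml", "http://172.26.0.169/testdig/beepmacro.mht", "/path/to/htdig.conf")`, with the config path standing in for your own. Pass the returned stdout to `check_html_output`. An empty list only means nothing obviously wrong was spotted.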
 
Regards,
Mike
 
 

From: Ainhoa L [mailto:ainhoitxu@gmail.com]
Sent: Monday, February 11, 2008 9:33 AM
To: Brockington,MJ,Michael,JPGA4X R
Cc: htdig-general@lists.sourceforge.net
Subject: Re: [htdig] Htdig and MHT files

Hi Mike,
Yes, you were right. I was missing that part and I didn't even notice!
I changed the config file and wrote this:

application/pdf->text/html /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \

application/vnd.wap.xhtml+xml->text/html /opt/vin/mht2html.pl

application/vnd.wap.xhtml+xml is the MIME type of my MHT documents.
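Each entry above uses the `type->type2 command` form. As I understand it, htdig hands documents of the first MIME type to the command and indexes the command's output as the second type, so the left-hand type must exactly match the Content-Type the server sends.

The hypothetical helpers below are only an illustration, not htdig's own config parser. They assume whitespace-separated pairs with no quoted arguments and assume backslash-newline continuations are simply joined. `parse_external_parsers` splits such a value into (input type, output type, command) entries, and `converter_for` looks one up by content type.

```python
def parse_external_parsers(value):
    """Split an external_parsers value into (input_type, output_type, command).

    Hypothetical helper: assumes "type[->type2] command" pairs separated by
    whitespace, no quoted arguments, and backslash-newline continuations.
    """
    tokens = value.replace("\\\n", " ").split()
    if len(tokens) % 2:
        raise ValueError("odd number of tokens: a type or a command is missing")
    entries = []
    for spec, command in zip(tokens[0::2], tokens[1::2]):
        in_type, _, out_type = spec.partition("->")
        # A plain "type command" entry has no output type (a parser, not a converter).
        entries.append((in_type, out_type or None, command))
    return entries


def converter_for(entries, content_type):
    """Return the entry whose input type exactly matches content_type, or None."""
    for entry in entries:
        if entry[0] == content_type:
            return entry
    return None
```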
So I ran the dig, and everything seemed to go fine, ending with:


0/http://172.26.0.169/testdig/
1/http://172.26.0.169/testdig/About_comments_eex3.mht
2/http://172.26.0.169/testdig/aster.pdf
3/http://172.26.0.169/testdig/beepmacro.mht
4/http://172.26.0.169/testdig/index.txt
5/http://172.26.0.169/testdig/test.html
 
(I am doing this in a test folder)
 
But when I go to the search page, it doesn't find words inside the MHT files. Searching works for the PDF, TXT and HTML files, but it can't find the words that are in the MHT ones.
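An MHT file is a MIME multipart/related archive (RFC 2557), and the HTML part inside it is often quoted-printable encoded. A converter therefore has to unpack the archive and decode that part before htdig can see the words.

The sketch below shows the idea with Python's standard `email` package. It is an illustration of the technique, not what mht2html.pl actually does. It assumes the first `text/html` part is the page worth indexing.

```python
import email
import email.policy


def mht_to_html(raw_bytes):
    """Return the first text/html part of an MHT (MIME) archive as a str, or None."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and applies the charset.
            return part.get_content()
    return None
```

If the converter's output still contains `=3D`-style sequences or raw MIME headers, htdig would index that debris rather than the page text, which could explain words going missing.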
 
I suppose I am missing something here... Do I need to set up anything else for the search engine?
 
Thanks a lot for all your help,
 
Ainhoa