Hi guys,


This is my first adventure with htdig. A colleague and I have successfully installed htdig 3.2.0b6. Like most users, I think, we need the search engine to work in a language other than English; in my case, Hungarian.

I read on the official site that htdig cannot search for characters encoded in more than 8 bits.


In other words, I need to search for characters such as ő, which is encoded in HTML as &#337;

The first problem is that characters with a code above 255 are not processed correctly: in the search results they do not appear as characters; their HTML code is shown instead.

The second problem is that I cannot search for them. I set up the accents algorithm, but it does not recognize any similarity between o and ő. The only way I can find these words is to type &#337; into the search field, which is not a good solution. Does this have the same cause as the first problem, namely that htdig cannot handle these characters? Is there any other solution, or must I wait for a newer version?
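To show what I mean by "code above 255", here is a small Python sketch. It has nothing to do with htdig's internals; it only uses Python's standard library to look at the character itself, and the variable names are my own.

```python
import html
import unicodedata

# The HTML numeric character reference for the Hungarian letter ő
entity = "&#337;"

# Decode the reference into the actual character
char = html.unescape(entity)

# Its Unicode code point is 337, which does not fit in one byte (0..255)
code_point = ord(char)
fits_in_8_bits = code_point <= 255

# Canonical decomposition (NFD) splits ő into a plain "o" followed by
# a combining double acute accent, so an accent-insensitive comparison
# could in principle treat ő like o.
decomposed = unicodedata.normalize("NFD", char)
base_letter = decomposed[0]

print(char, code_point, fits_in_8_bits, base_letter)
```

So the similarity between o and ő exists at the character level; my question is whether htdig can make use of it.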


My second question is about the search algorithms.

I want my search to be insensitive to diacritics. I used the accents algorithm in combination with substring:

   substring:1 accents:1

The real problem is this:

Consider the word “mambómámbo”.

I would expect a search for “bomam” to find this word, because according to the accents algorithm it can be converted to “bómám”, which is a substring of the indexed word. But the search does not find it. Interestingly, if I search for “mambomambo”, the word is found (only the accents algorithm is needed), and if I search for “mambó”, the word is found too (only the substring algorithm is needed). It seems that htdig cannot apply the two algorithms together. There is probably some problem with my configuration, because I have seen other sites that use htdig where this problem does not appear.
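To make the expected behaviour concrete, here is a minimal Python sketch of what I mean by combining the two algorithms. This is not how htdig implements its accents or substring algorithms; the function names strip_accents and folded_substring_match are made up for this illustration.

```python
import unicodedata


def strip_accents(text):
    """Remove diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_substring_match(query, word):
    """True if the accent-stripped query occurs inside the accent-stripped word."""
    return strip_accents(query).lower() in strip_accents(word).lower()


word = "mambómámbo"

# The three searches from my example; I would expect all of them to match.
for query in ("bomam", "mambomambo", "mambó"):
    print(query, folded_substring_match(query, word))
```

In this sketch all three queries match, because the accents are stripped from both the query and the word before the substring test. That is the combined behaviour I am trying to get from htdig.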


If anybody knows a solution to either of my two problems, please answer. These are my last problems before integrating htdig into my website.

Thank you in advance!