Thank you for your help.
I adopted a temporary solution for the problem: I reinterpreted the htdig
response in my Java source code and replaced the bad character pairs.
But I think the patch is the better solution, so I will try it :). And
I am looking forward to htdig 4.0 with UTF support :).
Thank you again.
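
A minimal sketch of the "replace the bad character pairs" workaround,
written in Python for brevity rather than Java. It assumes the htdig output
was read as ISO-8859-1, so each two-byte UTF-8 character shows up as two
8-bit characters. The function name repair_pairs is hypothetical and this is
not the code actually used above.

```python
import re

# A two-byte UTF-8 sequence misread as ISO-8859-1 appears as a lead
# character in U+00C2..U+00DF followed by one in U+0080..U+00BF.
_PAIR = re.compile("[\u00c2-\u00df][\u0080-\u00bf]")


def repair_pairs(text: str) -> str:
    """Turn misread two-character pairs back into the intended character."""

    def fix(match: re.Match) -> str:
        # Recover the original two bytes, then decode them as UTF-8.
        return match.group(0).encode("latin-1").decode("utf-8")

    return _PAIR.sub(fix, text)


if __name__ == "__main__":
    print(repair_pairs("M\u00c3\u00bcller"))
```

Note that a page which really contains such a pair in ISO-8859-1 (for
example "Ã" followed by a character in the upper range) would be changed
too, so this only makes sense when the indexed pages are known to be UTF-8.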
[mailto:htdig-general-bounces@...] On Behalf Of Andreas
Sent: Thursday, June 22, 2006 12:20 AM
Subject: Re: [htdig] UTF again
On Mon, Jun 19, 2006 at 03:14:54PM +0300, Kintzel Levente wrote:
> I know that htdig doesn't support UTF8 characters (only 8 bits
> My question is: what does "doesn't support" mean exactly?
> That means that the search doesn't work well for characters with accents
> special characters?
Yes. In other words: if you are searching for a word with accent(s) or
umlaut(s), you will not get hits for pages where these words are UTF-8
encoded.
> Or htdig cannot return the indexed pages with correct
> content if it contains UTF chars?
No. You get the hits, but the UTF-8 chars are interpreted as iso-8859-x.
> More exactly, my web pages contain UTF8 characters, and I want to user
> for search. Let's suppose that it is OK if it doesn't search for accented
> characters, only for simple characters, but the returned pages contains
> characters. Where an UTF character was before, now there are two
> Is it a consequence of the fact that htdig cannot handle UTF characters,
> is it a configuration problem made by me?
I've written a patch for this problem. The patch simply checks whether the
page is UTF-8 encoded by looking at the content-type meta tag. If so,
all double-byte chars are converted to their 8bit counterpart. All other
chars are replaced by a question mark "?".
The search form and the htdig templates (header, match, nomatch, etc.)
must be 8bit encoded.
I've tested this for German umlauts, but it should also work for other
You can find the patch in the htdig patch archive:
! Andreas Jobs Network Operating Center !
! Ruhr-Universitaet Bochum !
! The only way to clean a compromised system is to flatten and rebuild. !
ht://Dig general mailing list: <htdig-general@...>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)