From: Anders Bolager <anders@vo...> - 2001-11-13 18:22:16
I had a quick look at the htdig TODO list on the website, and it mentions
UTF-8 support. Some of our pages are in Korean (using UTF-8) and we would
like to be able to provide htdig searching for our Korean users. How far
has this feature gotten?
I just wanted to ask (like many before), if htdig indexes UTF-8 encoded
I ask because I thought that htdig isn't capable of doing this and the
This site uses htdig AND UTF-8! Any idea how this can be?
From: Andreas Johansson <andreas.johansson@we...> - 2005-11-18 15:26:38
Hi, I'm currently integrating ht://Dig search functionality into some
sites, and I just realized that UTF-8 is not supported by ht://Dig. We
are currently releasing localized sites for China, Japan, Russia,
Poland, and more, so we just converted to UTF-8.
I'm now looking for a good Open Source search crawler that supports UTF-8.
Preferably a search engine as good as ht://Dig, if there is one... ;-)
Anyone have any idea?
Andreas Johansson, Sweden
One of the sites I'm managing is in Spanish. The ht://Dig
search there works perfectly (I just had to change the
locale and make the htfuzzy database, etc.) However, the
program manager for the site has a strange request - she
wants people to be able to type search terms without
accents, and have the search find accented terms.
So right now it works fine in the other direction, "niños"
finds "niños" or "ninos", but she wants "ninos" to find
"niños". Clearly it would be crazy to ask the search to
match the term to all possible accented versions of itself
(ñïńôś), so is there a way to get htdig to ignore accents
when it's indexing, so that instead of indexing "niños", it
indexes "ninos"? (And would this inevitably erase the
accents in the excerpts as well?)
- Nada O'Neal
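To illustrate the idea asked about above (folding accents at indexing time so that "ninos" finds "niños"), here is a minimal Python sketch. It is not ht://Dig code and does not use any ht://Dig option; the functions fold_accents, build_index and search are hypothetical names made up for this example. It assumes the text is already decoded to Unicode and that removing combining marks is an acceptable definition of "ignoring accents".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining accent marks removed, e.g. 'niños' -> 'ninos'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep everything else.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def build_index(documents):
    """Map each accent-folded, lowercased word to the ids of documents containing it.

    Hypothetical toy index: only the keys are folded, the documents are untouched.
    """
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(fold_accents(word), set()).add(doc_id)
    return index


def search(index, term):
    """Fold the query the same way as the index keys, then look it up."""
    return index.get(fold_accents(term.lower()), set())


if __name__ == "__main__":
    docs = {1: "Los niños juegan", 2: "ninos sin acento"}
    idx = build_index(docs)
    print(search(idx, "ninos"))
    print(search(idx, "niños"))
```

In this sketch only the index keys are folded; the stored document text is left as it was, so an excerpt taken from the original text would still show the accents. Whether ht://Dig can be made to behave this way is a separate question that this example does not answer.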
From: Jim Cole <greyleaf@yg...> - 2001-11-16 00:47:20
Anders Bolager's bits of Tue, 13 Nov 2001 translated to:
>I had a quick look at the htdig TODO list on the website, and it mentions
>UTF-8 support. Some of our pages are in Korean (using UTF-8) and we would
>like to be able to provide htdig searching for our Korean users. How far
>has this feature gotten?
Currently there is no support for UTF-8. My understanding is that
UTF-8 support is more of a wish-list item than a todo. The active
developers are very busy with other issues and no one else has
stepped forward to work on UTF-8 support.