According to Thomas Netousek:
> I am running htdig-3.2.0-0.b3.4 from the RedHat linux 7.1 distribution
> and I am indexing documents which have all types of funny characters
> like e.g. single quotes spelled as &#8217;
> I have seen other reports about the parser failing for &amp, so I am
> wondering if this could be sort of a similar problem ?
> Btw, I am also running htdig-3.1.5 on another machine with
> translate_... set to true and it works like a charm there.
I believe 3.2.0b3 will not translate numeric entities where the number
is larger than 255. 3.1.5 does, but that is a bug: it only uses 8-bit
characters internally, so it keeps only the bottom 8 bits of the
number. Because 3.2.0b3 doesn't convert the numeric entity, the "&"
goes into the excerpt literally and is turned into an "&amp;" entity
on output, so that it displays literally as "&". That is why you see
the numeric entity in the results. Given the 8-bit character set
limitations in both 3.1 and 3.2, I think that 3.2's behaviour is the
lesser of two evils when it comes to handling numeric entities above
255.
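
To make the difference concrete, here is a rough Python sketch of the
two behaviours as I describe them above, using &#8217; (the right
single quote) as the example. This is not ht://Dig's actual code. The
function names are made up for illustration, and I am assuming that
3.2.0b3 does convert numeric entities up to 255.

```python
import re

# Matches decimal numeric character references such as &#8217;
NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def decode_truncating(text):
    """Hypothetical model of the 3.1.5 bug: keep only the low 8 bits."""
    return NUMERIC_ENTITY.sub(lambda m: chr(int(m.group(1)) & 0xFF), text)


def decode_leaving_large(text):
    """Hypothetical model of 3.2.0b3: convert only values up to 255."""
    def repl(match):
        number = int(match.group(1))
        # Assumption: anything above 255 is left exactly as written.
        return chr(number) if number <= 255 else match.group(0)
    return NUMERIC_ENTITY.sub(repl, text)


def escape_for_output(text):
    """Escape a literal '&' so that a browser displays it as '&'."""
    return text.replace("&", "&amp;")


sample = "it&#8217;s"

# 8217 is 0x2019, so the low 8 bits are 0x19: an unrelated control character.
truncated = decode_truncating(sample)

# The entity survives untouched, then its '&' is escaped on output.
kept = decode_leaving_large(sample)
shown = escape_for_output(kept)

print(repr(truncated))
print(shown)
```

In the first case the quote silently becomes a wrong character. In the
second the browser shows the text "&#8217;" in the excerpt, which is
ugly but at least doesn't lose information.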
Gilles R. Detillieux E-mail: <grdetil@...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930