From: Ionut N. <io...@ef...> - 2001-11-23 21:08:45
On Fri, 2001-11-23 at 20:18, Gilles Detillieux wrote:
> According to Ionut Nistor:
> >
> > htdig does support (afaik) 3 translations:
> >
> > 1. lt & gt (< >)
> > 2. amp (&)
> > 3. quot (")
>
> That's actually 4 - lt & gt are 2 separate entities. htdig also handles
> ™ (153) in the 3.1.x code, which I think is non-standard, and the
> full ISO Latin 1 set of entities from 160 to 255 (&nbsp; - &yuml;) in
> both 3.1.x and 3.2 betas.

I did not know about those - I assume they are handled by default, as
there is no config option for them.

> >
> > Is it possible/desirable ?
>
> I'm inclined to agree that the 2nd approach is better. htdig currently

I thought so too. It would seem logical not to have numerous conversions
in htdig, but to handle escapes instead (they ca

> The problem with not translating is it would make word matches more
> difficult, when words have accented character entities embedded in them.

Maybe escaping the words HTML/XHTML style would help? No, problems
would probably appear with other (non-HTML) files that were indexed and
which may use some other encodings.

> The entities would probably have to be translated to Unicode or UTF-8
> for word matching, and search words would have to be similarly encoded.
> All of this would entail major rewriting of htdig and htsearch! So, yes,
> it is desirable, and possible if we have the volunteers to do it (which
> we don't right now), but not simple and straightforward.

Hmm.. you are right. I will think about it too, and also look through the
code a bit (I really don't know much about how it works - I first
installed it just 2 weeks ago).

>
> The current approach works for the most part, but is not ideal. Support
> for the &apos; entity would be easy to add, but all the other new entities
> in XHTML define characters above 255, so they won't work in the current
> 8-bit only, locale-specific approach.

Yup, that's why the 2nd approach would seem simple - though the matching
becomes a problem.

Thanks for the reply.

Ionut Nistor io...@ef...
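
To make the "translate entities to Unicode for word matching" idea from the quoted reply concrete, here is a minimal sketch. It is written in Python purely for illustration and is not how htdig or htsearch work. The function names `normalize_word` and `words_match` are hypothetical. The sketch assumes that both the indexed word and the search word pass through the same decoding step before they are compared.

```python
import html
import unicodedata


def normalize_word(text: str) -> str:
    """Decode HTML/XHTML entities, then put the result in one canonical form.

    Hypothetical helper: html.unescape handles named and numeric character
    references; NFC composes accents; casefold gives caseless comparison.
    """
    decoded = html.unescape(text)
    return unicodedata.normalize("NFC", decoded).casefold()


def words_match(indexed: str, query: str) -> bool:
    """Compare an indexed word and a search word after identical normalization."""
    return normalize_word(indexed) == normalize_word(query)


if __name__ == "__main__":
    # An accented word written with an entity matches the literal character.
    print(words_match("caf&eacute;", "caf\u00e9"))
    # &apos; decodes to a plain apostrophe.
    print(normalize_word("don&apos;t"))
    # &euro; decodes to a code point above 255, outside an 8-bit locale.
    print(ord(normalize_word("&euro;")) > 255)
```

Because everything is held as Unicode text, entities above 255 need no special case. That is the part an 8-bit, locale-specific representation cannot cover.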