From: Iosif F. <ife...@ne...> - 2001-11-28 08:55:40
Just an addition, to make whoever takes on the translation pitfalls aware of a problem. It has been a rather long time since I worked out a patch to make htdig fit our needs. I never found the time to really get involved and put serious work into this; however, the problem is still there, and if anyone does get involved, it may be worth being aware of it.

I'm speaking of Romanian as the language used in the indexed documents. Since there are always problems with ISO-8859-2 chars, many users choose not to use them at all. As a consequence, spellings with accented chars and with their unaccented counterparts are used in mixed fashion.

The approach we took was simply to transform _all_ accented chars into their unaccented counterparts before indexing, and of course to do the same to the query before searching. Without the ability to do that, I'm afraid our indexing wouldn't have been as successful as it proved to be.

It's true that this way we cannot search, for example, for the accented words only, without showing the others - but users proved much more tolerant of getting some extra (slightly off) hits than of not getting the relevant ones...

Even if kept only as an option, the possibility to work like that definitely should be present in future versions of htdig.

Thank you.

Iosif Fettich
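
For anyone who wants to see the idea in a few lines, here is a minimal sketch of folding accents both at indexing time and at query time. It is not the patch mentioned above and not htdig code: it is written in Python, works on Unicode strings rather than ISO-8859-2 bytes, and the names fold_accents, build_index and search are hypothetical, chosen only for illustration.

```python
import unicodedata


def fold_accents(text):
    """Return text with diacritical marks removed (e.g. 'ț' becomes 't')."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs):
    """Map each accent-folded, lowercased word to the ids of the docs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in fold_accents(text).lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of docs containing every query word, after the same folding."""
    words = fold_accents(query).lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


# Toy documents (assumed for illustration): the same phrase, with and without accents.
docs = {
    "with_accents": "Ștefan cel Mare și Sfânt",
    "without_accents": "Stefan cel Mare si Sfant",
}
index = build_index(docs)

# Both spellings of the query find both documents.
print(search(index, "sfânt"))
print(search(index, "sfant"))
```

Because the same folding is applied on both sides, an accented query can no longer be told apart from an unaccented one, which is exactly the trade-off described above.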