From: Gilles D. <gr...@sc...> - 2002-04-22 22:24:29
According to Thilo Bauer:
> Gilles,
>
> I send this to you via private communication, because I think we should
> not discuss about development within the htdig-general mailing list.
>
> Please don't think negative about my last contribution here!

Well, there is the htdig-dev mailing list for discussions about
development. I'm cc'ing that list. You should consider subscribing.
A lot of discussions are sort-of "borderline" so they can be on either
htdig-general or htdig-dev. We tend not to make too much of a fuss
about posting to the "wrong" list, but prefer when the right one is
used (for obvious reasons). Also, as some discussions may evolve from
configuration issues to development issues, or vice-versa, we sometimes
move a discussion from one list to another.

> I know, that the developer team has done a good job and they are
> struggling with many details. But, as you said, the dependency on
> locales is an old problem and should be solved now.
...
> Actually I don't know how to join the developer team. But maybe
> you don't really need my help here. Otherwise let me know about
> how to join the team.

First, subscribe to htdig-dev. See http://www.htdig.org/mailarchive.html
Then, you can maybe let the other developers know a bit about what
your skills are, and what parts of htdig, besides locale support, you
want to work on. If you want CVS access, you may want to sign up for a
SourceForge user ID. Otherwise, you can always submit changes as patches.
You'll also want to get familiar with the ht://Dig documentation and FAQ,
as well as the source code. You may also find it helpful to look through
the htdig-dev archives to see what topics have been discussed by
developers in the recent past.

> With a little help of the team and having a
> closer look on the code (more than I actually did), I think I'm able
> to get an acceptable solution. I think it is just easy to get rid of the
> dependencies on locales in htdig.
> AND I think, I formerly posted a
> hint to the right way.
>
> "How to do" alternatives:
>
> 1. Set up a FAQ, which describes how to set up missing locales by "mklocale"
>
> I just posted a mail on how to do this with OS X Server.
> Unfortunately this seems only to work with BSD style environments.
> I didn't find a solution to do the same job with Cygwin. This command
> isn't implemented here. The same may be true with other operating
> systems.

This may be quite helpful, if we stick to locale-dependent code. I've
slowly been adding to FAQ 5.8 and 4.10 as more and more questions on the
subject come up, but there's always room for more info. The problem, as
you pointed out, is that the setup of locales is very system-dependent.

> 2. Build a new htdig version
>
> This seems to be a more portable solution:
>
> I assume, that htdig relies on "libc.a" and using ANSI functions
> like "tolower()", "isprint()", and so on. Having a look on the
> source, this seems correct.
>
> With the knowledge about how the UNIX command "mklocale"
> works and how to set up or retrieve any source file for this command,
> there seems to be evidence, that this behaviour could easily be simulated
> within a "htdig-private" library.
>
> To get at least ISO 8859-1 (or similar) behaviour, one has to exchange
> the libc-calls mentioned above by relatively simple C functions, which
> will exactly do the same what the standard libs calls do, when retrieving
> the values of the functions above. They check for locale descriptions
> to achieve the correct mapping and retrieval for "tolower", "isprint"
> and so on.
>
> This is true: check this simply by setting up a customized locale with
> "mklocale" on a BSD style system, like I did it. Of course, everybody
> struggling with locales knows about this fact!
>
> The expressions how to map the characters into their corresponding
> results with "tolower", etc. are described within the source files for
> "mklocale".
>
> So what in the world could be easier than setting up my own functions
> "tolower", etc.? The expressions can be found within the corresponding
> source files for mklocale.
>
> Sure, one has to introduce new configuration file attributes to
> be able to set up htdig-internal "locales" for the different ISO definitions
> by a simple mapping inside the "htdig-private" expressions coded
> within the new library.
>
> But, I think that's all.
>
> And I think this is THE portable solution to get rid of locale dependencies
> from the underlying operating system.
>
> What do you think?

See http://www.geocrawler.com/archives/3/8822/2001/3/200/5331122/
for something I proposed just over a year ago, which would allow you to
supplement or override the definitions from broken locales.

Yes, the changes wouldn't be enormous, as much of the locale-specific
stuff is fairly localized. The problem is that as long as we stick to
8-bit character support, any solution will lack the generality we
ultimately want. So, we can come up with an interim solution to get
around locales, while still sticking to 8-bit characters, but then this
new code would likely need to be thrown out or updated when we finally
support a full character set like UTF-8 or Unicode (which would be a
much bigger job).

I'm also concerned about not losing any functionality we have in the
code right now. When we switch to UTF-8 or Unicode (or some other
all-encompassing character set), we can dump the locale support safely
because we no longer need to configure htdig for a specific, limited
character set. However, as long as we're just patching in an alternative
to locales, but sticking to a limited 8-bit character set, we need to
have full configurability for all 8-bit character sets potentially
supported now using locales.
To me, that would mean either keeping locale support, but allowing the
user to override this, or coming up with complete mapping tables for all
8-bit character sets currently available in the locale tables of most OSes.

If you're not proposing this, but full UTF-8 or Unicode support, either
I'm overestimating the complexity of patching this into ht://Dig, or
you're taking on a big project. However, if you feel up to the challenge,
more power to you.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
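
For illustration only, the interim 8-bit approach discussed in this message
(keep the values the locale provides, but let them be supplemented or
overridden) might look roughly like the C sketch below. Nothing here is
ht://Dig code: the names charmap, charmap_init_from_locale(),
charmap_add_pair(), charmap_apply_latin1(), charmap_tolower() and
charmap_isalpha() are hypothetical, isalpha() is used only as an example of
a classification function, and the hard-coded ISO 8859-1 ranges stand in for
whatever a configuration attribute would supply.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical private table: one lower-case mapping and one
 * "is a letter" flag for each of the 256 possible 8-bit codes. */
typedef struct {
    unsigned char lower[256];
    unsigned char is_alpha[256];
} charmap;

/* Start from whatever the current C locale reports, so existing
 * locale-based behaviour is kept where the locale works. */
void charmap_init_from_locale(charmap *m)
{
    assert(m != NULL);
    for (int c = 0; c < 256; c++) {
        m->lower[c] = (unsigned char)tolower(c);
        m->is_alpha[c] = isalpha(c) ? 1 : 0;
    }
}

/* Override: declare `upper` and `lower` to be a case pair of letters. */
void charmap_add_pair(charmap *m, unsigned char upper, unsigned char lower)
{
    assert(m != NULL);
    m->lower[upper] = lower;
    m->lower[lower] = lower;
    m->is_alpha[upper] = 1;
    m->is_alpha[lower] = 1;
}

/* Supplement a missing or broken locale with the ISO 8859-1 letters.
 * Upper-case letters occupy 0xC0..0xDE, lower-case ones sit 0x20 above. */
void charmap_apply_latin1(charmap *m)
{
    assert(m != NULL);
    for (int c = 0xC0; c <= 0xDE; c++) {
        if (c == 0xD7)
            continue;               /* multiplication sign, not a letter */
        charmap_add_pair(m, (unsigned char)c, (unsigned char)(c + 0x20));
    }
    /* Sharp s and y-diaeresis are lower-case letters with no
     * upper-case partner inside ISO 8859-1. */
    m->lower[0xDF] = 0xDF;
    m->is_alpha[0xDF] = 1;
    m->lower[0xFF] = 0xFF;
    m->is_alpha[0xFF] = 1;
}

/* Drop-in lookups to use where tolower()/isalpha() are called. */
int charmap_tolower(const charmap *m, unsigned char c)
{
    return m->lower[c];
}

int charmap_isalpha(const charmap *m, unsigned char c)
{
    return m->is_alpha[c];
}
```

A caller would build the table once at startup, for example with
charmap_init_from_locale() followed by charmap_apply_latin1(), and then use
the two lookup functions in place of the libc calls. Like any 8-bit table,
this would have to be replaced or reworked for UTF-8 or Unicode support.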