From: Ger H. <ge...@ho...> - 2008-04-01 23:17:21
On Tue, Apr 1, 2008 at 5:28 PM, Bill Y <ws...@me...> wrote:
> Now that I have time from the Spam Conference, I have comments:
[...]
> 2: Why use 7-zip? It's just another dependency we're adding
[...]
> Therefore, don't do it.

Okay. I have them all anyway following a 'make distribution'. Just a personal preference, nothing more. I'll put up the tar.gz versions as well next time around. No sweat.

> > config_BillY.h -- configures ANY and ALL features of crm114
> > config_BillY.h -- _defines_ system library feature set
> > availability on a PER-ITEM (define, type, struct, call) basis
>
> Um, one file, two purposes? Or two files?

Two purposes. Yes, that may look confusing, but that's what ./configure produces, because on a ./configure platform you tell ./configure what you want built in (which classifiers, etc.) and it puts it all in one config.h. Have a look. It's not _that_ ugly. ;-)

OTOH, I can, for config_BillY.h, split it up into two: one for 'desirables' (like the classifier selection) and one for 'portability items'. Makes it a little harder for me to maintain, but if you like it that way, okay, I can do it.

> I'd rather have portable code that you don't have to .configure.

How am I going to say this? ...

The easy answer for you personally: YOU don't have to. Use config_BillY.h and the HAVE_XXX defs in there and you should be good on any vanilla UNIX box like you always were; some stuff, like madvise()/msync()/touch() [see below], may be cut out and replaced by the old code that worked till now anyway, but that doesn't harm or otherwise hurt your use and further development of CRM114. When you add something new, copy&paste like you always did and you should be very fine indeed.

The hard answer: <discarded an explanation that won't hit the neurocortex anyhow. Here's hoping...>

Take a look at OpenSSL one day when you feel like it and check out their e_os.h + e_os2.h. Just page through it. That's portable code for you.
And _then_ go look at one of their hash routines, say, SHA-1, and have a look at _that_. Then track down the macros, calls, types, etc. in there and their relationship with e_os/e_os2. OpenSSL is used all over the world by a great majority of users/developers. _Someone_ must have had the same thoughts as you by now, once they've seen this. Then why is it still like this after so many years? And, no, it's not like that because it's about crypto and everycryptobody is raving mad and paranoid to boot and we're not _doing_ crypto anyhow.

Did I ever mention 'long' or 'int' (or several others) being non-portable for specific purposes, like, say, where bit-specific operations are performed? Say, a hash calculation? Changing the bit-width of the int/long/what-have-you will CHANGE the behaviour of such an implementation, because it changes where the wrap-arounds and overruns happen (and those are some of the KEY ingredients for calculating hashes in practice). Statistical tests WILL produce different results on the topic of spread, avalanche, etc. when the bit-width changes, e.g. when you stick to int/long. At the very least, the generated hash values will not match across systems. And that also means the collision-avoidance behaviour may subtly change. Which starts to count once you're going for microgroom on any of the production classifiers in there. A thought to ponder?

Then there's all the mmap()/madvise()/msync()/etc. stuff but I'll save that for a later day.

And besides, it's not like you haven't .configure'd this stuff already before. It's just that ./configure is new to your workflow, but the code is already sprinkled with platform (hardware/OS/CPU) dependent #ifdef checks to Make It Work for several other folks out there. So far, so good.

You're looking for portability across the board without the configuration hassle? One Code To Fit Them All? When you listen to the rep sheet presented by the Java folks, you'll GO for the machine virtualization thing like you're Born Again, Hallelujah!
And they'll tell you execution will be even faster than with compiled languages like C/C++, and the numbers will speak for them. Too bad, but there's still a little dark evil something waiting for you then. It's got patience, my dear.

Because this globe is crawling with interns - interns of all ages. Interns who never were taught or simply did not attend lessons (even when physically present) because girls, parties and people in general are _much_ more interesting, don't you agree? ;-)

I'm quite sure you cringe at all this, but this is why people run to virtualized languages like Java and C#: there are just too few out there willing to learn how to cope with writing portable C/C++ and too many finding out they've got stuff that doesn't port to their own system. So it's both push and pull towards virtualization. Which really is a VERY old concept, only this time around the hardware is fast enough not to cause 'virtual' to be a nuisance from the start.

Anyway, I can go on for ages about this, but I get (and have been) paid serious bucks because people needed cross-portable software machinery, and I've seen enough people who've written code all their (long) life but never produced one bit that was portable beyond the OS/etc. combo they'd started developing on. I didn't get paid because I make it sound so complex, but because others proved they simply were incapable of doing it. Sometimes, they get to the 90% mark and then hit the brick wall. Hard.

If you want a setup that's prepared for the 95 or 99% mark, you need to do something special, that is: realize up front what can happen to you when you drop your stuff in uncharted territory. Keep _that_ thought when you're going for portability. This one's for free.
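To make the int/long remark about hash calculations concrete, here is a minimal sketch. It is an illustration under assumptions, not code from CRM114 or OpenSSL: FNV-1a is picked only because it is short and well known, and the names fnv1a_fixed and fnv1a_long are made up for this example. The two functions run the same loop; the only difference is the width of the accumulator type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a on a fixed-width type: the multiply wraps modulo 2^32
 * on every platform, so the result is the same everywhere.
 * (Hypothetical helper, for illustration only.) */
uint32_t fnv1a_fixed(const char *s)
{
    uint32_t h = 2166136261u;        /* FNV-1a 32-bit offset basis */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;      /* mix in one input byte */
        h *= 16777619u;              /* FNV 32-bit prime */
    }
    return h;
}

/* The same loop on 'unsigned long': the multiply wraps modulo 2^32 where
 * long is 32 bits wide and modulo 2^64 where it is 64 bits wide, so the
 * returned value depends on the platform.
 * (Hypothetical helper, for illustration only.) */
unsigned long fnv1a_long(const char *s)
{
    unsigned long h = 2166136261UL;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619UL;
    }
    return h;
}
```

Where long is 64 bits wide, fnv1a_long() generally returns values that do not fit in 32 bits, so anything computed from the full value (a bucket index via modulo, a value stored on disk) no longer matches a system where long is 32 bits. Only the low 32 bits agree with fnv1a_fixed(); using the fixed-width type from the start avoids having to remember the truncation everywhere.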
Summarizing: I don't want you to worry or bother with the portability stuff, because it requires a different mindset and it's not helpful in any way to what _I_ need most: your extensive expertise with CRM114 and the knack to find and create stuff (classifiers, etc.) that works beautifully on the weirdest data inputs.

All I'm asking is that you just copy&paste like you always do, and if you wonder about another HAVE_XXX macro in there: those can be tracked down and analyzed easily, one by one. Why have them in there? Same as why you now have #ifdef WIN32 in several places. Because it works for you without them, but doesn't for others. Just a different way, which has better potential to hit the 99% mark up there than the #ifdef/#else/#endif DIY as it is today. Nothing wrong with it, I'm just pushing the envelope and that's when it starts to ache.

And if the ./configure stuff doesn't work for someone, there's enough expertise on this ML to fix that. And when that _still_ doesn't work for them, there's the config_BillY.h + makefile for them too. Only _they_ might have to change the platform they run crm114 on to something else.

> > crm114_sysincludes.h -- here's where all the voodoo magic is:
> > additional autodetecting and mix&merge with config_BillY.h driver
>
> Driver? I don't know of any driver.

Sorry, my wording for stuff like this. crm114_sysincludes.h carries the brunt of the portability stuff. It's generic, so it needs some instructions on what to do right now and what not to. That's what the config.h (in case of ./configure) / config_BillY.h (yours, WITHOUT the ./configure) / config_Win32.h (because that platform does not do ./configure) is for: it 'drives' the decisions in sysincludes: headers to load, functions to define, types to map, etc. That's all.

--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: ge...@ho...
mobile: +31-6-11 120 978
--------------------------------------------------