From: Rugxulo <ru...@gm...> - 2013-05-03 08:21:52
Hi,

On Fri, May 3, 2013 at 1:48 AM, Eric Auer <e....@jp...> wrote:

> please explain the hack / patch: Is the only thing that
> you changed that the kernel is compiled for those CPUs?
> Are there actually any differences between them? I can
> imagine that OpenWatcom makes 186 and 286 the same and
> everything above 386 the same.

The person to really ask would be Bart, but (IIRC) he's busy these days.

IIRC, wcc.exe is the 16-bit compiler and wcc386.exe is the 32-bit one. I'm honestly not sure whether wcc.exe supports any 386 features at all (at least not 32-bit segments). IIRC, wcc.exe only "tunes" code for higher processors, so everything "should" still work on a 386, even when using -6.

A very quick check on a very small file shows no extended (E[ABCD]X) registers used at all with "wcc -3 -za", only a couple of minor things, e.g. "shl bx,2" (aka, 80186). Hmmm, I do see a "movzx ax,byte ptr _blah" in there too, and "sete al" (both 80386), and some (extended) "imul" stuff (186? 386?).

I'm no cpu expert, but very little (if anything) is to be gained with 486 or 586 opcodes vs. plain vanilla 386. The only optimizations worth doing, IMO, are simple register reordering and avoiding pipeline stalls. In other words, going from 8086 to 186 would show some minor improvements. The 286 mainly introduced pmode, so pure calculations probably won't show any difference unless dependent upon RAM. The 386, 486, and 586 instructions are basically all the same (minus the ordering of instructions); most compilers don't use BSWAP, XADD, CPUID, RDTSC, etc. The 686 adds a fair bit of stuff but doesn't always help speed (e.g. CMOV). Actually, even some 186 stuff (ENTER, LEAVE) or string instructions (STOSB, SCASB) can be slower than simpler alternatives after the 486, so it's often (but not always) avoided by compilers, e.g. GCC. 32-bit registers "usually" do 32-bit math faster than the same calculations done in 16-bit code.
But a lot of other stuff can affect speed too (calling convention, libraries, OS calls, malloc). (Besides, there are too many variants of x86 these days to target any one effectively. Probably best not to worry about it unless it's direly important or you're prepared to profile heavily.)

BTW, I'm not sure I consider it wise, or worth the effort, to compile a 386 FreeDOS kernel. Actually, IIRC, the RUFUS USB installer uses a 386 build, which, although not out of the question (anything with USB support is most likely 386+), is probably totally useless (and bad if copied to older machines, which again ... probably rare, but ....)

> Unless the kernel would
> contain heavy mathematical processing for which it is
> obvious that above-386 optimizes better ;-)

I would highly doubt it, but I've not studied the kernel in depth. You know 1000x more than I do, Eric. I think most kernels try to avoid the FPU, but here I assume you just meant general integer stuff. Are we targeting real 8086s or just 8086-compatibles? In other words, what's faster on an 8086 might be (relatively) slower on a 486 than a different way of doing the same thing.

> You could tell the compiler to produce Assembly output (instead
> of binary) and compare the text.

wcc.exe only outputs .OBJ directly (for speed?), so you have to run wdis.exe on the resulting .OBJ to get a disassembly.

> Or you could use some
> debug, disassembler (ndisasm?) or hex editor to compare
> before you UPX things, but of course that is more work.

BTW, just because it UPXes smaller doesn't mean much. For one, you can't really predict what will compress better, and it always changes. Secondly, the other main thing to be concerned with is cluster size (and waste). A 33 kb UPX'd kernel occupying 64 kb of "actual" storage because of 32 kb clusters would be inefficient, but 45 kb (rounding up to the same 64 kb) not so much, or at least it's harder to fix.

> Thanks for comparing :-) Maybe this is more a topic for
> the kernel list.
> Note that if "a few" bytes are really
> only 10 or so, all this is probably more an "academic"
> exercise. Things get more exciting once you can save at
> least a cluster of disk space or a paragraph of RAM :-)

BTW, I did an F5 (clean) boot the other day to see how much RAM it would use. It still gave me approx. 500 kb free (whereas normally, with only XMS, I get approx. 600-620 kb, depending on TSRs loaded). That's hardly what I'd call terribly bloated (despite one guy's recent complaint). I'm sure it could be shrunk more, esp. if you don't need the full .BAT language (GCOM, anyone?).

>> I hacked the 2041 kernel batch and make files included on the FD 1.1 iso to
>> allow the kernel to be built by OpenWatcom as 8086, 186, 286, 386, 486,
>> 586, or 686. The resulting 686 kernel boots fine in VirtualBox 4.2.12 in
>> OSX 10.8.3 on my 2012 Mac Book Air 13" 4GB. The resulting kernel is a few
>> bytes smaller compressed by upx than kernel installed by the FD 1.1 iso.
>> I'm going to continue testing. No source changes were made. Not sure how
>> the changes affect the nasm built files.

There aren't very many NASM files, IIRC, only like two. And since OpenWatcom doesn't call NASM (nor WASM) for its inline assembly or "pragma aux", it won't affect those. Depending on your NASM version, you can't really change the cpu output, but if there's an option (e.g. "-O9v", or "-Ox" [the default in the latest version]) that you want to use globally, either "set NASM=-O9v" or "set NASMENV=-O9v". (Can't remember which the latest version reads, probably just plain %NASM%.)

There's nothing preventing anyone from hardcoding higher (e.g. 686) instructions into the kernel (preferably guarded by CPUID testing so as not to bork older cpus). But it's not likely to gain much (without heavy profiling first to find the bottlenecks). And FPU / SIMD is almost certainly out of the question. (What else is there? There are too many instructions these days, e.g. BMI, RDRAND, FMA4. Blech.)
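To illustrate the CPUID-guarded idea, here's a hedged C sketch of runtime dispatch (all names are mine, not from the kernel; the actual CPU probe is stubbed out, since real detection has to test the EFLAGS ID bit first, as CPUID only exists on some 486s and later):

```c
/* Hypothetical sketch of CPUID-guarded dispatch. A real kernel would
   probe EFLAGS.ID and then CPUID family/feature bits in assembly. */
#include <string.h>

typedef void (*copyfn)(void *dst, const void *src, unsigned len);

/* Safe-everywhere fallback: plain byte copy, 8086-friendly codegen. */
static void copy_generic(void *dst, const void *src, unsigned len) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (len--) *d++ = *s++;
}

/* Stand-in for a variant that would use newer (e.g. 686) opcodes. */
static void copy_686(void *dst, const void *src, unsigned len) {
    memcpy(dst, src, len);
}

/* Stub: real code would do the EFLAGS.ID / CPUID dance here. */
static int cpu_is_686(void) {
    return 0;   /* conservative default: assume an old CPU */
}

/* Pick the implementation once at startup; callers never re-check. */
copyfn pick_copy(void) {
    return cpu_is_686() ? copy_686 : copy_generic;
}
```

The point of dispatching through a function pointer chosen once at boot is that the per-call cost is a single indirect call, and the fallback path never executes an instruction the host CPU lacks.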