From: Philipp K. K. <pk...@sp...> - 2012-06-20 13:06:52
In the smallopts branch, I am working, among other things, on improvements in code generation for bitwise and in the z80-related ports. One common operation is

    if (v & 0x7fffffff)

as the last use of v, especially in the float library, so I wanted to optimize those. Let's assume the simpler

    if (v & 0x7fff)

for the rest of the discussion.

If v is in hl, sdcc from the smallopts branch generates

    cp a, a
    adc hl, hl

The result of the & is then the inverse of the zero flag: the cp a, a clears the carry, the adc hl, hl then does a bitwise left shift, and the leftmost bit is shifted out, so the result in hl is zero iff (v & 0x7fff) is zero. This optimization works well.

When v is in de, for the Rabbits, I wanted to generate

    cp a, a
    rl de

which should work as above, but it doesn't: tests fail, and I don't know why. Maybe I don't understand how rl works, maybe there is some simulator problem, but reading the documentation and simulator code hasn't helped me so far.

Lee, could you have a look? The rl optimization can be enabled by changing the 0 to 1 in line 7222 of gen.c:

    0 && IS_RAB && AOP (left)->aopu.aop_reg[offset]->rIdx == E_IDX &&
    AOP (left)->aopu.aop_reg[offset + 1]->rIdx == D_IDX &&
    isPairDead (PAIR_DE, ic)))

When enabled, it is used in _fseq.asm, _fsgt.asm, _fslt.asm, _fsneq.asm.

Philipp
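The flag argument in the message above can be checked exhaustively on a host machine. The following is only a sketch in C: it models the `cp a, a` / `adc hl, hl` pair as described in the message (carry cleared, then a 16-bit add-with-carry whose result sets the zero flag) and is not taken from sdcc or ucsim. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sequence
 *     cp  a, a      ; A - A = 0, so the carry flag is cleared
 *     adc hl, hl    ; HL = HL + HL + carry, Z reflects the 16-bit result
 * Returns the state of the zero flag after the sequence. */
bool zero_flag_after_cp_adc(uint16_t hl)
{
    unsigned carry = 0;                        /* effect of cp a, a */
    uint32_t sum = (uint32_t)hl + hl + carry;  /* effect of adc hl, hl */
    uint16_t result = (uint16_t)sum;           /* bit 15 of hl is shifted out */
    return result == 0;
}

/* Returns true if "zero flag clear" agrees with (v & 0x7fff) != 0
 * for every possible 16-bit value of v. */
bool trick_matches_for_all_values(void)
{
    for (uint32_t v = 0; v <= 0xffff; v++) {
        bool expected = (v & 0x7fff) != 0;
        bool actual = !zero_flag_after_cp_adc((uint16_t)v);
        if (expected != actual)
            return false;
    }
    return true;
}
```

Calling `trick_matches_for_all_values()` from a small test program should return true under this model. It says nothing about how the Rabbit `rl de` instruction sets its flags, which is the open question in the message.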
From: Leland M. <ljm...@so...> - 2012-06-20 18:32:48
> When v is in de, for the Rabbits, I wanted to generate
>     cp a, a
>     rl de
> which should work as above,

I've checked in a small fix to ucsim on the smallopts branch that affects the flags for several r2k instructions: rl de, rr de, and rr hl.

I'm currently running the regression tests for the r2k. Running the regression tests for just one target seems to take a few hours. Most of the time is spent in the sdcc process, so I don't think there is much room for speeding it up.

-Leland
From: Philipp K. K. <pk...@sp...> - 2012-06-21 08:48:03
On 20.06.2012 20:32, Leland Morrison wrote:
> > When v is in de, for the Rabbits, I wanted to generate cp a, a rl de
> > which should work as above,
>
> I've checked in a small fix to ucsim on the smallopts branch that
> affects the flags for
> several r2k instructions: rl de, rr de, and rr hl

Thanks. Should this fix be picked to trunk? It is a bug fix in the Rabbit simulator, and will thus affect the r2k and r3ka ports only. It does not affect the code generated by sdcc in any way.

> I'm currently running the regression tests for the r2k. Running the
> regression tests
> for just one target seems to take a few hours. Most of the time is
> spent in the sdcc process,
> so I don't think there is much room for speeding it up.

On which system does it take that long, and which command do you use to run them?

On the 3 year old Core i5 with Debian GNU/Linux I use at university, running the ucr2k regression test takes less than 4 minutes (and all other ports take less time) using make -j 6:

    real 3m59.020s
    user 13m17.198s
    sys 0m34.294s

Philipp
From: Lee M. <ljm...@so...> - 2012-06-21 19:03:48
> Should this fix be picked to trunk? It is a bug fix in the
> Rabbit simulator, and will thus affect the r2k and r3ka ports only.
> It does not affect the code generated by sdcc in any way.

This does fix a bug; I'm neutral w.r.t. putting it into the trunk. It affects the zero, sign, and parity flags; the behavior was correct for the carry flag -- the flag most people writing assembly code will use. So I would give it low or medium priority.

> On which system does it take that long and which command do you use to
> run them?

The system is a 2-year old dual-core AMD processor running cygwin on Windows 7. It has 8GB installed, so DRAM shouldn't be an issue. I'm guessing the i5 processor has 4 cores, since the user time below is 3.3x the real time (from your measurement), but that would mean a 2-core machine should take 8 minutes, still a difference of more than 5x.

I ran "make test-ucr2k" and timed it. That took 50 minutes by wall-clock time (which should correspond to "real" below). So, running "make test-z80" takes that long for each of z80, z180, gbz80, r2k, and now r3ka: several hours.

I think this is a cygwin issue, but I haven't figured out why cygwin in particular should be so slow.

As a side note, I do not have the "STX" library installed. sdcc/configure does the setup to use STL instead -- would that account for that large a difference?

Thanks,
Leland

> On the 3 year old Core i5 with Debian GNU/Linux I use at university,
> running the ucr2k regression test takes less than 4 minutes (and all
> other ports take less time) using make -j 6:
>
> real 3m59.020s
> user 13m17.198s
> sys 0m34.294s
>
> Philipp
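The core-count estimate in the message above is simple arithmetic on the quoted time output. As a rough sketch in C (the helper names are invented here, and the 4-core and 2-core figures are the guesses from the message, not measurements):

```c
/* Convert an "XmY.YYYs" value from time output into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Average number of busy cores: user CPU time divided by wall-clock time. */
double busy_cores(double user_s, double real_s)
{
    return user_s / real_s;
}

/* Wall-clock time expected on a machine with fewer cores, assuming the
 * work scales linearly with the number of cores (an idealisation). */
double scaled_real_time(double real_s, int cores_from, int cores_to)
{
    return real_s * cores_from / cores_to;
}
```

With the quoted numbers, `busy_cores(to_seconds(13, 17.198), to_seconds(3, 59.020))` comes out a little above 3.3, and `scaled_real_time(to_seconds(3, 59.020), 4, 2)` is roughly 8 minutes. Dividing the 50-minute cygwin run by that figure gives a factor of more than 5, as stated in the message.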
From: Lee M. <ljm...@so...> - 2012-06-21 22:48:56
>> On the 3 year old Core i5 with Debian GNU/Linux I use at university,
>> running the ucr2k regression test takes less than 4 minutes (and all
>> other ports take less time) using make -j 6:
>>
>> real 3m59.020s
>> user 13m17.198s
>> sys 0m34.294s
>>
>> Philipp

Using STX instead of STL is a major improvement (about 20% as shown below).

Note: lots of ports disabled (I don't have gputils/gpsim installed on the WinXP machine).

    sdcc --version
    SDCC : gbz80/z80/z180/r2k/r3ka/hc08/s08 3.2.0 #7952 (Jun 21 2012) (CYGWIN)

but for a 5 year old dual-core Windows XP machine:

    time make ; building the sdcc trunk

    ; gcc-4 / STL
    real 15m8.891s
    user 15m19.882s
    sys 4m17.821s

    ; gcc-4 / STX
    real 7m3.078s
    user 6m51.582s
    sys 3m34.691s

    ; gcc-4 / STX / -j 6
    ;real 8m25.688s
    ;user 14m46.187s
    ;sys 3m33.119s

    cd support/regression
    time make test-ucr2k

    ; STL
    real 49m37.344s
    user 51m7.988s
    sys 10m58.836s

    ; STX
    real 40m45.141s
    user 42m19.268s
    sys 10m59.909s

    ; with STX, "make -j 3 test-ucr2k"
    real 22m45.547s
    user 44m3.052s
    sys 11m19.701s

I used -j 3 instead of -j 6 because that was sufficient to keep the CPU use at 100%. I'm pretty sure the additional processes would increase the real time because of overhead and cache limitations.

-Lee
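To put the timings above side by side, the relative savings can be computed from the "real" values. This is just a sketch in C with invented helper names; the inputs are the wall-clock times from the message, converted to seconds by hand.

```c
/* Percentage of wall-clock time saved going from before_s to after_s. */
double percent_saved(double before_s, double after_s)
{
    return 100.0 * (before_s - after_s) / before_s;
}

/* Speedup factor: how many times faster the second run is. */
double speedup(double before_s, double after_s)
{
    return before_s / after_s;
}

/* "real" times from the message, in seconds. */
static const double TEST_STL    = 49 * 60 + 37.344;  /* make test-ucr2k, STL  */
static const double TEST_STX    = 40 * 60 + 45.141;  /* make test-ucr2k, STX  */
static const double TEST_STX_J3 = 22 * 60 + 45.547;  /* STX with make -j 3    */
static const double BUILD_STL   = 15 * 60 + 8.891;   /* time make, gcc-4/STL  */
static const double BUILD_STX   = 7 * 60 + 3.078;    /* time make, gcc-4/STX  */
```

Under these numbers, `percent_saved(TEST_STL, TEST_STX)` is a bit under 18% for the regression test, `percent_saved(BUILD_STL, BUILD_STX)` is a little over 53% for the build itself, and `speedup(TEST_STX, TEST_STX_J3)` is about 1.8 on the dual-core machine.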
From: Philipp K. K. <pk...@sp...> - 2012-06-22 12:53:36
How long does it take to run the tests when using --oldralloc?

Philipp
From: Lee M. <ljm...@so...> - 2012-06-22 19:44:10
On 6/22/2012 8:53 AM, Philipp Klaus Krause wrote:
> How long does it take to run the tests when using --oldralloc?

For the Windows XP system, STX && --oldralloc, "make -j 3 test-ucr2k":

    real 11m16.453s
    user 22m28.462s
    sys 11m25.821s
From: Lee M. <ljm...@so...> - 2012-06-22 20:00:53
Also, for the same system (cygwin on Windows XP):

    time make -j 3 test-host
    real 10m23.344s
    user 17m58.011s
    sys 12m37.565s

So, the slowness probably isn't with SDCC itself, but with some interaction between the build system and the cygwin environment.

-Leland

On 6/22/2012 8:53 AM, Philipp Klaus Krause wrote:
> How long does it take to run the tests when using --oldralloc?
>
> Philipp