Removing the line "if ( det == 0 ) return" below slows down the resulting code. Without this line, the rest of the function uses the stack much more and the hl register much less.
I no longer need the variable det and wanted to remove that code, but I've left it in because the resulting code is faster. Ideally, the function would run just as fast without the unneeded code.
I've attached the two asm files, one with the "return" line removed and one with it included.
```c
/// GPL 2.0 or greater
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define PRECISION 64
#define VISIBLE_WIDTH 32
#define MAX_ENEMIES 90

int16_t div16i16x16( int16_t a, int16_t b );

typedef struct {
    uint16_t x;
    uint16_t y;
} maths_point;

typedef struct {
    maths_point position;
} EnemyData;

EnemyData enemyData[MAX_ENEMIES];
uint8_t enemyCount = 0;

void EnemyInit() {
    enemyCount = 90;
}

void EnemyRender( const void *heights ) {
    // position, direction and plane are left uninitialized in this reduced
    // test case; only the code SDCC generates for the loop matters here.
    maths_point position;
    maths_point direction;
    maths_point plane;
    int16_t posX = position.x;
    int16_t posY = position.y;
    int16_t gridX = position.x / 16;
    int16_t gridY = position.y / 16;
    int16_t planeX = plane.x;
    int16_t planeY = plane.y;
    int16_t dirY = direction.y;
    int16_t dirX = direction.x;
    int16_t planeXdirY = planeX * dirY;
    int16_t planeYdirX = planeY * dirX;
    int16_t planeDirDiff = planeXdirY - planeYdirX;
    int16_t det = 0;
    for( uint8_t sorted = 0; sorted < enemyCount; sorted++ )
    {
        EnemyData *enemy = &enemyData[sorted];
#define MIN_DIST 8*4
        // Skip enemies that are too far away on either axis.
        if ( abs( enemy->position.x - gridX ) > MIN_DIST ) continue;
        if ( abs( enemy->position.y - gridY ) > MIN_DIST ) continue;
        int16_t spriteX = ((uint16_t)enemy->position.x) * 16 - posX;
        int16_t spriteY = ((uint16_t)enemy->position.y) * 16 - posY;
        if ( det == 0 ) {
            det = planeDirDiff / PRECISION;
            if ( det == 0 ) return; /// <-------- Remove for large slowdown
        }
        int16_t planeYspriteX = planeY * spriteX;
        int16_t planeXspriteY = planeX * spriteY;
        int32_t planeSpriteDiff = ((int32_t)(planeXspriteY)) - planeYspriteX;
        if ( labs(planeSpriteDiff) >= 32766 ) continue;
        int16_t transformY = planeSpriteDiff / PRECISION;
        int16_t dirYspriteX = dirY * spriteX;
        int16_t dirXspriteY = dirX * spriteY;
        int32_t dirSpriteDiff = ((int32_t)(dirYspriteX)) - dirXspriteY;
        if ( labs(dirSpriteDiff) >= 32766 ) continue;
        int16_t transformX = dirSpriteDiff / PRECISION;
#define SPRITE_X_MULT 4
#define SPRITE_VISIBLE_WIDTH ( VISIBLE_WIDTH / 2 * SPRITE_X_MULT )
        int16_t spriteScreenX = SPRITE_VISIBLE_WIDTH + div16i16x16( SPRITE_VISIBLE_WIDTH * transformX + transformY / 2, transformY );
    }
}
```
```
sdcc -mz80 slowdown.c -c
sdcc -v
SDCC : mcs51/z80/z180/r2k/r2ka/r3ka/sm83/tlcs90/ez80_z80/z80n/ds390/pic16/pic14/TININative/ds400/hc08/s08/stm8/pdk13/pdk14/pdk15/mos6502 4.2.9 #13706
```
Since the generated code works, this isn't a bug.
This is an effect of at least the following aspects of optimization in SDCC:
1) The register allocator falls back to heuristics when it can't generate provably optimal code. The more possibilities it is allowed to consider, the better the code tends to get.
2) The default optimization target considers code size much more important than speed.
So if you want the code to be fast, you should choose optimization options accordingly. For example, when using --opt-code-speed --max-allocs-per-node 25000, I see better code being generated. The first option biases the optimizer towards speed. The second allows the optimizations (including the register allocator) to consider more possibilities. In general, higher values for the second option tend to result in longer compile times (and higher memory usage of SDCC during compilation) and better generated code.
Ticket moved from /p/sdcc/bugs/3494/