If a non-volatile bit is toggled, SDCC already performs the
proposed optimization.

The code snippet below shows that SDCC is almost there, but it
also exposes a bug:

bit my_bit; /* if declared volatile, loop reversal is disabled */

void f_opt ()
{
  unsigned char i;

  for (i = 0; i < 256; i++)
    my_bit = !my_bit;
}

compiles to:

        mov  r2,#0x00
00103$:
        cpl  _my_bit
        djnz r2,00103$

which should have been an endless loop, since an unsigned char i
is always less than 256:

00103$:
        cpl  _my_bit
        sjmp 00103$