I was alarmed to find that lib1funcs.asm:mulsi3 uses
a bit-by-bit loop for the int32*int32 => int32 computation.
This significantly slows down my algorithms.
The loop should be replaced by three MULXU.W instructions,
at least for H8S. MULXU.W takes 20 cycles on H8S/2000
and 4 cycles on H8S/2600, so either way it is
far faster than the loop.
I am not sure about H8H. There may be no reason
to use MULXU.W on that platform.
If nobody is working on this, I can prepare
and send a patch when I have a spare hour.
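
To make the idea concrete, here is a rough C sketch of how the low
32 bits of a 32x32 product can be built from three 16x16 => 32
multiplies. This is not the actual lib1funcs.asm code or a proposed
patch; the names mulxu_w and mulsi3_sketch are made up for
illustration, and mulxu_w only stands in for a single MULXU.W
instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for one MULXU.W: unsigned 16 x 16 -> 32. */
static uint32_t mulxu_w(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Low 32 bits of a * b using three 16 x 16 multiplies.
 * With a = ah:al and b = bh:bl (16-bit halves):
 *   a * b = (ah*bh << 32) + ((ah*bl + al*bh) << 16) + al*bl
 * The ah*bh term lies entirely above bit 31, so it is never computed. */
uint32_t mulsi3_sketch(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)(a & 0xFFFFu);
    uint16_t ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)(b & 0xFFFFu);
    uint16_t bh = (uint16_t)(b >> 16);

    uint32_t low   = mulxu_w(al, bl);
    /* Only the low 16 bits of the cross terms survive the shift. */
    uint32_t cross = mulxu_w(ah, bl) + mulxu_w(al, bh);

    return low + (cross << 16);
}
```

Because only the low 32 bits are kept, the same routine gives the
right result for signed operands in two's complement, so no separate
sign handling should be needed.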