Optimize drawLinePosDxDy() just a hair by pre-computing a slightly different
'xf' and delaying the decrement to the end of the loop. It is now slightly
faster in all cases. This relies on the fact that [(yy + 1) * dxdy] is equal to
[yy * dxdy + dxdy], so the pre-computed 'xf' essentially has that "+ dxdy"
chopped off.
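
A minimal sketch of the idea, not the actual drawLinePosDxDy() code. It assumes
the loop walks rows from the last one down to 0 and keeps a running 'xf' that is
stepped by dxdy. The names xs_decrement_first, xs_decrement_last, x0 and rows
are hypothetical and exist only for this illustration; integers are used so the
comparison is exact.

```python
def xs_decrement_first(x0, dxdy, rows):
    """Assumed 'before' shape: start one step too far, decrement, then use xf."""
    xf = x0 + rows * dxdy          # (yy + 1) * dxdy with yy = rows - 1
    out = []
    for _ in range(rows):
        xf -= dxdy                 # decrement at the top of the loop
        out.append(xf)
    return out


def xs_decrement_last(x0, dxdy, rows):
    """Assumed 'after' shape: start without the extra dxdy, use xf, then decrement."""
    xf = x0 + (rows - 1) * dxdy    # yy * dxdy with yy = rows - 1
    out = []
    for _ in range(rows):
        out.append(xf)
        xf -= dxdy                 # decrement delayed to the end of the loop
    return out
```

Both functions produce the same x value for every row. The sketch only shows
that the two loop shapes are equivalent; it says nothing about how much faster
the real routine gets.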