I have finally found where the problem lies with wait.
It should assemble to give an incf on the lower byte as well as on the upper byte.
Even if you use a byte value (rather than a word) for any of the delays, a delay of zero is going to give you a delay of 255 units rather than zero.

This is the assembled code for the microsecond subroutine:

Delay_US
    incf    SysWaitTempUS_H, F
DUS_START
    decfsz    SysWaitTempUS, F
    goto    DUS_START
    decfsz    SysWaitTempUS_H, F
    goto    DUS_START
    return

It should assemble to give:
Delay_US
    incf    SysWaitTempUS_H, F
    incf    SysWaitTempUS, F
    bcf    STATUS, C
DUS_START
    decfsz    SysWaitTempUS, F
    goto    DUS_START
    decfsz    SysWaitTempUS_H, F
    goto    DUS_START
    return
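
To check the two listings above without a chip, here is a rough Python model of them. It is only a sketch: it assumes decfsz decrements an 8-bit register with wrap-around and skips the next instruction when the result is zero, and it counts how many times a goto back to DUS_START is taken as the "units" of delay. The function names decfsz and delay_us_gotos are made up for this illustration and are not part of any generated code.

```python
def decfsz(value):
    """Model of decfsz: decrement an 8-bit register, return (new value, skip?)."""
    value = (value - 1) & 0xFF
    return value, value == 0


def delay_us_gotos(low, high, inc_low):
    """Count the gotos taken by Delay_US for SysWaitTempUS = low, SysWaitTempUS_H = high.

    inc_low=False models the code as currently assembled,
    inc_low=True models the proposed version with the extra incf.
    bcf STATUS, C does not change the loop count, so it is not modelled.
    """
    high = (high + 1) & 0xFF              # incf SysWaitTempUS_H, F
    if inc_low:
        low = (low + 1) & 0xFF            # incf SysWaitTempUS, F (proposed)
    gotos = 0
    while True:
        low, skip = decfsz(low)           # decfsz SysWaitTempUS, F
        if not skip:
            gotos += 1                    # goto DUS_START
            continue
        high, skip = decfsz(high)         # decfsz SysWaitTempUS_H, F
        if not skip:
            gotos += 1                    # goto DUS_START
            continue
        return gotos                      # return


if __name__ == "__main__":
    for low in (0, 1, 5):
        print(low,
              delay_us_gotos(low, 0, inc_low=False),
              delay_us_gotos(low, 0, inc_low=True))
```

Under those assumptions, a delay of zero loops back 255 times with the current code and not at all with the extra incf, and for other values the proposed version loops back exactly as many times as the value requested.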