ChaN ([1]) states:
"Because the data transfer is driven by serial clock generated by host controller, the host controller must continue to read data, send a 0xFF and get received byte, until a valid response is detected. The DI signal must be kept high during read transfer (send a 0xFF and get the received data). The response is sent back within command response time (NCR), 0 to 8 bytes for SDC, 1 to 8 bytes for MMC"
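The polling rule in the quote can be sketched as a short host-side loop. This is a minimal illustration, not code from any of the cited drivers: `transfer` is a hypothetical callback that clocks one byte out on DI and returns the byte received on DO. It relies on the fact that a valid R1 byte always has its most significant bit cleared, while an idle card returns 0xFF.

```python
from typing import Callable, Optional


def read_r1(transfer: Callable[[int], int], max_ncr: int = 8) -> Optional[int]:
    """Poll for an R1 response after a command (hypothetical helper).

    Up to max_ncr filler bytes may precede R1, so at most max_ncr + 1
    bytes are clocked before giving up.
    """
    for _ in range(max_ncr + 1):
        # DI must be kept high during the read, so clock out 0xFF.
        b = transfer(0xFF)
        # A valid R1 has bit 7 cleared; 0xFF means "no response yet".
        if b & 0x80 == 0:
            return b
    return None  # no response within NCR
```

For example, a card that sends two filler bytes before R1 can be simulated with `stream = iter([0xFF, 0xFF, 0x01])` and `read_r1(lambda _: next(stream))`, which yields `0x01`.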
The current emulation code assumes NCR=0 and returns the response (R1) immediately after the command. However, some drivers discard the very first byte after sending the command because it might be garbage, e.g., Fuzix [2] and the Linux kernel [3].
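I propose using NCR=1 for better compatibility. A single filler byte lies within the allowed range for both SDC (0 to 8 bytes) and MMC (1 to 8 bytes), so drivers that simply poll for the response keep working, while drivers that discard the first byte no longer lose R1.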
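The effect of the proposal can be illustrated with a toy model. Everything below is hypothetical: `EmulatedCard` is not the actual emulator code, and `host_read_r1_skip_first` is a simplified stand-in for the "discard the first byte" behaviour described above, not a copy of the Fuzix or Linux implementations. The card queues `ncr` bytes of 0xFF before R1; with NCR=0 the skipping host throws R1 away and times out, while with NCR=1 both a skipping host and a plain polling host receive R1.

```python
from collections import deque
from typing import Optional


class EmulatedCard:
    """Hypothetical SPI-side card model with a configurable NCR."""

    def __init__(self, ncr: int = 1) -> None:
        if not 0 <= ncr <= 8:
            raise ValueError("NCR must be 0..8 bytes")
        self.ncr = ncr
        self._out = deque()  # bytes waiting to be clocked out on DO

    def command(self, r1: int) -> None:
        # After a command: NCR filler bytes (0xFF), then the R1 response.
        self._out.extend([0xFF] * self.ncr)
        self._out.append(r1)

    def transfer(self, mosi: int) -> int:
        # One byte exchanged per host transfer; an idle card drives 0xFF.
        return self._out.popleft() if self._out else 0xFF


def host_read_r1(card: EmulatedCard, max_tries: int = 9) -> Optional[int]:
    # Plain polling host: returns the first byte with bit 7 cleared.
    for _ in range(max_tries):
        b = card.transfer(0xFF)
        if b & 0x80 == 0:
            return b
    return None


def host_read_r1_skip_first(card: EmulatedCard, max_tries: int = 8) -> Optional[int]:
    # Simplified host that ignores the first byte after the command.
    card.transfer(0xFF)
    return host_read_r1(card, max_tries)
```

With `EmulatedCard(ncr=0)`, the skipping host consumes R1 as its discarded byte and returns `None`; with `EmulatedCard(ncr=1)`, both hosts return the queued R1.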
[1] http://elm-chan.org/docs/mmc/mmc_e.html
[2] https://github.com/EtchedPixels/FUZIX/blob/337485699af7f84a63bc73e6e3841c22b3ab47e0/Kernel/dev/devsd.c#L143
[3] https://github.com/torvalds/linux/blob/04cbfba6208592999d7bfe6609ec01dc3fde73f5/drivers/mmc/host/mmc_spi.c#L269