To understand how to write the driver, I first tried to understand how the hardware works. My knowledge base was the GPL drivers supplied by Emagic and by Martijn Sipkema. I could figure out most of the hardware-related details from the specifications of the card components, which are all freely available, and of course from the code provided by the sources mentioned above.

Still, some guesswork remains that I can't verify, since I don't have an electronics lab at hand. Unfortunately, visual inspection of the PCB itself didn't reveal any further detail to me.
(Excerpt from Emagic's GPLed AW8 Windows driver)
```
Timeslot:    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ...
A1_INPUT:
  SD4:       <_ADC-L_>-2-L---<_ADC-R_>---0---<
  WS0:       _______________/---------------\_
A1_OUTPUT:
  SD0:       <_1-L___>-------<_1-R___>-------<
  WS1:       _______________/---------------\_
  SD2:       >-------<_2-L___>-------<_2-R___>
  WS2:       -------\_______________/---------
A2_OUTPUT:
  SD1:       <_3-L___>-------<_3-R___>-------<
  WS3:       _______________/---------------\_
  SD3:       >-------<_4-L___>-------<_4-R___>
  WS4:       -------\_______________/---------
```
```
ACON1:
  (0x00L<<16)  // WS0_CTRL, WS0_SYNC: input TSL1, I2S
  (0x04L<<12)  // WS1_CTRL, WS1_SYNC: output TSL1, I2S
  (0x04L<<8)   // WS2_CTRL, WS2_SYNC: output TSL1, I2S
  (0x08L<<4)   // WS3_CTRL, WS3_SYNC: output TSL2, I2S
  (0x08L)      // WS4_CTRL, WS4_SYNC: output TSL2, I2S
ACON2:
  (0L<<27)     // A1_CLKSRC: BCLK1
  (1L<<22)     // A2_CLKSRC: BCLK1
  (0L<<21)     // INVERT_BCLK1
  (0L<<20)     // INVERT_BCLK2
  (1L<<19)     // BCLK1_OEN: input
  (0L<<18)     // BCLK2_OEN: output
```
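To see what the excerpt's fields add up to, the sketch below ORs them into the two register words. It is a minimal sketch: the function names are hypothetical, and the shift values are copied from the excerpt rather than cross-checked against the SAA7146A data sheet.

```c
#include <assert.h>
#include <stdint.h>

/* ACON1 as listed in the excerpt: one WSx_CTRL/WSx_SYNC field per WS line.
 * Shift values copied from the excerpt, not verified against the data sheet. */
static uint32_t aw8_acon1_from_excerpt(void)
{
    return (uint32_t)((0x00UL << 16)   /* WS0: input TSL1, I2S  */
                    | (0x04UL << 12)   /* WS1: output TSL1, I2S */
                    | (0x04UL << 8)    /* WS2: output TSL1, I2S */
                    | (0x08UL << 4)    /* WS3: output TSL2, I2S */
                    | (0x08UL));       /* WS4: output TSL2, I2S */
}

/* ACON2 as listed in the excerpt: clock sources, inversion, BCLK enables. */
static uint32_t aw8_acon2_from_excerpt(void)
{
    return (uint32_t)((0UL << 27)      /* A1_CLKSRC: BCLK1  */
                    | (1UL << 22)      /* A2_CLKSRC: BCLK1  */
                    | (0UL << 21)      /* INVERT_BCLK1: off */
                    | (0UL << 20)      /* INVERT_BCLK2: off */
                    | (1UL << 19)      /* BCLK1_OEN: input  */
                    | (0UL << 18));    /* BCLK2_OEN: output */
}

/* The resulting register words, worked out by hand from the shifts above. */
static void aw8_acon_selfcheck(void)
{
    assert(aw8_acon1_from_excerpt() == 0x00004488u);
    assert(aw8_acon2_from_excerpt() == 0x00480000u);
}
```

So the excerpt boils down to ACON1 = 0x4488 and ACON2 = 0x480000 (only A2_CLKSRC and BCLK1_OEN set).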
ADC SAA7367, *format 1* (note that this differs from the excerpt above; for an explanation, see capture endian):
```
sck   -_-_-_-_............._-_-_-_-_..........._-_-_-_-_-   BCLK1
sws   _--------------------____________________----------   WS0 (Input)
sdo   ___<18bit-R >_________<18bit-L >_________<18bit-R     SD4
```
DAC TDA1305 I2S-style:
```
bck   -_-_-_-_............._-_-_-_-_..........._-_-_-_-_-   BCLK1
ws    -____________________--------------------__________   WS1, WS2, WS3, WS4
data  ___<20bit-L >________<20bit-R >________<20bit-L       SD0, SD2, SD1, SD3
```
- WS0, SD4, TSL1: analog/digital in
- WS1, SD0, TSL1: analog out #1, digital out
- WS2, SD2, TSL1: analog out #2
- WS3, SD1, TSL2: analog out #3
- WS4, SD3, TSL2: analog out #4
i.e.
WS0(I) ADC/DAIO.sws
SD4(I) ADC/DAIO.sdo
WS1(O) DAC1/DAIO.ws
SD0(O) DAC1/DAIO.data
WS2(O) DAC2.ws
SD2(O) DAC2.data
WS3(O) DAC3.ws
SD1(O) DAC3.data
WS4(O) DAC4.ws
SD3(O) DAC4.data
- A1_CLKSRC: BCLK1
- A2_CLKSRC: BCLK1
- INVERT_BCLK1: off
- INVERT_BCLK2: off
- BCLK1_OEN: input
- BCLK2_OEN: output
For capture streams, the SAA7146 is the I2S slave, while the peripheral device (ADC, DAIO) is the master and generates the WS signal. The ADC SAA7367 spec says that the I2S bit clock must be exactly 64fs (the same holds for the DAIO; is this an I2S requirement?), i.e. 64 bits are clocked in per sample period, 32 per channel. (By the way, the AW8 system clock is 256fs and the A1/A2 clock source in the GPL driver is BCLK1; I don't know where the divider of 4 comes in.) In other words, the peripheral device generates a falling edge on the WS line after the left and right channel samples have been transmitted to the slave, and this resets the TSL pointer.
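The clock relations above can be written down as plain arithmetic. This sketch assumes only the numbers quoted in this section (BCLK = 64fs, system clock = 256fs, one byte per timeslot); the struct and function names are made up for illustration. The factor of 4 it computes is just the ratio 256/64 and says nothing about where the divider actually sits in the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define AW8_BCLK_PER_FS     64u  /* SAA7367: I2S bit clock is exactly 64 fs */
#define AW8_SYSCLK_PER_FS  256u  /* AW8 system clock                        */
#define AW8_BITS_PER_SLOT    8u  /* one byte per timeslot                   */

struct aw8_clocks {
    uint32_t bclk_hz;          /* I2S bit clock                            */
    uint32_t sysclk_hz;        /* system clock                             */
    uint32_t sysclk_per_bclk;  /* ratio sysclk / BCLK (the "divider 4")    */
    uint32_t slots_per_frame;  /* timeslots between two TSL resets by WS   */
};

static struct aw8_clocks aw8_clocks_for_rate(uint32_t fs_hz)
{
    struct aw8_clocks c;

    assert(fs_hz > 0);
    c.bclk_hz = AW8_BCLK_PER_FS * fs_hz;
    c.sysclk_hz = AW8_SYSCLK_PER_FS * fs_hz;
    c.sysclk_per_bclk = AW8_SYSCLK_PER_FS / AW8_BCLK_PER_FS;
    c.slots_per_frame = AW8_BCLK_PER_FS / AW8_BITS_PER_SLOT;
    return c;
}
```

At 48 kHz this gives a 3.072 MHz bit clock, a 12.288 MHz system clock and 8 timeslots between two WS resets, which is where the 8-timeslot superframe of the capturing TSL comes from.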
NOTE: The SAA7146A spec says that a rising edge on the WS0/WS4 input resets TSL1/TSL2. How does this fit with the I2S spec, which says that a transmission starts with a falling WS edge? Maybe the WS signal is inverted? (See also capture endian.)
If the TSL works in full-duplex mode, all playback streams handled by this TSL must fit into 8 timeslots as well, because the capture device's WS edge resets the TSL every 64 bit clocks (8 timeslots of 8 bits each). If the TSL handles only playback streams, the SAA7146A is the I2S master and generates the WS signals itself. In that case we are free to use up to 16 timeslots, which is the maximum supported per TSL.
Depending on the sample format, the 'capturing' TSL can handle either 2 or 4 playback channels
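2 channels for 20-bit and 4 channels for 16-bit samples. Remember that for the 'capturing' TSL a superframe consists of exactly 8 timeslots, so each WS half-period spans 4 timeslots (32 bits): this fits one 20-bit sample (3 bytes), or two 16-bit samples (2 bytes each) from DACs whose WS lines are shifted against each other.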
```
slot   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
sws   ________________----------------  WS
sdo   <16b-L >________<16b-R >________  SD
sws   --------________________--------  WS
sdo   ________<16b-L >________<16b-R >  SD

slot   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
sws   ________________----------------  WS
sdo   <20bit-L >______<20bit-R >______  SD
```
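The slot layout in these diagrams can be generated by a simple rule: each channel's bytes stay inside its 4-slot WS half, and each further DAC's WS line is shifted by one sample width. The sketch below applies that rule; the rule is a reading of the diagrams rather than something taken from the data sheets, and `aw8_playback_slot_map` is a hypothetical helper.

```c
#include <assert.h>

#define AW8_SLOTS      8               /* capturing TSL: 8 timeslots per superframe */
#define AW8_HALF_SLOTS (AW8_SLOTS / 2) /* one WS half-period = 4 slots = 32 bits    */

/* Fills map[slot] with the playback channel whose byte is sent in that
 * timeslot, or -1 if the slot carries no significant data.
 * Channel numbering: 2*dac for left, 2*dac + 1 for right.
 * Returns the number of playback channels that fit. */
static int aw8_playback_slot_map(int bytes_per_sample, int map[AW8_SLOTS])
{
    int dacs, dac, b;

    assert(bytes_per_sample >= 1 && bytes_per_sample <= AW8_HALF_SLOTS);
    for (b = 0; b < AW8_SLOTS; b++)
        map[b] = -1;

    /* One sample per channel per WS half; each further DAC's WS line
     * is shifted by one sample width. */
    dacs = AW8_HALF_SLOTS / bytes_per_sample;
    for (dac = 0; dac < dacs; dac++) {
        int first = dac * bytes_per_sample;

        for (b = 0; b < bytes_per_sample; b++) {
            map[first + b] = 2 * dac;                      /* left half  */
            map[AW8_HALF_SLOTS + first + b] = 2 * dac + 1; /* right half */
        }
    }
    return 2 * dacs;
}
```

With 2-byte (16-bit) samples the map is 1L 1L 2L 2L 1R 1R 2R 2R, i.e. the 1L2L1R2R interleave discussed below; with 3-byte (20-bit) samples only one DAC fits and slots 3 and 7 carry no significant data.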
16bit single channel capture is not supported
Samples are transferred in DWORDs. If we sample only 2 bytes within 8 timeslots, the DWORD buffer is only half full when it is stored to the FIFO. Besides that, as I understand it, the place pointer does not reset at EOS but simply wraps around after 4 bytes have been stored. So it doesn't seem to be possible to store contiguous samples of a single 16-bit channel in the FIFO within only 8 timeslots.
16bit format output channels are interleaved 1L2L1R2R
The DAC TDA1305 spec says that there must be at least 20 clock cycles (bits) per sample on the I2S bus. This means that even if we have only 16 significant bits per sample, we must wait for 20 bits to be transmitted before we change the WS line. Since we can only select bytes for playback from the current DWORD buffer and can't roll back to previous buffer contents, we can't interleave 4 channels like 1L1R2L2R: the first DWORD would then hold 1L and 1R, whereas timeslots 0 to 3 need 1L and 2L. Fortunately, ALSA provides routing facilities, so the channel interleaving can be fixed with an appropriate configuration in .asoundrc (or simply swap plugs ;-)).
Note: Channel routing can be handled in ALSA layer (see Playback devices).
HW monitoring can feed back exactly 4 bytes per superframe
The feedback buffer pointer is reset at EOS, and the buffer is 4 bytes wide. It is possible to feed either two 16-bit channels or one 32-bit channel from the feedback buffer. Feedback buffers are local to an audio interface, so it is not possible to use the A1 feedback buffer from A2 and vice versa. The earliest possible monitoring of an audio sample is probably one timeslot after it has been sampled, so the TSL could be set up as follows:
```
slot    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
capture:
  sws  ----------------________________  WS (input)
  sdo  <18bit-R>_______<18bit-L>_______  SD
monitor playback:
  sws  ________----------------________  WS
  sdo  ________[mon-R ]________[mon-L ]  SD (monitor for slot 0,1,4,5)
or
  sdo  ________[mon-R ]________<16b-R >________[mon-L ]  SD (monitor for slot 4,5)
```
On the other hand, if (within the same timeslot) a captured byte is stored to a given feedback buffer position after that position has been read out for playback, we are at most 1 sample late (per channel), provided the WS lines of the capture and playback channels start synchronously. At 48 kHz this adds approx. 20 µs of latency.
```
slot    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
capture:
  sws  ----------------________________  WS (input)
  sdo  <18bit-R>_______<18bit-L>_______  SD
monitor playback:
  sws  ----------------________________  WS
  sdo  [mon-R ]________[mon-L ]________  SD
```

The monitor slots possibly play back bytes that were stored to the feedback buffer in the previous 'timeslot run'.
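The "approx. 20 µs" figure is simply one sample period, 1/fs. A trivial sketch (hypothetical function name) for the supported rate range:

```c
#include <assert.h>

/* Worst-case extra monitoring delay: one sample period, in microseconds. */
static double aw8_monitor_delay_us(unsigned fs_hz)
{
    assert(fs_hz > 0);
    return 1e6 / (double)fs_hz;
}
```

This gives about 20.8 µs at 48 kHz and 40 µs at 25 kHz, the two ends of the usable rate range.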
SAA7146 supports 2-period buffers per DMA only
Interrupts can be generated at a certain filling level (1st period) as well as at an upper limit (2nd period). Filling level must be 2^(5+n) bytes.
Minimum period size is 64 bytes
See SAA7146 spec. p.24.
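The two constraints above fit together if n ≥ 1 in 2^(5+n), i.e. the allowed period sizes are powers of two starting at 64 bytes. The sketch below assumes exactly that (it is not re-checked against p.24 of the spec), and the helper names are hypothetical. Since the SAA7146 only supports 2-period buffers, the DMA buffer would then be twice the rounded period size.

```c
#include <assert.h>
#include <stddef.h>

#define AW8_MIN_PERIOD_BYTES ((size_t)64)  /* 2^(5+n), assuming n >= 1 */

/* Nonzero if 'bytes' is a power of two and at least 64. */
static int aw8_period_size_ok(size_t bytes)
{
    return bytes >= AW8_MIN_PERIOD_BYTES && (bytes & (bytes - 1)) == 0;
}

/* Rounds a requested period size down to the nearest allowed size,
 * but never below the 64-byte minimum. */
static size_t aw8_round_period(size_t requested)
{
    size_t p = AW8_MIN_PERIOD_BYTES;

    while (p <= requested / 2)  /* same as p * 2 <= requested, without overflow */
        p *= 2;
    assert(aw8_period_size_ok(p));
    return p;
}
```

For example, a request for 5000 bytes would be rounded down to 4096 bytes, and anything below 128 bytes ends up at the 64-byte minimum.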
Sample clock is provided globally for all sub streams
There is one global clock source on board (TSA6060 PLL). It might be possible to choose different clock rates for each audio-interface on SAA7146, but I didn't take a closer look at this.
ADC supports fs=18-50kHz, DAC supports fs=25-48kHz
Since there is only one global clock source, fs is restricted to 25-48kHz.
In the case of 2 channel 16bit playback, it seems that slightly incorrect data is applied to the SD line
This is because we have a 4-byte word length on I2S but only provide the 2 MSBs (16 bits) per channel. The SAA7146 TSL always selects an SD line for output (TSLx DOD always selects an SD line), i.e. it is not possible to play back nothing in a given timeslot. So after the channel-1 bytes have been played back in slots 0 and 1, some crap will be played back in slots 2 and 3 (the data depends on driver implementation details). Channel 2 is then played back in slots 4 and 5, while slots 6 and 7 will again be filled with crap.
Note: I don't really consider this a problem. I2S transmits MSB first, so if we clock out 4 bytes of data, the 3rd and 4th bytes cause a maximum error of 2^-15 (approx. 0.003 %, or -90 dB) of the maximum sample amplitude.
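The bound can be reproduced with a few lines of arithmetic: with 16 significant bits in a 32-bit two's-complement word, the remaining 16 LSBs add at most 2^16 - 1 LSBs against a full scale of 2^31, just under 2^-15. The sketch below (hypothetical names) computes that ratio; in decibels, 20·log10(2^-15) ≈ -90.3 dB.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case error, relative to full scale, when only the top
 * 'significant_bits' of a 'word_bits'-bit two's-complement I2S word carry
 * real data and the remaining LSBs are arbitrary. */
static double aw8_lsb_garbage_ratio(unsigned word_bits, unsigned significant_bits)
{
    uint64_t garbage_max, full_scale;

    assert(significant_bits > 0 && significant_bits < word_bits && word_bits <= 32);
    garbage_max = (UINT64_C(1) << (word_bits - significant_bits)) - 1; /* all LSBs set     */
    full_scale  =  UINT64_C(1) << (word_bits - 1);                     /* |most negative|  */
    return (double)garbage_max / (double)full_scale;
}
```

For `aw8_lsb_garbage_ratio(32, 16)` this is 65535 / 2^31, about 3.05e-5 or 0.003 % of full scale.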
The SAA7146 supports endian swapping (a 4-byte swap) of the DWORD-Buffer, i.e. before the samples are transferred to host memory. Setting register ACON1:Ax_SWAP=1 stores the sampled bytes with the MSB at the lower end of the DWORD-Buffer (BIG_ENDIAN).
Format supported?:
2 channels:
```
                        S16_LE                        S16_BE
                         l1    r1    l1    r1          l1    r1    l1    r1
                        /  \  /  \  /  \  /  \        /  \  /  \  /  \  /  \
i2s order (MSB first):  A  B  C  D  E  F  G  H        A  B  C  D  E  F  G  H
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
required host-order:    C  D  A  B  G  H  E  F        D  C  B  A  H  G  F  E
DW-Buff Ax_SWAP=0:      A  B  C  D  E  F  G  H  (yes) A  B  C  D  E  F  G  H  no
DW-Buff Ax_SWAP=1:      D  C  B  A  H  G  F  E  no    D  C  B  A  H  G  F  E  yes

                        S32_LE                        S32_BE
                         ___l1___    ___r1___          ___l1___    ___r1___
                        /        \  /        \        /        \  /        \
i2s order (MSB first):  A  B  C  D  E  F  G  H        A  B  C  D  E  F  G  H
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
required host-order:    A  B  C  D  E  F  G  H        D  C  B  A  H  G  F  E
DW-Buff Ax_SWAP=0:      A  B  C  D  E  F  G  H  yes   A  B  C  D  E  F  G  H  no
DW-Buff Ax_SWAP=1:      D  C  B  A  H  G  F  E  no    D  C  B  A  H  G  F  E  yes
```
I tried to verify this but got entirely opposite results:
- S16_LE, Ax_SWAP=0: **yes**
- S16_BE, Ax_SWAP=1: **(yes)** (channel swap)
- S32_LE, Ax_SWAP=0: **(yes)** (channel swap)
- S32_BE, Ax_SWAP=1: **(yes)** (channel swap)
Rereading the specs of the SAA7367 (ADC) and the SAA7146 led me to the conclusion that, in the AW8 hardware configuration, the ADC (SAA7367) supplies the right channel first, on the rising edge of the WS signal.
Why?:
=> SAA7367 must be configured as I2S master (SLAVE low)
=> SAA7367 is configured for format 1
Following this assumption yields the picture below, which corresponds exactly to the 'measured' results above:
```
                        S16_LE                        S16_BE
                         r1    l1    r1    l1          r1    l1    r1    l1
                        /  \  /  \  /  \  /  \        /  \  /  \  /  \  /  \
i2s order (MSB first):  C  D  A  B  G  H  E  F        C  D  A  B  G  H  E  F
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
required host-order:    C  D  A  B  G  H  E  F        D  C  B  A  H  G  F  E
DW-Buff Ax_SWAP=0:      C  D  A  B  G  H  E  F  yes   C  D  A  B  G  H  E  F  no
DW-Buff Ax_SWAP=1:      B  A  D  C  F  E  H  G  no    B  A  D  C  F  E  H  G  (yes)

                        S32_LE                        S32_BE
                         ___r1___    ___l1___          ___r1___    ___l1___
                        /        \  /        \        /        \  /        \
i2s order (MSB first):  E  F  G  H  A  B  C  D        E  F  G  H  A  B  C  D
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
required host-order:    A  B  C  D  E  F  G  H        D  C  B  A  H  G  F  E
DW-Buff Ax_SWAP=0:      E  F  G  H  A  B  C  D  (yes) E  F  G  H  A  B  C  D  no
DW-Buff Ax_SWAP=1:      H  G  F  E  D  C  B  A  no    H  G  F  E  D  C  B  A  (yes)
```
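The two pictures can be cross-checked with a small model of the capture path. The sketch below assumes the DWORD-Buffer behaviour described above (first byte at index 3 with Ax_SWAP=0, at index 0 with Ax_SWAP=1, index 0 being the lowest host address); all names are hypothetical. With right-channel-first input it reproduces the measured 16-bit results, and with left-channel-first input it reproduces the original (channel-swapped) expectation.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of one 16-bit stereo frame in I2S (arrival) order, MSB first.
 * right_first = 1 models the AW8 ADC as concluded above. */
static void aw8_i2s_frame(uint16_t left, uint16_t right, int right_first,
                          uint8_t i2s[4])
{
    uint16_t first = right_first ? right : left;
    uint16_t second = right_first ? left : right;

    i2s[0] = (uint8_t)(first >> 8);
    i2s[1] = (uint8_t)(first & 0xffu);
    i2s[2] = (uint8_t)(second >> 8);
    i2s[3] = (uint8_t)(second & 0xffu);
}

/* DWORD-Buffer model: with Ax_SWAP=0 the first byte lands at index 3,
 * with Ax_SWAP=1 at index 0. host[0] is the lowest host address. */
static void aw8_capture_dword(const uint8_t i2s[4], int ax_swap, uint8_t host[4])
{
    int k;

    for (k = 0; k < 4; k++)
        host[ax_swap ? k : 3 - k] = i2s[k];
}

/* Reads one S16_LE (big_endian == 0) or S16_BE frame; ch[0] is the
 * channel ALSA presents first. */
static void aw8_read_s16(const uint8_t host[4], int big_endian, uint16_t ch[2])
{
    int c;

    for (c = 0; c < 2; c++) {
        const uint8_t *p = host + 2 * c;
        ch[c] = big_endian ? (uint16_t)((p[0] << 8) | p[1])
                           : (uint16_t)((p[1] << 8) | p[0]);
    }
}

static void aw8_capture_selfcheck(void)
{
    uint8_t i2s[4], host[4];
    uint16_t ch[2];

    /* Right channel first (AW8): S16_LE with Ax_SWAP=0 comes out correct ... */
    aw8_i2s_frame(0x1234, 0xABCD, 1, i2s);
    aw8_capture_dword(i2s, 0, host);
    aw8_read_s16(host, 0, ch);
    assert(ch[0] == 0x1234 && ch[1] == 0xABCD);

    /* ... while S16_BE with Ax_SWAP=1 comes out channel-swapped. */
    aw8_capture_dword(i2s, 1, host);
    aw8_read_s16(host, 1, ch);
    assert(ch[0] == 0xABCD && ch[1] == 0x1234);

    /* Left channel first (first picture): S16_LE with Ax_SWAP=0 is swapped. */
    aw8_i2s_frame(0x1234, 0xABCD, 0, i2s);
    aw8_capture_dword(i2s, 0, host);
    aw8_read_s16(host, 0, ch);
    assert(ch[0] == 0xABCD && ch[1] == 0x1234);
}
```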
Again, ALSA provides routing facilities with which we can easily swap the channels back (see Capture devices).
TODO: What about the DAIO? Which format does it use? Maybe different channel swapping is required depending on the input mode (digital/analog).
The SAA7146 offers more flexibility regarding byte swapping for playback: register TSLx:BSEL can select any byte of the 4-byte-wide DWORD-Buffer for playback.
2 channels
```
                        S16_LE                        S16_BE
                         r1    l1    r1    l1          r1    l1    r1    l1
                        /  \  /  \  /  \  /  \        /  \  /  \  /  \  /  \
host-order:             C  D  A  B  G  H  E  F        D  C  B  A  H  G  F  E
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
i2s order (MSB first):  A  B  C  D  E  F  G  H        A  B  C  D  E  F  G  H
BSEL:                   1  0  3  2  1  0  3  2        0  1  2  3  0  1  2  3

                        S32_LE                        S32_BE
                         ___l1___    ___r1___          ___l1___    ___r1___
                        /        \  /        \        /        \  /        \
host-order:             A  B  C  D  E  F  G  H        D  C  B  A  H  G  F  E
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
i2s order (MSB first):  A  B  C  D  E  F  G  H        A  B  C  D  E  F  G  H
BSEL:                   3  2  1  0  3  2  1  0        0  1  2  3  0  1  2  3
```
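The BSEL rows follow a simple rule: for each byte in I2S order (MSB first), select the DWORD-Buffer index at which the host format stores that byte. A sketch, assuming (as in the tables) that the DWORD-Buffer index equals the host byte address modulo 4; the function name is hypothetical.

```c
#include <assert.h>

/* Computes TSLx:BSEL values for the 4 bytes of one DWORD, listed in I2S
 * output order (MSB first). bytes_per_sample is 2 (S16) or 4 (S32);
 * big_endian selects the *_BE host formats. */
static void aw8_playback_bsel(int bytes_per_sample, int big_endian, int bsel[4])
{
    int k;

    assert(bytes_per_sample == 2 || bytes_per_sample == 4);
    for (k = 0; k < 4; k++) {
        int sample = k / bytes_per_sample;                        /* sample within DWORD   */
        int signif = bytes_per_sample - 1 - k % bytes_per_sample; /* 0 = LSB, MSB sent first */
        int offset = big_endian ? bytes_per_sample - 1 - signif : signif;

        bsel[k] = sample * bytes_per_sample + offset;
    }
}
```

It returns 1 0 3 2 for S16_LE, 3 2 1 0 for S32_LE and 0 1 2 3 for both big-endian formats, matching the table. The 4-channel table below uses the same values once the host buffer is in l1, l2, r1, r2 order.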
4 channels
```
                        S16_LE                        S16_BE
                         r1    l1    r2    l2          r1    l1    r2    l2
                        /  \  /  \  /  \  /  \        /  \  /  \  /  \  /  \
host-order:             C  D  A  B  G  H  E  F        D  C  B  A  H  G  F  E
                         l2    l1    r2    r1          l2    l1    r2    r1
                        /  \  /  \  /  \  /  \        /  \  /  \  /  \  /  \
required aw8-order:     E  F  A  B  G  H  C  D        F  E  B  A  H  G  D  C  (swap by alsa?)
DWORD-Buffer index:     3  2  1  0  3  2  1  0        3  2  1  0  3  2  1  0
i2s order (MSB first):  A  B  E  F  C  D  G  H        A  B  E  F  C  D  G  H
BSEL:                   1  0  3  2  1  0  3  2        0  1  2  3  0  1  2  3
```
For 4-channel playback we have the channel order l1, l2, r1, r2, whereas ALSA expects l1, r1, l2, r2. Again, the ALSA plug layer can fix this with an appropriate configuration in .asoundrc (see Playback devices).
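The required reordering is a fixed permutation within each 4-channel frame. The sketch below shows it for S16 frames. In the setup described here the ALSA plug layer (e.g. a `route` PCM in .asoundrc) would do this instead, so the function is purely illustrative and its name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reorders interleaved 4-channel S16 frames from ALSA's order
 * (l1, r1, l2, r2) into the order the AW8 TSL plays back (l1, l2, r1, r2). */
static void aw8_reorder_4ch(const int16_t *src, int16_t *dst, size_t frames)
{
    /* dst channel i takes the sample of src channel perm[i]. */
    static const int perm[4] = { 0, 2, 1, 3 };
    size_t f;
    int i;

    assert(src != dst);  /* not an in-place operation */
    for (f = 0; f < frames; f++)
        for (i = 0; i < 4; i++)
            dst[4 * f + i] = src[4 * f + perm[i]];
}
```

The permutation is its own inverse, so the same table also maps AW8 order back to ALSA order.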