Re: Please read and speak up! (was RE: [Squeak-VMdev] RE: Pending 3.2 and VM sources @ sourceforge)

Hi Lex,

> Does "cat < /dev/dsp > testfile" and then "cat testfile > /dev/dsp" work?
> ie, is recording working at all on your Linux box?

Actually it's better than that.  On my PowerBook (OSS/dmasound and
AWACS low-level driver):

  cat < /dev/dsp > /dev/dsp

works just fine (and as it says somewhere on opensound.com, the above
makes me a perfectly serviceable "cheesy little delay line" ;-).

FWIW, SNDCTL_DSP_GETCAPS reports DSP_CAP_DUPLEX for dmasound/AWACS, so I'd
expect to be able to open the same device twice.

On Intel (nm256av + ac97 drivers):

  cat < /dev/dsp > /dev/dsp

fails (resource not available), but

  cat < /dev/dsp > /dev/dsp1

works just fine.  DSP_CAP_DUPLEX is not reported for the nm256, so I
shouldn't expect to be able to open the same device twice -- and /dev/dsp
and /dev/dsp1 do indeed seem to be different devices with dsp1 being the
output half and dsp being the input half (but which also knows how to turn
into a "shadow" of dsp1 when it's opened for writing).

The above is all pretty much consistent with what it says in the latest
OSS API documentation.  For AWACS I can also open /dev/dsp O_RDWR (as
claimed in the OSS doc for DSP_CAP_DUPLEX devices) and then perform both i
and o through the same fd.  (This locks both directions into the same
format/rate/channels, but that's a bug in the OSS API and not an intrinsic
limitation in the AWACS LL driver.)
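
A minimal sketch of the capability check described above, assuming the
classic OSS header <sys/soundcard.h> is installed.  probe_duplex is a
hypothetical helper name used for illustration only; it is not taken
from the Squeak sources.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the driver reports DSP_CAP_DUPLEX,
 * 0 if it does not, and -1 if the device cannot be opened or queried. */
static int probe_duplex(const char *path)
{
  int caps = 0;
  int fd = open(path, O_WRONLY);

  if (fd < 0)
    return -1;
  if (ioctl(fd, SNDCTL_DSP_GETCAPS, &caps) < 0) {
    close(fd);
    return -1;
  }
  close(fd);
  return (caps & DSP_CAP_DUPLEX) ? 1 : 0;
}
```

A result of 1 from probe_duplex("/dev/dsp") would suggest opening the
device O_RDWR and doing both directions through one fd; a result of 0
would suggest a card like the nm256, where input and output live on
separate device nodes.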

> If so, and thus the question is the Squeak code, you might look at the
> OSS code on SourceForge.

This is the code I started with.  I fell over the recording problems while
testing the merge from SF last week.  (FWIW, sound is the last of the
merges: everything else was done 10 days ago. ;-)  I've spent the
intervening time learning more than I ever really wanted to about
low-latency sound programming in general, OSS in particular and the
internals of its low-level drivers in gory detail.  (But I'm a much
wiser, and therefore happier, hacker now. ;-)

(Once I got the output timing sorted out properly Squeak immediately
dropped to between 0.1 and 0.5 % CPU while playing the stereoBachFugue --
which makes me realise just how _much_ dsp could potentially be done in
Squeak without ever risking underruns...)

> It's not full duplex, but that's because
> the last time I checked full-duplex didn't work in the image, either.

Preferences at: #canRecordWhilePlaying.

(Tip-of-the-week #2: `Preferences at: #stopSoundWhenDone put: true' will
make Squeak coexist much more amicably with any other sound apps that
might be interested in opening /dev/dsp from time to time.)

> Anyway, getting full duplex to work should be a lot easier than getting
> all the format conversions to work.

That's what I thought... but the format conversions turned out to be
utterly trivial and timing issues are turning out to be a horrendous
nightmare.

> There are some functions squeakToCard()
> and cardToSqueak() which might be handy, even if you rewrite all the rest
> of it.

Oops -- I think the format conversions were the first thing I decided had
to go. ;-)  They now don't contain any tests at all: there is one
converter for every possible combination of { squeak, driver } x { size,
end, sign, channels } (which comes to 24 converters after you've deleted
the meaningless ones -- and just 24 lines of code when you've figured out
the right macros ;-) where each one is a trivial, tight, ultrafast loop.
Appropriate input/output converters are chosen when the device format is
queried/set after opening dsp.  Overhead: one indirect fn call per
buffer's worth of i or o.
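
For illustration, here is a rough sketch of the "one macro, one line
per converter" idea.  Every name in it is hypothetical, it covers only
a handful of the 24 combinations, and it assumes the Squeak side uses
signed 16-bit samples in native byte order; the real support code may
well differ (for instance, averaging the two channels in the 2->1 case
is my guess, not a documented choice).

```c
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; the Squeak side is assumed to be signed
 * 16-bit samples in native byte order. */
typedef void (*converter_fn)(const void *src, void *dst, size_t frames);

/* Stamp out one test-free loop per format pair.  in_step and out_step
 * are the samples consumed and produced per frame. */
#define CONVERTER(name, in_t, out_t, in_step, out_step, body)        \
  static void name(const void *src, void *dst, size_t frames)        \
  {                                                                   \
    const in_t *in = (const in_t *)src;                               \
    out_t *out = (out_t *)dst;                                        \
    for (; frames > 0; --frames, in += (in_step), out += (out_step))  \
      { body; }                                                       \
  }

CONVERTER(s16_copy,       int16_t,  int16_t,  1, 1, out[0] = in[0])
CONVERTER(s16_swap,       uint16_t, uint16_t, 1, 1,
          out[0] = (uint16_t)((in[0] << 8) | (in[0] >> 8)))
CONVERTER(s16_to_u8,      int16_t,  uint8_t,  1, 1,
          out[0] = (uint8_t)(((uint16_t)in[0] ^ 0x8000u) >> 8))
CONVERTER(s16_2ch_to_1ch, int16_t,  int16_t,  2, 1,
          out[0] = (int16_t)((in[0] + in[1]) / 2))
CONVERTER(s16_1ch_to_2ch, int16_t,  int16_t,  1, 2,
          out[0] = out[1] = in[0])

enum device_format { DEV_S16_NATIVE, DEV_S16_SWAPPED, DEV_U8 };

/* Chosen once, when the device format is queried/set after opening. */
static converter_fn choose_output_converter(enum device_format fmt)
{
  switch (fmt) {
  case DEV_S16_NATIVE:  return s16_copy;
  case DEV_S16_SWAPPED: return s16_swap;
  case DEV_U8:          return s16_to_u8;
  }
  return NULL;
}
```

The per-buffer cost is then a single call through the converter_fn
pointer, with no format tests inside the loop.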

I decided the rest had to go when I started trying to do full duplex.  In
short:

1.  Everything now respects *rigorously* the OSS API.
2.  The image was spending most of its time in tight loops because of
    the (inconsistent) way it uses the semaphores, the arbitrary 100
    frame minimum in output, and the arbitrary min read size for input
    based on the buffer allocated in the image.
3.  The conversion routines are now optimal (modulo any weirdo string
    handling insns that the compiler might not be clever enough
    to emit).
4.  select() is basically useless for full-duplex sound i/o.
5.  Different cards have different capabilities so any one of two
    (and a half ;) mutually incompatible mechanisms might be needed to
    achieve full-duplex in all cases where it's possible.
6.  The old code had more bubble gum and scotch tape holding it together
    than it had actual implementation.

And maybe this is a teeny bit extreme, but...

7.  Doing things "right" in the Squeak code panics the latest Linux
    kernel: the perfect incentive to go fix the kernel!  (The problem
    is obvious and seems trivial to fix, so I'm sucking in bitkeeper
    repositories as I type... ;-)

> Incidentally, the formats are different for recording and playback.

The only difference that I've spotted is that recording is always mono
whereas playback has the choice.  (So initiating recording when stereo
playback is already active, with a driver like dmasound which imposes the
same # channels for full-duplex [rather stupidly since the LL drivers can
happily cope with different formats in duplex] will cause
SNDCTL_DSP_CHANNELS to answer `2' when 1 channel is requested and the
support code to subsequently select a 2->1 channel converter correctly.  
Seems to work.  If you have pathological cases, I'm interested!)

> posted a changeset that updates the comments for this, but I don't know
> if they got included in any image.  I can dig up those comments again if
> it would help -- it took me a few hours to trace it all down!

I'll get back to you if/when I suspect something in the image smells
funny.  So far I can happily record and play back in half-duplex, so the
formats don't seem to be a problem.  The real headache is getting the
timing just right with full-duplex: the fragment size can be different (so
there's no single, simple relationship between read/write timing) and
select() is utterly useless because (i) it says "write me" when the image
refuses to write less than 100 frames, and (ii) it says "read me" when as
little as one frame is available -- both of which tend to keep the
player/recorder processes in such tight loops that the image has a hard
time getting screen updates out, let alone noticing incoming keyboard
interrupts.
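
To make the mismatch concrete, here is one possible way to avoid
spinning on select(): ask the driver how much room it has with
SNDCTL_DSP_GETOSPACE and only wake the player when that covers the
minimum it is willing to write.  This is a sketch under my own
assumptions, not necessarily what the rewritten support code does, and
both function names are hypothetical.  (SNDCTL_DSP_GETISPACE is the
corresponding query for input.)

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

/* Hypothetical helper: number of whole frames that fit in bytes_free,
 * or 0 if that is fewer than the caller is willing to write (so the
 * caller can sleep instead of looping). */
static size_t frames_worth_writing(int bytes_free, size_t frame_bytes,
                                   size_t min_frames)
{
  size_t frames;

  if (bytes_free <= 0 || frame_bytes == 0)
    return 0;
  frames = (size_t)bytes_free / frame_bytes;
  return frames >= min_frames ? frames : 0;
}

/* Query the driver directly.  Returns -1 if the ioctl fails. */
static long output_frames_ready(int fd, size_t frame_bytes,
                                size_t min_frames)
{
  audio_buf_info info;

  if (ioctl(fd, SNDCTL_DSP_GETOSPACE, &info) < 0)
    return -1;
  return (long)frames_worth_writing(info.bytes, frame_bytes, min_frames);
}
```

With 16-bit stereo (4 bytes per frame) and the 100 frame minimum, 396
free bytes yields 0 here, whereas select() would already have reported
the fd as writable.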

Regards,

Ian

PS: There's a performance bug in the player process.  Rather than

       [   [self isSpaceAvailable] whileFalse: [self twiddleThumbs].
           lotsOfSounds do: [:sound | sound playInto: buffer].
           self outputNoisesFrom: buffer.
       ] rinseAndRepeat.

    I think it might make more sense to pregenerate a lead time's worth
    of samples into the buffer before deciding whether to twiddleThumbs.
    Otherwise we're effectively wasting a fragment's worth of latency
    and/or bandwidth when lotsOfSounds and/or sample frequency is large
    w.r.t. CPU speed.  Just a thought...