
#209 Microphone has incredible latency; should be configurable

Status: closed
Owner: nobody
Labels: sphinx4
Priority: 5
Updated: 2012-09-21
Created: 2010-04-28
Private: No

The TargetDataLine in edu.cmu.sphinx.frontend.util.Microphone is opened with a buffer size of audioBufferSize=160000 bytes.
That is enough for 5 seconds of audio at 16 kHz with 16-bit samples (32000 bytes per second).
As the buffer size is passed on to the operating system, larger buffers lead to higher latency of the TargetDataLine.

In my experiments, decreasing audioBufferSize led to greatly improved responsiveness of the microphone.
Going down to 480 bytes (=15ms, thus leaving enough slack for half a frame) did not have any negative side effects.
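
The buffer durations quoted above follow directly from the audio format. The following is a minimal sketch of that arithmetic, assuming mono, signed, little-endian PCM; the class BufferLatency and the method bufferMillis are hypothetical names used for illustration and are not part of Sphinx4.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper for illustration only; not part of Sphinx4.
final class BufferLatency {

    /** Returns how many milliseconds of audio fit into bufferBytes. */
    static double bufferMillis(AudioFormat format, int bufferBytes) {
        // getFrameSize() is bytes per frame, getFrameRate() is frames per second.
        double bytesPerSecond = format.getFrameSize() * (double) format.getFrameRate();
        return 1000.0 * bufferBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Assumed format: 16 kHz, 16 bit, mono, signed, little-endian.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        System.out.println(bufferMillis(format, 160000)); // prints 5000.0
        System.out.println(bufferMillis(format, 480));    // prints 15.0
    }
}
```

With this format, every 32000 bytes of buffer correspond to one second of audio that may be waiting before the application sees it.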

I hence propose to make audioBufferSize configurable and possibly to give it a much lower default than the current 5 seconds.
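
One way such a setting could look is sketched below, using plain javax.sound.sampled rather than the Sphinx4 configuration system. The class name, the method names and the system property mic.bufferMillis are all assumptions made for this example; the actual change in Sphinx4 may look different.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical sketch; names and the property key are illustrative only.
final class ConfigurableMicrophoneLine {

    /** Converts a buffer length in milliseconds to a whole number of frames, in bytes. */
    static int bufferBytes(AudioFormat format, int bufferMillis) {
        long frames = Math.round(format.getFrameRate() * bufferMillis / 1000.0);
        int wholeFrames = (int) Math.max(1L, frames); // never request an empty buffer
        return wholeFrames * format.getFrameSize();
    }

    /** Opens a capture line with the requested buffer length (needs audio hardware). */
    static TargetDataLine open(AudioFormat format, int bufferMillis)
            throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        // The size is a request; line.getBufferSize() reports what was granted.
        line.open(format, bufferBytes(format, bufferMillis));
        return line;
    }

    /** Reads the buffer length from a hypothetical system property, defaulting to 200 ms. */
    static int configuredMillis() {
        return Integer.getInteger("mic.bufferMillis", 200);
    }
}
```

Expressing the setting in milliseconds keeps it independent of the sample rate, and rounding to whole frames matters because TargetDataLine.open expects a buffer size that is an integral number of sample frames.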

Note: This has nothing to do with recognition running in real-time or not: There's a Microphone.RecordingThread that continuously reads from the TargetDataLine.

Discussion

  • Nickolay V. Shmyrev

    Hi Timo, thanks for the report!

    Yes, it could be smaller, but what about at least 200 ms? The issue here, as I see it, is that if the system is not fast enough in decoding, it will start losing frames. 200 ms seems reasonable for a response delay.

    Or we can make it configurable

     
  • Timo Baumann

    Timo Baumann - 2010-05-03

    Hi Nickolay,

    Microphone starts a Microphone.RecordingThread, which continually polls the microphone and pushes audio to Microphone.audioList (a BlockingQueue), which in turn is polled by Microphone.getData(). In other words, audio will only be lost if your VM doesn't schedule the recording thread often enough; at least code-wise, there should not be a problem with slow decoding.

    Whether the VM will actually stop the recording thread for too long when the decoding thread is busy, I don't know. So far I have only experimented with setups where decoding is faster than real time anyway (and was wondering why I was still waiting about a second for my results).

    I agree that 15ms is very short (and possibly too short for general use) and 200ms will be enough for most people. At the same time, I need my microphone to be as snappy as possible (there are still delays from decoding, SDS processing, output, etc. further down in the pipeline), so having it configurable would be great for experimental users.

     
  • Nickolay V. Shmyrev

    Fixed in trunk, thanks a lot
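
The hand-off between the recording thread and the decoder that Timo describes in the discussion can be illustrated with a small stand-alone sketch. It simulates audio chunks with integers instead of reading from a real TargetDataLine, and it assumes an unbounded LinkedBlockingQueue; the class RecordingHandoff and its method are hypothetical and only mirror the roles of Microphone.RecordingThread, Microphone.audioList and Microphone.getData().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical simulation of the producer/consumer hand-off; not Sphinx4 code.
final class RecordingHandoff {

    /** Runs a fast simulated recorder against a slow consumer and returns the chunks consumed. */
    static List<Integer> run(int chunks, long consumerDelayMillis) {
        BlockingQueue<Integer> audioList = new LinkedBlockingQueue<>();

        // Stands in for the recording thread: it only enqueues and never waits
        // for the consumer, because the queue is unbounded.
        Thread recorder = new Thread(() -> {
            for (int i = 0; i < chunks; i++) {
                audioList.add(i);
            }
        }, "recorder");
        recorder.start();

        // Stands in for getData(): a deliberately slow reader.
        List<Integer> consumed = new ArrayList<>();
        try {
            for (int i = 0; i < chunks; i++) {
                consumed.add(audioList.take()); // blocks until a chunk is available
                Thread.sleep(consumerDelayMillis);
            }
            recorder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return consumed;
    }
}
```

In this simulation a slow consumer receives every chunk in order, just later. It does not model the case Timo leaves open, namely the VM starving the recording thread so that the operating system buffer overflows before it is read.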

     
