#94 ORANGE: VAD feature causes Kaldi to stop listening

1.0
open
nobody
None
2015-11-11
2015-11-06
No

Apparently, Cairo's VAD is implemented as follows:
(a) it waits until it believes speech has ended for 500 ms (speech time-out)
(b) it stores an audio file of the speech snippet
(c) it verifies whether Sphinx actually produced a valid hypothesis, in which case, the logs say

DataEndSignal encountered!

(d) if, however, no valid hypothesis was created, Sphinx keeps on listening and adds to the audio stream
(e) until it finally receives a valid hypothesis after storing yet another audio file

Interestingly, the audio file from (e) contains the waveform of (b) at its beginning. The attached examples are from call ID 397d7b59ea1255c9b140599d72077805@141.31.8.74
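
Steps (a) to (e) can be mimicked with a toy model. This is only a sketch of the behavior as described in this ticket, not Cairo's or Sphinx's actual code: the class `VadSession`, the method `on_speech_timeout`, and the byte strings standing in for audio are all hypothetical. It shows why the second stored file would start with the contents of the first.

```python
from dataclasses import dataclass, field


@dataclass
class VadSession:
    """Toy model of steps (a)-(e); every name here is hypothetical."""
    stream: bytes = b""  # audio accumulated within one dialog turn
    stored_files: list = field(default_factory=list)

    def on_speech_timeout(self, new_audio, hypothesis):
        """Called when the VAD believes speech has ended for 500 ms (a)."""
        self.stream += new_audio                # (d) earlier audio stays in the stream
        self.stored_files.append(self.stream)   # (b)/(e) store an audio snippet
        if hypothesis:                          # (c) valid hypothesis ends the turn
            return "DataEndSignal encountered!"
        return None                             # (d) no hypothesis: keep listening


session = VadSession()
first = session.on_speech_timeout(b"\x01\x02", hypothesis=None)
second = session.on_speech_timeout(b"\x03\x04", hypothesis="yes")
```

Under these assumptions, `session.stored_files[1]` begins with `session.stored_files[0]`, which matches the two attached files.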

2 Attachments

Discussion

  • David Suendermann-Oeft

    • Attachments has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,2 @@
     201511061909270438.wav (13.5 kB; audio/x-wav)
    +201511061909270767.wav (23.9 kB; audio/x-wav)
    
     
  • David Suendermann-Oeft

    The way Kaldi is currently integrated into Cairo's VAD mechanism is unfortunately not compatible with the feature described above. Kaldi stops listening after (b) and makes Cairo return its hypothesis to JVXML, which moves to the next step while the VAD is still active with the former recognition. This clash can cause the call to go nowhere, as exemplified by call ID 397d7b59ea1255c9b140599d72077805@141.31.8.74.

     
  • David Suendermann-Oeft

    Another call with three excellent examples of this feature is d8a3b2ca4ba0f7087b24a989dd4aef83@141.31.8.61, which, however, did not crash (likely because speech was over before Kaldi was re-instantiated).

     

    Last edit: David Suendermann-Oeft 2015-11-06
  • David Suendermann-Oeft

    Two of today's 4444 production calls seem to have died because of this issue:

    397d7b59ea1255c9b140599d72077805@141.31.8.74
    0313cc1c29447eec57de7b38e701a87d@141.31.8.74

     
  • David Suendermann-Oeft

    • summary: VAD feature causes Kaldi to stop listening --> ORANGE: VAD feature causes Kaldi to stop listening
     
  • David Suendermann-Oeft

    I created views (vad, vad2) to monitor this issue. Of 1104 production calls, 122 were affected by the issue:

    create table tmp select * from vad2;
    select count(1) from completeCalls a, tmp b where a.callId = b.callId and extension in (1111, 2222, 3333, 4444);
    select count(1) from completeCalls where extension in (1111, 2222, 3333, 4444);

    However, affected calls do not necessarily die. Also, the issue likely does not exist with the Sphinx implementation, only with the Kaldi one. Looking at the Kaldi production extension 4444 after Nov 4, we have 33 calls, of which 7 faced the issue and 2 died:

    create table tmp select * from vad2;
    select count(1) from completeCalls where extension in (4444) and timestamp > 20151104;
    select count(1) from completeCalls a, tmp b where a.callId = b.callId and extension in (4444) and a.timestamp > 20151104;
    select count(1) from completeCalls a, tmp b where a.callId = b.callId and extension in (4444) and a.timestamp > 20151104 and code is null;

    These are not yet reliable statistics, but the issue is definitely worth monitoring.
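
For reference, the counts quoted above translate into the following rates. This is plain arithmetic on the reported numbers; the helper name `rate` is only for illustration.

```python
def rate(part, total):
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


all_affected = rate(122, 1104)  # all production extensions: 11.1
kaldi_affected = rate(7, 33)    # extension 4444 after Nov 4: 21.2
kaldi_died = rate(2, 33)        # extension 4444 after Nov 4: 6.1
```

With only 33 Kaldi calls in the sample, these percentages are rough indications at best.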

     
  • Patrick L. Lange

    We now trigger a SpeechEndSignal when a DataEndSignal is encountered, to prevent the Kaldi crash when two or more utterances are detected by the VAD within one dialog turn. This change is committed but not yet deployed to production.

     
  • Patrick L. Lange

    I discovered a bug produced by the timeout code. When a timeout happens, no DataEndSignal is produced, and subsequent recognitions do not produce DataEndSignals either.

     
  • Patrick L. Lange

    I noticed I was calling recognizer.recognize() twice. Now I only call it once, but I had to remove the DataEndSignal handling to make it work. Recognition and timeouts are functional. I would like to deploy this and see whether Kaldi still crashes because of double recognition.

     
