Please see CAIRO log message below:
2015-08-24 22:39:23,230 DEBUG ~.~.~.MrcpRequestProcessorImpl [Thread-13]: MrcpRequestProcessorImpl.processRequest()...
2015-08-24 22:39:23,250 DEBUG ~.~.~.~.~.RTPSpeechSynthChannel [Thread-10]: java.lang.InterruptedException
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) ~[?:1.7.0_71]
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) ~[?:1.7.0_71]
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) ~[?:1.7.0_71]
at org.speechforge.cairo.server.tts.RTPSpeechSynthChannel$SendThread.run(RTPSpeechSynthChannel.java:161) [cairo-20110208.042746-5.jar:?]
2015-08-24 22:39:23,252 DEBUG ~.~.~.MrcpRequestProcessorImpl [Thread-13]: MrcpRequestProcessorImpl got response from request handler.
2015-08-24 22:39:23,258 DEBUG ~.~.~.~.~.RTPSpeechSynthChannel [Thread-10]: draining prompt queue...
2015-08-24 22:39:23,271 DEBUG ~.~.~.~.~.RTPSpeechSynthChannel [Thread-10]: taking next prompt from prompt queue...
2015-08-24 22:39:23,278 DEBUG ~.~.~.SESSION [IoThreadPool-1]: WRITTEN:
MRCP/2.0 86 1440444274032 200 COMPLETE
Channel-Identifier:14f61692dcf@speechsynth
2015-08-24 22:39:26,030 DEBUG ~.~.~.SESSION [IoThreadPool-2]: READ:
MRCP/2.0 78 STOP 1440444274034
Channel-Identifier:14f61692dce@speechrecog
2015-08-24 22:39:26,036 DEBUG ~.~.~.MrcpRequestProcessorImpl [Thread-56]: MrcpRequestProcessorImpl.processRequest()...
2015-08-24 22:39:26,039 DEBUG ~.~.~.~.SipListenerImpl [EventScannerThread]:
------------- RECEIVED A SIP REQUEST ---------------
Received a BYE SIP request
Pre-existing Dialog: gov.nist.javax.sip.stack.SIPDialog@83df40b
Content-Length: 0
Via: SIP/2.0/UDP 141.31.8.62:5090;branch=z9hG4bK-383134-dff008e011fb84a739954b69bbe616ab
CSeq: 2 BYE
From: sip:cairogate@speechforge.org;tag=1338931022
To: sip:cairo@141.31.8.60;tag=1318522150
Call-ID: f9e16913e9301d31b64b2928c43961ae@141.31.8.62
Max-Forwards: 70
No content in the request.
2015-08-24 22:39:26,049 DEBUG ~.~.~.~.~.MrcpRecogChannel [Thread-56]: Stop recognition called, mrcp channel state: 1 rtp channel state: 1
2015-08-24 22:39:26,058 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.StopEvent[source=com.sun.media.processor.unknown.Handler@27ff9f78,previous=Started,current=Prefetched,target=Prefetched,mediaTime=javax.media.Time@5d474abb]
2015-08-24 22:39:26,061 DEBUG ~.~.~.~.SipAgent [EventScannerThread]: ------------- SENDING A SIP RESPONSE ---------------
Sending a SIP Response. Status: 200, OK
Content-Length: 0
Via: SIP/2.0/UDP 141.31.8.62:5090;branch=z9hG4bK-383134-dff008e011fb84a739954b69bbe616ab
CSeq: 2 BYE
From: sip:cairogate@speechforge.org;tag=1338931022
To: sip:cairo@141.31.8.60;tag=1318522150
Call-ID: f9e16913e9301d31b64b2928c43961ae@141.31.8.62
No content in the response.
2015-08-24 22:39:26,073 INFO ~.~.~.~.~.MrcpRecogChannel [Thread-56]: Stopping recognition. Mrcp channel state is RECOGNIZING and rtp channel state is is 1
2015-08-24 22:39:26,093 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.ControllerClosedEvent[source=com.sun.media.processor.unknown.Handler@27ff9f78]
2015-08-24 22:39:26,101 DEBUG ~.~.~.~.~.RTPRecogChannel [Thread-56]: Closing processor...
2015-08-24 22:39:26,118 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.StopEvent[source=com.sun.media.processor.unknown.Handler@58933ff7,previous=Started,current=Prefetched,target=Prefetched,mediaTime=javax.media.Time@4736efa7]
2015-08-24 22:39:29,119 DEBUG ~.~.~.~.~.PBDSReplicator [Thread-56]: stop()
2015-08-24 22:39:29,126 DEBUG ~.~.~.~.~.PBDSReplicator [Loop thread: com.sun.media.parser.RawBufferParser$FrameTrack@6d9aa40b]: stop()
2015-08-24 22:39:29,127 DEBUG ~.~.~.~.~.PBDSReplicator [Thread-56]: setTransferHandler(): bufferTransferHandler is null!
2015-08-24 22:39:29,140 DEBUG ~.~.~.~.~.PBDSReplicator [Thread-56]: DONE set settransferhandler()...
2015-08-24 22:39:29,145 DEBUG ~.~.~.~.~.PBDSReplicator [Thread-56]: disconnect()
2015-08-24 22:39:29,154 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.ControllerClosedEvent[source=com.sun.media.processor.unknown.Handler@58933ff7]
2015-08-24 22:39:29,155 WARN ~.~.~.MrcpRequestProcessorImpl [Thread-56]: processRequest(): got exception from request handler:
java.lang.NullPointerException
at org.speechforge.cairo.rtp.server.RTPStreamReplicator.removeReplicant(RTPStreamReplicator.java:82) ~[cairo-rtp-20110208.040900-3.jar:?]
at org.speechforge.cairo.server.recog.RTPRecogChannel.closeProcessor(RTPRecogChannel.java:270) ~[cairo-20110208.042746-5.jar:?]
at org.speechforge.cairo.server.recog.MrcpRecogChannel.stop(MrcpRecogChannel.java:320) ~[cairo-20110208.042746-5.jar:?]
at org.mrcp4j.server.delegator.RecogOnlyRequestDelegator.stop(RecogOnlyRequestDelegator.java:108) ~[mrcp4j-0.2.jar:0.2]
at org.mrcp4j.server.delegator.RecogOnlyRequestDelegator.handleRequest(RecogOnlyRequestDelegator.java:76) ~[mrcp4j-0.2.jar:0.2]
at org.mrcp4j.server.MrcpRequestProcessorImpl.processRequest(MrcpRequestProcessorImpl.java:81) [mrcp4j-0.2.jar:0.2]
at org.mrcp4j.server.MrcpProtocolHandler$EventThread.run(MrcpProtocolHandler.java:80) [mrcp4j-0.2.jar:0.2]
2015-08-24 22:39:29,170 DEBUG ~.~.~.SESSION [IoThreadPool-1]: WRITTEN:
MRCP/2.0 86 1440444274034 501 COMPLETE
Channel-Identifier:14f61692dce@speechrecog
2015-08-24 22:39:29,178 DEBUG ~.~.~.SESSION [IoThreadPool-1]: READ:
MRCP/2.0 78 STOP 1440444274035
Channel-Identifier:14f61692dce@speechrecog
2015-08-24 22:39:29,184 DEBUG ~.~.~.MrcpRequestProcessorImpl [Thread-57]: MrcpRequestProcessorImpl.processRequest()...
2015-08-24 22:39:29,191 DEBUG ~.~.~.SESSION [IoThreadPool-2]: WRITTEN:
MRCP/2.0 86 1440444274035 405 COMPLETE
Channel-Identifier:14f61692dce@speechrecog
2015-08-24 22:39:34,076 INFO ~.~.~.~.SipListenerImpl [EventScannerThread]: Got a dialog terminated event
2015-08-24 22:39:35,996 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: ReceiveStreamEvent received: javax.media.rtp.event.InactiveReceiveStreamEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@673e47ea
Dataport 42050
Controlport 42051
Address 141.31.8.60
RTPForwarder com.sun.media.rtp.util.PacketForwarder@29e3889a
RTPDemux com.sun.media.rtp.RTPDemultiplexer@191d4bf2]
2015-08-24 22:39:58,051 DEBUG ~.~.~.~.SipListenerImpl [EventScannerThread]: Got a transaction terminated event
The above log snippet is immediately preceded by the following:
958466 Exception in thread "Thread-31" java.lang.NullPointerException
958467 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF.waitForResult(SphinxRecEngineJSGF.java:199)
958468 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF.access$100(SphinxRecEngineJSGF.java:52)
958469 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF$RecogThread.run(SphinxRecEngineJSGF.java:249)
That is, this is a duplicate of https://sourceforge.net/p/halef/tickets/82. I will close the other case in keep this one open.
We are now able to track this issue in the DB:
select * from haleflogs where message like '%NullPointerException%';
This is also happening with Kaldi:
Exception in thread "Thread-185" java.lang.NullPointerException
at org.speechforge.cairo.server.recog.sphinx.KaldiRecEngineWFST.vad(KaldiRecEngineWFST.java:411)
at org.speechforge.cairo.server.recog.sphinx.KaldiRecEngineWFST.access$200(KaldiRecEngineWFST.java:84)
at org.speechforge.cairo.server.recog.sphinx.KaldiRecEngineWFST$VADThread.run(KaldiRecEngineWFST.java:422)
on TJR199
And another one on TJR212:
One can now view NPE stats in the DB by running:
drop table tmpId;
create table tmpId (select callId from hli where message like '%NullPointerException%');
select * from cr where cairoCallId in (select callId from tmpId);
Yesterday, we had 7 NULL calls in production:
create table tmpI (select cairoCallId from cr where date = 20150827 and halefcheck = 'PASSED' and c = 0 and extension = 2222);
Five of them had NPEs:
select * from hli where callId in (select * from tmpI) and message like '%nullp%';
69285d5dc994661579c70c117a1382b9@141.31.8.62: no caller input
273c17682e0b45d766b05765d556e585@141.31.8.62: Halef stops listening after "sorry I did not catch that does lunch on Tuesday work for you"
9648c5ac8e92918e39edda2ced7dfec2@141.31.8.62: Halef stops listening after "it would be nice if we could..."
943fd2b9b5b5180683878a20c0dd9113@141.31.8.62: Halef stops listening after "it would be nice if we could..."
8fb7e93bc6b7474376722b8dac862693@141.31.8.62: input is just noise
I now placed a successful call into 7704. So, Halef is still up. @Patrick, there are 14 NPEs today:
select * from calls where callId in (select * from tmpId) and timestamp > '20150929';
Are they the cause of the issue?
Yours,
DSO
select count(1) from completeCalls2 where timestamp > '20150927';
214 calls this week so far
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444);
169 in production
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null;
131 of null production calls (!)
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null and halefcheck = 'PASSED';
60 of null production calls (!) passing to Halef
select * from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null and halefcheck = 'PASSED' and cairoCallId is not null;
49 of these calls made it into Halef
create table tmpnpe as select callId from npe where databasedate > '20150927';
38 NPEs this week
select * from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and halefcheck = 'PASSED' and cairoCallId is not null and cairoCallId in (select * from tmpnpe);
34 of those into production Exts
select code from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and halefcheck = 'PASSED' and cairoCallId is not null and cairoCallId in (select * from tmpnpe) and code is null;
24 of them producing incomplete calls.
These are, however, complete calls featuring NPEs:
| 6198dcf84cef5a147a0d724095981bce@141.31.8.62 |
| e08405830d8001612ad879133b09bf0e@141.31.8.62 |
| 08d1dfbc48331b1a66be8034a90c6dd0@141.31.8.62 |
| ebf44e881db7410ff5fdf43384826d89@141.31.8.62 |
| 747a82e2a6d33e4c98216db6785ac465@141.31.8.62 |
| 26fb430a98e8267434a7232e4736faf0@141.31.8.62 |
| d9bebbbdee3a08a22b9d12bfd2406e25@141.31.8.62 |
| afc3212933c95076db26b58f6ff97fcb@141.31.8.62 |
| 83e6744f7e466e9c6c9e32cdbbc0f9fe@141.31.8.62 |
| 0f7573c3703d6c7cdba0179a33da1ef5@141.31.8.62 |
select cairocallid from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and halefcheck = 'PASSED' and cairoCallId is not null and cairoCallId in (select * from tmpnpe) and code is not null;
It would be interesting to understand whether NPEs always lead to Halef dying or not. What about the above examples?
Just called into 7702. After the third speech input, I got (on TJR200, Cairo logs)
1568784 2015-10-14 20:00:14,696 DEBUG ~.~.~.MrcpRequestProcessorImpl [Thread-770]: getNextEvent(): waiting to take event from queue...
1568785 2015-10-14 20:00:14,701 DEBUG ~.~.~.~.~.~.RawAudioProcessor [Thread-776]: adding DataStartSignal...
1568786 2015-10-14 20:00:14,724 DEBUG ~.~.~.~.~.~.SphinxRecEngine [Thread-777]: recognize without hotword mode enabled
1568787 2015-10-14 20:00:14,746 DEBUG ~.~.~.~.~.~.SpeechDataMonitor [Thread-777]: >>>>>>>>>>>>>>> DataEndSignal encountered!
1568788 2015-10-14 20:00:14,751 DEBUG ~.~.~.~.~.~.SpeechDataRecorder [Thread-777]: >>>>>>>>>>>>>>> DataEndSignal encountered!
1568789 2015-10-14 20:00:14,758 DEBUG ~.~.~.~.~.~.SphinxRecEngine [Thread-777]: ---------- Recognizer result is NULL -------
1568790 Exception in thread "Thread-777" java.lang.NullPointerException
1568791 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF.waitForResult(SphinxRecEngineJSGF.java:199)
1568792 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF.access$100(SphinxRecEngineJSGF.java:52)
1568793 at org.speechforge.cairo.server.recog.sphinx.SphinxRecEngineJSGF$RecogThread.run(SphinxRecEngineJSGF.java:249)
The call waited here until I hung up.
What is this? Is it possible that a NPE is thrown b/c the recognizer returned an empty string?
Another NPE of the type "Recognizer result is NULL" in call 7792f6f2aa58e7cc2c7d7b6afe4da297@141.31.8.62
And another one: cb58b7aad743ebb54c5f4387556a689e@141.31.8.62
It is weird. I just produced another three NPEs in a row:
26a02c3394937661bc8f57738b570a67@141.31.8.62
48ecd3ed6065aed351ca2e2c0fd11b89@141.31.8.62
308614133a3b5c774054ad8d33ad67b5@141.31.8.62
and, after a restart of Halef, the call went through smoothly:
37f2f1b093d90831e41860d26144e4ce@141.31.8.62
I also observed, that in those calls affected by NPEs, the recognizer was much more sensitive, effectively picking up background noise as speech, while, after restart, I had to speak clearly to trigger a speech event.
This NPE currently does not seem to be logged in the DB (see the above calls as examples). However, the message ' ---------- Recognizer result is NULL -------' does get logged.
create table tmp as select callId from hli where message='---------- Recognizer result is NULL -------';
--262
select count(1) from completeCalls where callId in (select * from tmp) and code is null;
--221
84% of the calls exhibiting this issue are "not completed";
This is related to sphinx code. The result object is null in these cases. I am interested if we see this in the kaldi-enabled version because this code is used as VAD.