I called into Ext 7709 twice and hung up after the second system prompt. When I called in for the third time, Halef did not respond anymore (not even a ring tone), and no logs appeared in JVXML or Cairo log files. This is the last entry in the JVXML logs:
[java] 2015-08-21 20:37:39,125 DEBUG sip.SipListenerImpl [EventScannerThread]:
[java] ------------- RECEIVED A SIP REQUEST ---------------
[java] Received a BYE SIP request
[java] Pre-existing Dialog: gov.nist.javax.sip.stack.SIPDialog@433a83f3
[java] Content-Length: 0
[java]
[java] Via: SIP/2.0/UDP 141.31.8.161:5060;branch=z9hG4bK372c649d
[java]
[java] Max-Forwards: 70
[java]
[java] From: "SNFC CNTRL CA" <sip:my phone number@141.31.8.161>;tag=as5532b3c7
[java]
[java] To: <sip:7709@141.31.8.62:5090>;tag=904415333
[java]
[java] Call-ID: 7c8f6a7b2c7560df05334abb1ddb53d2@141.31.8.161:5060
[java]
[java] CSeq: 103 BYE
[java]
[java] User-Agent: Asterisk PBX 13.4.0
[java]
[java] X-Asterisk-HangupCause: Normal Clearing
[java]
[java] X-Asterisk-HangupCauseCode: 16
[java]
[java] No content in the request.
[java] 2015-08-21 20:37:39,141 INFO sip.SipServer [EventScannerThread]: Got a bye request
[java] 2015-08-21 20:37:39,148 INFO interpreter.JVoiceXmlSession [EventScannerThread]: initiating a hangup event
[java] 2015-08-21 20:37:39,160 DEBUG event.JVoiceXmlEventHandler [EventScannerThread]: notifying event 'connection.disconnect.hangup'...
[java] 2015-08-21 20:37:39,165 INFO event.JVoiceXmlEventHandler [EventScannerThread]: notified event 'connection.disconnect.hangup'
[java] 2015-08-21 20:37:39,172 DEBUG sip.SipAgent [EventScannerThread]: ------------- SENDING A SIP REQUEST ---------------
[java] Sending a BYE SIP Request
[java] Content-Length: 0
[java]
[java] Via: SIP/2.0/UDP 141.31.8.62:5090;branch=z9hG4bK-323730-753a09e3bf712feeaf86c850a11e012d
[java]
[java] CSeq: 2 BYE
[java]
[java] From: <sip:cairogate@speechforge.org>;tag=888636703
[java]
[java] To: <sip:cairo@141.31.8.60>;tag=590061143
[java]
[java] Call-ID: fe6d63cb6b5509348804e09a647e3452@141.31.8.62
[java]
[java] Max-Forwards: 70
[java]
[java] No content in the request.
[java] 2015-08-21 20:37:39,181 DEBUG sip.SipAgent [EventScannerThread]: ------------- SENDING A SIP REQUEST ---------------
[java] Sending a BYE SIP Request
[java] Content-Length: 0
[java]
[java] Via: SIP/2.0/UDP 141.31.8.62:5090;branch=z9hG4bK-323730-a1579f58e2ac3f061847b46668f51958
[java]
[java] CSeq: 1 BYE
[java]
[java] From: <sip:7709@141.31.8.62:5090>;tag=904415333
[java]
[java] To: "SNFC CNTRL CA" <sip:my phone number@141.31.8.161>;tag=as5532b3c7
[java]
[java] Call-ID: 7c8f6a7b2c7560df05334abb1ddb53d2@141.31.8.161:5060
[java]
[java] Max-Forwards: 70
[java]
[java] No content in the request.
~
Normally, calls would have a longer series of SIP messages including responses including received ones as well as messages of JVXML completing other activities such as parsing the VXML page etc.
In order to monitor what is going on, @Patrick, would you please implement a heart beat writing a log message every few seconds, so we have a prove that JVXML is still alive?
Thanks,
DSO
Another instance just happened:
Last edit: David Suendermann-Oeft 2015-08-24
Good calls (e.g. cairoCallID 873ab82f2c4610ff24fccd02a69ef61f@141.31.8.62; asteriskCallId 38c893017f578584499595cb57b26cbd@64.40.49.128) normally have the following SIP messages being circulated at the end of call:
Bad calls (e.g. cairoCallId 8049adceb54f4417f60bdc8b468c1adb@141.31.8.62, asteriskCallId 36af0b7b2d3cb90a2587ccfa6a78b391@64.40.49.128) are missing 4 to 6
Last edit: David Suendermann-Oeft 2015-08-24
For the above calls, these are the ranges of corresponding lines in Asterisk full logs:
38c893017f578584499595cb57b26cbd@64.40.49.128: 67605 to 67678
36af0b7b2d3cb90a2587ccfa6a78b391@64.40.49.128: 67679 to 67765
Another call (the first one above):
3a18902e0b2646b8359a0b6549aadec0@64.40.49.128: 44724 to 44797
These are past production calls affected by the issue where there is no evidence for a call in the Halef logs, but the call actually passed from Asterisk:
+----------+----------+
| date | count(1) |
+----------+----------+
| 20150704 | 3 |
| 20150705 | 15 |
| 20150709 | 1 |
| 20150730 | 1 |
| 20150803 | 7 |
| 20150806 | 17 |
| 20150807 | 14 |
| 20150811 | 2 |
| 20150813 | 9 |
| 20150817 | 1 |
| 20150819 | 53 |
| 20150826 | 7 |
+----------+----------+
select date, count(1) from cr where cairoCallId is NULL and halefcheck = 'PASSED' and extension in (1111, 2222, 3333, 4444) group by date;
Two more cases of this bug so far this week, one on Tue, one on Wed. Restarting Halef resolved the issue:
select count(1) from completeCalls2 where timestamp > '20150927';
214 calls this week so far
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444);
169 in production
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null;
131 of null production calls (!)
select count(1) from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null and halefcheck = 'PASSED';
60 of null production calls (!) passing to Halef
select timestamp, asteriskCallId from completeCalls2 where timestamp > '20150927' and extension in (1111,2222,3333,4444) and code is null and halefcheck = 'PASSED' and cairoCallId is null;
11 calls which produce no evidence in the Halef logs:
| 2015-09-29 22:38:34 | 6d84c8da3897197306d7ee2a64f251a1@64.40.49.128 |
| 2015-09-29 22:39:38 | 15c5b1191f7731963fb2145176ac7b22@64.40.49.128 |
| 2015-09-29 22:40:39 | 466536402e98141c1f1871002fa731b9@64.40.49.128 |
| 2015-09-29 22:41:55 | 55f8734c3c3c5d1301fe06a77701925f@64.40.49.128 |
| 2015-09-29 22:43:01 | 07c5d0783fab989a79c624cb07da0b88@64.40.49.128 |
| 2015-09-29 22:43:59 | 661d25ed661cdb1464887f804367050f@64.40.49.128 |
| 2015-09-30 03:32:29 | 3e43fa0b7c88857f0fe0adf031016268@64.40.49.128 |
| 2015-09-30 03:33:45 | 054a7f6223dca3817f63740f229c362d@64.40.49.128 |
| 2015-09-30 03:39:26 | 75b4fb853a4306c36eb6ff1668111066@64.40.49.128 |
| 2015-09-30 03:40:50 | 3fa9999a64164c2a727139571b8a6ee2@64.40.49.128 |
| 2015-09-30 21:24:17 | 7dff43e13d5733a759d950e100c6a89e@64.40.49.128 |
select * from started where databasedate > '20150927';
shows the following relevant Halef reboots:
| 2015-09-29 22:38:34 | 6d84c8da3897197306d7ee2a64f251a1@64.40.49.128 |
| 2015-09-29 22:39:38 | 15c5b1191f7731963fb2145176ac7b22@64.40.49.128 |
| 2015-09-29 22:40:39 | 466536402e98141c1f1871002fa731b9@64.40.49.128 |
| 2015-09-29 22:41:55 | 55f8734c3c3c5d1301fe06a77701925f@64.40.49.128 |
| 2015-09-29 22:43:01 | 07c5d0783fab989a79c624cb07da0b88@64.40.49.128 |
| 2015-09-29 22:43:59 | 661d25ed661cdb1464887f804367050f@64.40.49.128 |
| 8367944 | 2015-09-29 22:44:49 | 141.31.8.62 | 2015-09-29 22:39:15 | org.jvoicexml.JVoiceXmlMain | INFO | VoiceXML interpreter 0.7.6.EA started !. |
| 2015-09-30 03:32:29 | 3e43fa0b7c88857f0fe0adf031016268@64.40.49.128 |
| 2015-09-30 03:33:45 | 054a7f6223dca3817f63740f229c362d@64.40.49.128 |
| 2015-09-30 03:39:26 | 75b4fb853a4306c36eb6ff1668111066@64.40.49.128 |
| 2015-09-30 03:40:50 | 3fa9999a64164c2a727139571b8a6ee2@64.40.49.128 |
| 8452724 | 2015-09-30 03:41:49 | 141.31.8.62 | 2015-09-30 03:36:16 | org.jvoicexml.JVoiceXmlMain | INFO | VoiceXML interpreter 0.7.6.EA started !. |
The following single occurrence may simply be a caller hanging up before being connected:
| 2015-09-30 21:24:17 | 7dff43e13d5733a759d950e100c6a89e@64.40.49.128 |
Today I found 2-3 instances of the RED BUG that caused JVXML to freeze with the following Cairo log message ("RTPEventHandler" java.lang.NullPointerException):
==================
No content in the request.
2016-02-03 15:52:40,990 DEBUG ~.~.~.SESSION [IoThreadPool-1]: OPENED
2016-02-03 15:52:41,003 DEBUG ~.~.~.SESSION [IoThreadPool-1]: OPENED
2016-02-03 15:52:41,019 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: ReceiveStreamEvent received: javax.media.rtp.event.NewReceiveStreamEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@a028ee5
Dataport 42050
Controlport 42051
Address 10.61.82.112
RTPForwarder com.sun.media.rtp.util.PacketForwarder@45877a54
RTPDemux com.sun.media.rtp.RTPDemultiplexer@319e7ee3]
2016-02-03 15:52:41,025 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: - Recevied new RTP stream: ULAW/rtp, 8000.0 Hz, 8-bit, Mono
2016-02-03 15:52:41,031 DEBUG ~.~.~.~.~.RTPStreamReplicator [RTPEventHandler]: Creating realized processor...
2016-02-03 15:52:44,131 DEBUG ~.~.~.~.~.RTPStreamReplicator [RTPEventHandler]: Internal Processor realized.
2016-02-03 15:52:44,137 DEBUG ~.~.~.~.~.PBDSReplicator [RTPEventHandler]: stream details: contentType=raw, format=ULAW/rtp, 8000.0 Hz, 8-bit, Mono
2016-02-03 15:52:44,143 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.TransitionEvent[source=com.sun.media.processor.unknown.Handler@5b3dcede,previous=Realized,current=Prefetching,target=Started]
2016-02-03 15:52:44,149 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.PrefetchCompleteEvent[source=com.sun.media.processor.unknown.Handler@5b3dcede,previous=Prefetching,current=Prefetched,target=Started]
2016-02-03 15:52:44,154 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.StartEvent[source=com.sun.media.processor.unknown.Handler@5b3dcede,previous=Prefetched,current=Started,target=Started,mediaTime=javax.media.Time@2c1309bd,timeBaseTime=javax.media.Time@15e46de6]
2016-02-03 15:52:44,159 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: Starting data source...
2016-02-03 15:52:44,545 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: SessionEvent received: javax.media.rtp.event.NewParticipantEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@a028ee5
Dataport 42050
Controlport 42051
Address 10.61.82.112
RTPForwarder com.sun.media.rtp.util.PacketForwarder@45877a54
RTPDemux com.sun.media.rtp.RTPDemultiplexer@319e7ee3]
2016-02-03 15:52:44,551 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: - A new participant has just joined: root@ip-10-61-82-112
2016-02-03 15:52:45,996 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: SessionEvent received: javax.media.rtp.event.NewParticipantEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@a028ee5
Dataport 42050
Controlport 42051
Address 10.61.82.112
RTPForwarder com.sun.media.rtp.util.PacketForwarder@45877a54
RTPDemux com.sun.media.rtp.RTPDemultiplexer@319e7ee3]
2016-02-03 15:52:46,002 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: - A new participant has just joined:
2016-02-03 15:52:46,007 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: ReceiveStreamEvent received: javax.media.rtp.event.StreamMappedEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@a028ee5
Dataport 42050
Controlport 42051
Address 10.61.82.112
RTPForwarder com.sun.media.rtp.util.PacketForwarder@45877a54
RTPDemux com.sun.media.rtp.RTPDemultiplexer@319e7ee3]
2016-02-03 15:52:46,012 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: Source description: SOURCE_DESC_CNAME=
2016-02-03 15:52:49,132 DEBUG ~.~.~.~.SipListenerImpl [EventScannerThread]: Got a transaction terminated event
2016-02-03 15:53:44,281 DEBUG ~.~.~.~.RTPConsumer [RTPEventHandler]: ReceiveStreamEvent received: javax.media.rtp.event.InactiveReceiveStreamEvent[source = RTPManager
SSRCCache com.sun.media.rtp.SSRCCache@a028ee5
Dataport 42050
Controlport 42051
Address 10.61.82.112
RTPForwarder com.sun.media.rtp.util.PacketForwarder@45877a54
RTPDemux com.sun.media.rtp.RTPDemultiplexer@319e7ee3]
2016-02-03 15:53:44,287 DEBUG ~.~.~.~.~.RTPStreamReplicator [RTPEventHandler]: Closing RTP processor for SSRC=583718469
2016-02-03 15:53:44,293 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.StopEvent[source=com.sun.media.processor.unknown.Handler@5b3dcede,previous=Started,current=Prefetched,target=Prefetched,mediaTime=javax.media.Time@2daa5cdb]
Exception in thread "RTPEventHandler" java.lang.NullPointerException
at org.speechforge.cairo.rtp.server.RTPStreamReplicator.streamInactive(RTPStreamReplicator.java:160)
at org.speechforge.cairo.rtp.RTPConsumer.update(RTPConsumer.java:265)
at com.sun.media.rtp.RTPEventHandler.processEvent(RTPEventHandler.java:62)
at com.sun.media.rtp.RTPEventHandler.dispatchEvents(RTPEventHandler.java:96)
at com.sun.media.rtp.RTPEventHandler.run(RTPEventHandler.java:115)
2016-02-03 15:53:44,299 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.ControllerClosedEvent[source=com.sun.media.processor.unknown.Handler@5b3dcede]
This error does definitely not happen after every call. Out of 1668 calls to Cairo in the cloud since Nov 18, there were only 120 such instances, so, only 7% of the calls are affected.
root@ip-10-61-82-112:~/scripts# cat /export/Apps/patrick/halef-cairo2/cairo-VM/scripts/nohup.out | grep 'RTPEventHandler["] java.lang.NullPointerException' | wc
120 600 8280
root@ip-10-61-82-112:~/scripts# cat /export/Apps/patrick/halef-cairo2/cairo-VM/scripts/nohup.out | grep '^Call' | sort -u | wc
1668 3336 93408
Exception in thread "RTPEventHandler" java.lang.NullPointerException
at org.speechforge.cairo.rtp.server.RTPStreamReplicator.streamInactive(RTPStreamReplicator.java:160)
at org.speechforge.cairo.rtp.RTPConsumer.update(RTPConsumer.java:265)
at com.sun.media.rtp.RTPEventHandler.processEvent(RTPEventHandler.java:62)
at com.sun.media.rtp.RTPEventHandler.dispatchEvents(RTPEventHandler.java:96)
at com.sun.media.rtp.RTPEventHandler.run(RTPEventHandler.java:115)
2016-02-13 01:04:39,199 DEBUG ~.~.~.~.ProcessorStarter [JMF thread: SendEventQueue: com.sun.media.processor.unknown.Handler]: controllerUpdate(): ControllerEvent received: javax.media.ControllerClosedEvent[source=com.sun.media.processor.unknown.Handler@128d38f3]
What also happened the second time in a row is that the Red Bug was associated with Tomcat being down:
is the last Cairo log message after restarting Halef and placing a call.
I am not sure who worte this code but I am not surprised that log messages can change the behaviour of the code anymore. I am not even sure how this function can not end in a null pointer exception.