There is a situation in which people call Asterisk and get connected to Halef, which is dead, so the connection times out with no response. The monitoring script currently does not catch this, since such calls never make it into completeCalls (which requires evidence of the call in the Halef logs). This happened in 18% of cases:
select count(1) from completeCalls2 where cairoCallId is null;
with a critical incident yesterday (68% failure):
select count(1) from completeCalls2 where timestamp > '20150819' and cairoCallId is null;
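The two queries above return raw counts; turning them into a percentage also needs the total number of calls in the same window. A minimal sketch of that calculation follows, using an in-memory SQLite table as a stand-in for the real database. The two-column schema, the `unlogged_call_rate` helper, and the toy rows are assumptions for illustration only, not the production view or its data.

```python
import sqlite3

def unlogged_call_rate(conn, since=None):
    """Fraction of calls in completeCalls2 that have no Halef log entry.

    A call counts as unlogged when cairoCallId is NULL.
    Returns None when no calls fall in the requested window.
    """
    sql = ("SELECT AVG(CASE WHEN cairoCallId IS NULL THEN 1.0 ELSE 0.0 END) "
           "FROM completeCalls2")
    params = ()
    if since is not None:
        sql += " WHERE timestamp > ?"
        params = (since,)
    (rate,) = conn.execute(sql, params).fetchone()
    return rate

# Toy data (hypothetical schema and rows, only to exercise the query).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE completeCalls2 (timestamp TEXT, cairoCallId TEXT)")
conn.executemany(
    "INSERT INTO completeCalls2 VALUES (?, ?)",
    [
        ("20150817", "c1"),
        ("20150818", "c2"),
        ("20150818", None),  # reached Asterisk, never logged by Halef
        ("20150820", None),  # same, after the cutoff used below
    ],
)

overall = unlogged_call_rate(conn)                   # 2 of 4 calls -> 0.5
recent = unlogged_call_rate(conn, since="20150819")  # 1 of 1 call  -> 1.0
```

Computing the ratio in one query keeps the numerator and denominator on the same snapshot of the view, which matters if calls are arriving while the check runs.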
@Patrick, please use the new view completeCalls2 for monitoring all calls, including the ones not being logged by Halef at all.
You may also want to use completeCalls2 in the TAR portal for easier monitoring (perhaps with a toggle to show or hide such calls).
The monitoring script now uses completeCalls2 to check for calls that do not make it into Halef.
Currently, the monitoring script sends e-mails even when the system is merely busy. Could you please indicate the nature of each e-mail (busy vs. crash) in its subject line?
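One possible way to do this is to prefix the subject with a fixed tag per alert type, so that mail filters can separate the two cases. The sketch below is an assumption about how that might look, not the actual monitoring script: the `build_alert` helper, the tag strings, and the example addresses are all hypothetical, and how the script decides between "busy" and "crash" is left out.

```python
from email.message import EmailMessage

# Hypothetical subject tags; pick whatever the mail filters should match on.
SUBJECT_TAGS = {"busy": "[BUSY]", "crash": "[CRASH]"}

def build_alert(kind, body,
                sender="monitor@example.org",
                recipient="ops@example.org"):
    """Build an alert e-mail whose subject states the alert type."""
    if kind not in SUBJECT_TAGS:
        raise ValueError(f"unknown alert kind: {kind!r}")
    msg = EmailMessage()
    msg["Subject"] = f"{SUBJECT_TAGS[kind]} Halef monitoring alert"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

busy_msg = build_alert("busy", "All lines are in use.")
crash_msg = build_alert("crash", "Calls are not reaching Halef.")
```

Rejecting unknown kinds up front means an untagged alert cannot be sent by accident.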