Menu

#74 ORANGE: Call hangs up due to exhausted recognizer pool

1.0
open
None
2015-08-31
2015-08-19
No

Example: call ID 1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62

plays the welcome prompt and then hangs up since the ASR initialization is failing:

2015-08-19 20:08:42,256 DEBUG ~.~.~.~.~.RTPRecogChannel [Thread-1396]: Borrowing recognition engine from object pool...
java.util.NoSuchElementException: Pool exhausted
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1110)
at org.speechforge.cairo.server.recog.RecogInterface.activateRecEngine(RecogInterface.java:68)
at org.speechforge.cairo.server.recog.RTPRecogChannel.recognize(RTPRecogChannel.java:191)
at org.speechforge.cairo.server.recog.MrcpRecogChannel.recognize(MrcpRecogChannel.java:215)
at ...

Discussion

  • David Suendermann-Oeft

    This issue is found 149 times in the DB:

    select count(1) from haleflogs where message like '%java.util.NoSuchElementException: Pool exhausted%';

     
  • David Suendermann-Oeft

    All the 148 calls found suffering from this problem ended up not completed:

    create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;
    select code, count(1) from completeCalls where callId in (select * from tmpC2) group by code;

     
  • David Suendermann-Oeft

    The problem is happening throughout extensions:

    +-----------+----------+
    | extension | count(1) |
    +-----------+----------+
    | 1010 | 2 |
    | 1111 | 32 |
    | 2222 | 36 |
    | 3333 | 19 |
    | 4444 | 22 |
    | 7703 | 6 |
    | 7704 | 3 |
    | 7709 | 25 |
    | 7711 | 3 |
    +-----------+----------+

    select extension, count(1) from completeCalls where callId in (select * from tmpC2) group by extension;

    on different Halef instances:

    +-------------+----------+
    | cairoIp | count(1) |
    +-------------+----------+
    | 141.31.8.62 | 123 |
    | 141.31.8.74 | 25 |
    +-------------+----------+

    select cairoIP, count(1) from completeCalls where callId in (select * from tmpC2) group by cairoIP;

    and consistently over time. We had 71 such calls in August so far of altogether 357 accounting for 20%.

     
  • David Suendermann-Oeft

    It looks as if the hang-up caused by this issue happens either before or after the first recognition:

    +---+----------+
    | b | count(1) |
    +---+----------+
    | 0 | 69 |
    | 1 | 79 |
    +---+----------+

    select cnt-1 b, count(1) from (select callId, count(1) cnt from events where callId in (select * from tmpC2) and event in ('VXML URL', 'Recognition Hypothesis') group by callId) a group by b;

     
  • David Suendermann-Oeft

    • assigned_to: Patrick Lange
     
  • David Suendermann-Oeft

    Yesterday, there were 16 production calls:

    select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222';

    out of which 6 were not completed:

    select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222' and code is null;

    all of them associated with the pool being exhausted:

    select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222' and callId in (select * from tmpC2);

     
  • David Suendermann-Oeft

    You can select log information of individual calls by running either of the following:

    select * from hli where callId = '1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62';

    (displaying only the leading 100 chars of log messages)

    select * from hli2 where callId = '1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62';

    (displaying the full message)

     
  • David Suendermann-Oeft

    +-----------------------------+----------+
    | round(timestamp/1000000, 0) | count(1) |
    +-----------------------------+----------+
    | 20150619 | 1 |
    | 20150620 | 2 |
    | 20150701 | 35 |
    | 20150704 | 2 |
    | 20150705 | 1 |
    | 20150727 | 3 |
    | 20150729 | 17 |
    | 20150730 | 2 |
    | 20150731 | 14 |
    | 20150803 | 2 |
    | 20150804 | 4 |
    | 20150805 | 6 |
    | 20150806 | 14 |
    | 20150807 | 3 |
    | 20150808 | 1 |
    | 20150811 | 2 |
    | 20150813 | 5 |
    | 20150814 | 5 |
    | 20150815 | 2 |
    | 20150817 | 15 |
    | 20150818 | 5 |
    | 20150819 | 7 |
    +-----------------------------+----------+

    drop table tmpC2;
    create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;
    select round(timestamp/1000000, 0), count(1) from completeCalls where callId in (select * from tmpC2) group by round(timestamp/1000000, 0);

     
  • David Suendermann-Oeft

    Patrick applied a potential fix. So far, the issue has not appeared again. @Patrick, please explain what you did.

     
  • Patrick L. Lange

    I force the RTPChannel to return a potentially existing engine back to the pool before fetching a new one.

    cairo-VM/src/cairo/cairo-server/trunk/src/main/java/org/speechforge/cairo/server/recog/RTPChannel.java

    192 _recInterface.returnRecEngine();
    193 _recInterface.activateRecEngine(appType, grammarLocation, hotword);

     
  • David Suendermann-Oeft

    • summary: Call hangs up due to exausted recognizer pool --> Call hangs up due to exhausted recognizer pool
     
  • David Suendermann-Oeft

    There was a call today with the bug (5c048ecdab8fe4f514e0b1753fa433e0@141.31.8.61), but it was the Kaldi Halef TJR199/201. So, has the issue been fixed for good?

     
  • David Suendermann-Oeft

    We are currently live on TJR214/215 with Ext 4444 and suffered three NULL calls so far:

    drop table tmpI1;
    create table tmpI1 (select cairoCallId from cr where date = 20150831 and halefcheck = 'PASSED' and c = 0 and extension = 4444);

    It turns out that all of them were caused by the pool exhausted bug:

    drop table tmpC2;
    create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;

    select * from tmpI1 where cairoCallId in (select * from tmpC2);

    The bug fix was not propagated into TJR214/215 yet. Remarkably, there are no NPEs as of yet:

    select * from hli where callId in (select * from tmpI1) and message like '%nullp%';

    Huh. Is this bug and https://sourceforge.net/p/halef/tickets/83/ interrelated by any chance?

     
  • David Suendermann-Oeft

    • summary: Call hangs up due to exhausted recognizer pool --> ORANGE: Call hangs up due to exhausted recognizer pool
     

Log in to post a comment.

MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.