ORANGE: Call hangs up due to exhausted recognizer pool

Open Source VXML-Based Spoken Dialog System

Brought to you by: langep, suendermann, vikram-r

#74 ORANGE: Call hangs up due to exhausted recognizer pool

Milestone: 1.0

Status: open

Owner: Patrick L. Lange

Labels: None

Updated: 2015-08-31

Created: 2015-08-19

Creator: David Suendermann-Oeft

Private: No

Example: call ID 1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62

plays the welcome prompt and then hangs up since the ASR initialization is failing:

2015-08-19 20:08:42,256 DEBUG ~.~.~.~.~.RTPRecogChannel [Thread-1396]: Borrowing recognition engine from object pool...
java.util.NoSuchElementException: Pool exhausted
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1110)
at org.speechforge.cairo.server.recog.RecogInterface.activateRecEngine(RecogInterface.java:68)
at org.speechforge.cairo.server.recog.RTPRecogChannel.recognize(RTPRecogChannel.java:191)
at org.speechforge.cairo.server.recog.MrcpRecogChannel.recognize(MrcpRecogChannel.java:215)
at ...

Discussion

David Suendermann-Oeft - 2015-08-19

This issue is found 149 times in the DB:

select count(1) from haleflogs where message like '%java.util.NoSuchElementException: Pool exhausted%';

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-19

All the 148 calls found suffering from this problem ended up not completed:

create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;
select code, count(1) from completeCalls where callId in (select * from tmpC2) group by code;

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-19

The problem is happening throughout extensions:

+-----------+----------+
| extension | count(1) |
+-----------+----------+
| 1010 | 2 |
| 1111 | 32 |
| 2222 | 36 |
| 3333 | 19 |
| 4444 | 22 |
| 7703 | 6 |
| 7704 | 3 |
| 7709 | 25 |
| 7711 | 3 |
+-----------+----------+

select extension, count(1) from completeCalls where callId in (select * from tmpC2) group by extension;

on different Halef instances:

+-------------+----------+
| cairoIp | count(1) |
+-------------+----------+
| 141.31.8.62 | 123 |
| 141.31.8.74 | 25 |
+-------------+----------+

select cairoIP, count(1) from completeCalls where callId in (select * from tmpC2) group by cairoIP;

and consistently over time. We had 71 such calls in August so far of altogether 357 accounting for 20%.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-20

It looks as if the hang-up caused by this issue happens either before or after the first recognition:

+---+----------+
| b | count(1) |
+---+----------+
| 0 | 69 |
| 1 | 79 |
+---+----------+

select cnt-1 b, count(1) from (select callId, count(1) cnt from events where callId in (select * from tmpC2) and event in ('VXML URL', 'Recognition Hypothesis') group by callId) a group by b;

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-20

assigned_to: Patrick Lange
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-20

Yesterday, there were 16 production calls:

select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222';

out of which 6 were not completed:

select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222' and code is null;

all of them associated with the pool being exhausted:

select count(1) from completeCalls where timestamp > '2015-08-19 10:00:00' and extension = '2222' and callId in (select * from tmpC2);

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-20

You can select log information of individual calls by running either of the following:

select * from hli where callId = '1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62';

(displaying only the leading 100 chars of log messages)

select * from hli2 where callId = '1fe19f9aadd645eed3e73baf6f6070c1@141.31.8.62';

(displaying the full message)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-24

+-----------------------------+----------+
| round(timestamp/1000000, 0) | count(1) |
+-----------------------------+----------+
| 20150619 | 1 |
| 20150620 | 2 |
| 20150701 | 35 |
| 20150704 | 2 |
| 20150705 | 1 |
| 20150727 | 3 |
| 20150729 | 17 |
| 20150730 | 2 |
| 20150731 | 14 |
| 20150803 | 2 |
| 20150804 | 4 |
| 20150805 | 6 |
| 20150806 | 14 |
| 20150807 | 3 |
| 20150808 | 1 |
| 20150811 | 2 |
| 20150813 | 5 |
| 20150814 | 5 |
| 20150815 | 2 |
| 20150817 | 15 |
| 20150818 | 5 |
| 20150819 | 7 |
+-----------------------------+----------+

drop table tmpC2;
create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;
select round(timestamp/1000000, 0), count(1) from completeCalls where callId in (select * from tmpC2) group by round(timestamp/1000000, 0);

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-24

Patrick applied a potential fix. So far, the issue has not appeared again. @Patrick, please explain what you did.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Patrick L. Lange - 2015-08-24

I force the RTPChannel to return a potentially existing engine back to the pool before fetching a new one.

cairo-VM/src/cairo/cairo-server/trunk/src/main/java/org/speechforge/cairo/server/recog/RTPChannel.java

192 _recInterface.returnRecEngine();
193 _recInterface.activateRecEngine(appType, grammarLocation, hotword);

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-24

summary: Call hangs up due to exausted recognizer pool --> Call hangs up due to exhausted recognizer pool
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-24

There was a call today with the bug (5c048ecdab8fe4f514e0b1753fa433e0@141.31.8.61), but it was the Kaldi Halef TJR199/201. So, has the issue been fixed for good?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-31

We are currently live on TJR214/215 with Ext 4444 and suffered three NULL calls so far:

drop table tmpI1;
create table tmpI1 (select cairoCallId from cr where date = 20150831 and halefcheck = 'PASSED' and c = 0 and extension = 4444);

It turns out that all of them were caused by the pool exhausted bug:

drop table tmpC2;
create table tmpC2 as select distinct b.cairoCallId from haleflogs a, callsExt b where message like '%java.util.NoSuchElementException: Pool exhausted%' and a.id between b.minId and b.maxId and a.machineIp = b.cairoMachineIp;

select * from tmpI1 where cairoCallId in (select * from tmpC2);

The bug fix was not propagated into TJR214/215 yet. Remarkably, there are no NPEs as of yet:

select * from hli where callId in (select * from tmpI1) and message like '%nullp%';

Huh. Is this bug and https://sourceforge.net/p/halef/tickets/83/ interrelated by any chance?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Suendermann-Oeft - 2015-08-31

summary: Call hangs up due to exhausted recognizer pool --> ORANGE: Call hangs up due to exhausted recognizer pool
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.

Gen AI apps are built with MongoDB Atlas

Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.

Try Free →