#84 Cassandra recognition results are entirely off

David Suendermann-Oeft - 2015-08-25

I pulled the above-mentioned recording from the server (201508250128420867.wav), sent it through the Cassandra test web page and, guess what? The recognition hypothesis was

{ YES THIS IS CORRECT }

So, there is a discrepancy between how Cairo calls Cassandra and how I am doing it through the test page.

David Suendermann-Oeft - 2015-08-26

I created Ext 7751 which is pointing at Patrick's dev Halef instance TJR211/12. Unfortunately, the Kaldi hypotheses are currently not logged there, nor do the recorded audio utterances show up. @Patrick? At any rate, I called in, and this is what I got (utterance -> hypothesis)

hello this is the david pizza service -> { WELL IS THE THE PROFESSOR IS }
is that for delivery or take-out -> { IS THAT FOR DISAGREE ABOUT THE ABOUT }
is this for delivery -> { IS THE FOR THE }
can you please tell me your name -> { UNIVERSITY'S TELL YOUR NAME }
what is your address -> { ONE IS TO ATTRACT }
can you tell me what your address is please -> { CHANGED ONLY WHAT YOUR ADDRESSES FEES }
now please tell me your phone number -> { APPEARS TELL YOUR PHONE NUMBER }
now what do you want on your pizza's topping -> { WELL WHAT YOU WANT TO PROTEST TOPIC }
which toppings would you like on your pizza -> { WHEN SHOPPING WHICH I LIKE GOING SHOPPING }
allright we will deliver the pizza within thirty minutes to your home place -> { OLD LIKE TO GO TO THE PROFESSOR GIVES A WITH THE FEATHER TO MANAGE TO YOUR FRIENDS }
we will deliver the pizza within thirty minutes to your home location -> { <unk> OR YOUR SELF WITH THE STUDYING IT'S TO ON THE TEACHING }

Still not great, but seems a little better
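To put a number on "a little better", the utterance -> hypothesis pairs above can be scored with a word error rate. The following is a minimal sketch, assuming plain whitespace tokenization and case-insensitive matching; the function name `word_error_rate` is hypothetical and is not part of Halef, Cairo, or Cassandra.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous row of the edit-distance table (empty reference prefix first).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Two of the pairs listed above, scored individually.
pairs = [
    ("is this for delivery", "IS THE FOR THE"),
    ("what is your address", "ONE IS TO ATTRACT"),
]
for ref, hyp in pairs:
    print(f"{word_error_rate(ref, hyp):.2f}  {ref} -> {hyp}")
```

Averaging over all pairs of one call, weighted by reference length, would give a per-call figure that can be compared between the live path and the test page.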

David Suendermann-Oeft - 2015-08-26

Just sent the last recording to the web service and got the following recognition result:

{ WE WILL DELIVER YOUR FEET THAT WITHIN THIRTY MINUTES TO YOUR HOME LOCATED }

This is much, much closer to the expected results. Why is there such a discrepancy?

The file name on TJR 211 is 201508260238080104.wav

@Alex, please take a look ASAP.

David Suendermann-Oeft - 2015-08-26

assigned_to: David Suendermann-Oeft --> Alexei V. Ivanov

Alexei V. Ivanov - 2015-08-26

@Alex, please take a look ASAP.

which toppings would you like on your pizza -> { WHEN SHOPPING WHICH I LIKE GOING SHOPPING }
we will deliver the pizza within thirty minutes to your home location -> { <unk> OR YOUR SELF WITH THE STUDYING IT'S TO ON THE TEACHING }

Just sent the last recording to the web service and got the following recognition result:
{ WE WILL DELIVER YOUR FEET THAT WITHIN THIRTY MINUTES TO YOUR HOME LOCATED }

This definitely looks like a problem with the signal buffer ordering, i.e. the sequence of audio chunks sent to the recognizer gets shuffled. A very characteristic symptom is "toppings" => "SHOPPING WHICH I LIKE GOING SHOPPING".

Dumping the contents of the buffer as the server sees it would verify this hypothesis.
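Once a server-side dump exists, the reordering hypothesis can be checked mechanically. The sketch below is an illustration only: it assumes the client buffer and the server dump are both available as bytes, and that the transmission chunk size is known (the actual chunk size used between Cairo and Cassandra is not stated in this ticket). The names `split_chunks` and `classify_buffers` are hypothetical.

```python
from collections import Counter


def split_chunks(data: bytes, chunk_size: int) -> list:
    """Cut a buffer into consecutive chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def classify_buffers(sent: bytes, received: bytes, chunk_size: int) -> str:
    """Compare what the client sent with what the server dumped."""
    if sent == received:
        return "identical"
    # Same chunks with the same multiplicities, but in a different order.
    if Counter(split_chunks(sent, chunk_size)) == Counter(split_chunks(received, chunk_size)):
        return "reordered"
    # Anything else: dropped, duplicated, or corrupted data.
    return "different"
```

A result of "reordered" would support the shuffling hypothesis directly. A result of "different" would need a closer look, since duplicated or dropped chunks also fall into that category, and so does reordering at a granularity other than the assumed chunk size.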

We could also add an extra step in which the client and the server each compute an integral characteristic of some sort (e.g. a specific energy or an MD5 sum) of their respective buffers and exchange these values after recognition to verify the communication channel.
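A minimal sketch of such a fingerprint, assuming the buffer holds 16-bit signed little-endian PCM samples (the actual sample format is not stated in this ticket). The function name `buffer_fingerprint` and the returned field names are hypothetical.

```python
import array
import hashlib
import sys


def buffer_fingerprint(pcm: bytes) -> dict:
    """Summarize a signal buffer so two ends of a connection can compare notes."""
    samples = array.array("h")  # signed 16-bit integers
    # Ignore a trailing odd byte so the data splits into whole samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is assumed to be little-endian
    return {
        "md5": hashlib.md5(pcm).hexdigest(),
        "bytes": len(pcm),
        "energy": sum(s * s for s in samples),
    }
```

Note that the two characteristics behave differently: total energy does not change when samples are reordered, so it would not detect shuffled chunks, whereas the MD5 sum changes with any difference in content or order. Exchanging both would distinguish "same samples, wrong order" from "different samples".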

David Suendermann-Oeft - 2015-08-26

Thanks, Alex. Patrick told me you are storing the audio files on the Cassandra server as well. Would you please check what these look like and how they differ from those stored on the Cairo server? Please send me some examples from the Cassandra server.

- Alexei V. Ivanov - 2015-08-26
  
  Dear David, Patrick,
  
  Please find attached the buffer dumps that I have obtained from the
  CASSANDRA server on TJR1001 with the test sessions via
  test_bin_file.html while recognizing the file 201410291608300064.wav
  (also in the attachment).
  
  I have made 4 successive recognitions and apparently:
  
  The system has been returning the same recognition result
  ({ WHEN I WANT TO STUDY });
  
  The dumps are identical to the original file. I tested it with 'diff
  ./201410291608300064.wav ./test.data_expX'
  
  To extract the signal buffer dump on the server, do the following:
  
  on tjr1001, run 'rm -rf ~/halef-cassandra/CASSANDRA_STRM2/test.data';
  
  establish a new client-server connection and perform a recognition;
  
  take ~/halef-cassandra/CASSANDRA_STRM2/test.data from tjr1001 as the
  signal buffer dump.
  
  After that the buffer can be compared to the client buffer version.
  
  May I ask you to perform a similar experiment with a live recognition
  from Cairo server?
  
  Thank you,
  AI
  
  On 08/26/2015 05:56 PM, David Suendermann-Oeft wrote:
  
  Thanks, Alex. Patrick told me you are storing the audio files on the
  Cassandra server as well. Would you please check how these look like
  and how they differ from those stored on the Cairo server? Please send
  me some examples from the Cassandra server.
  
  [tickets:#84] http://sourceforge.net/p/halef/tickets/84/ Cassandra
  recognition results are entirely off
  
  Status: open
  Milestone: 1.0
  Created: Tue Aug 25, 2015 11:23 PM UTC by David Suendermann-Oeft
  Last Updated: Wed Aug 26, 2015 10:33 AM UTC
  Owner: Alexei V. Ivanov
  
  We have set up Cassandra on the ITL dev machines (Cairo instance
  TJR199), but the recognition results are completely off. For example,
  in callId 2c816b2d8054e1e3e881d25425d916a9@141.31.8.61, I said "yes,
  this is correct" which was recognized as "the school". While such
  results are theoretically possible, the WER is clearly above 50% on
  average suggesting there is a major issue.
  
  201410291608300064.wav
  
  test.data_exp1
  
  test.data_exp2
  
  test.data_exp3
  
  test.data_exp4
  
David Suendermann-Oeft - 2015-08-26

Dear Alex,

Would you please check the buffer dumps of Cassandra which were generated at the time of recognition? For example, the recent audio file 201508251904000497.wav was generated as part of call ID 01da31a4a0360086fd3ea1f470257e63@141.31.8.61 at 2015-08-25 17:56:14 database time. At runtime, the hypothesis was { WHAT EPIPHYTES } while the test page produces { WHAT ABOUT THE EDUCATION } (I said "what about your address").

Yours,

DSO

Alexei V. Ivanov - 2015-08-26

Would you please check the buffer dumps of Cassandra which were generated at the time of recognition?

That is not supported. The buffer dump file has the same name all the time. It always gets overwritten by the most recent recognition session.

David Suendermann-Oeft - 2015-08-27

Dear Alex, please enable storing of past speech buffers as separate files on the Cassandra server. This is essential for troubleshooting of past calls.

David Suendermann-Oeft - 2015-08-27

Current Kaldi test extensions are 7723, 7731, among others, for you to test.

Alexei V. Ivanov - 2015-08-28

Dear All,

please enable storing of past speech buffers as separate files on the Cassandra server

This functionality is now available for the server on TJR1001. The raw audio stream corresponding to each individual recognition is now stored in

~/halef-cassandra/CASSANDRA_STRM2/processed_streams/stream_{date}-{time}.raw

This is essential for troubleshooting of past calls.

Sure, that is an essential feature for ASR-related troubleshooting.
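For illustration, per-recognition storage along these lines could look like the sketch below. This is not the Cassandra server code: the exact layout of `{date}` and `{time}` in the file name is an assumption (here `YYYYMMDD` and `HHMMSS` plus microseconds), and the function names `stream_path` and `store_stream` are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def stream_path(directory: Path, when: datetime) -> Path:
    """Build a file name following the stream_{date}-{time}.raw pattern."""
    # Assumed layout; microseconds keep back-to-back sessions apart.
    return directory / f"stream_{when:%Y%m%d}-{when:%H%M%S%f}.raw"


def store_stream(directory: Path, pcm: bytes, when: Optional[datetime] = None) -> Path:
    """Write one recognition's audio to its own file and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = stream_path(directory, when or datetime.now())
    # Mode "xb" raises FileExistsError instead of overwriting an earlier dump.
    with open(target, "xb") as out:
        out.write(pcm)
    return target
```

Refusing to overwrite is a deliberate choice in this sketch: the problem with the single test.data file was precisely that each session silently replaced the previous one.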

David Suendermann-Oeft - 2015-10-13

Patrick,

You resolved this bug last week; however, there still seem to be discrepancies between what Halef recognizes in live mode and in batch mode. An example:

CallId 1b813b4b19ff76c1bf57b977913eb8c6@141.31.8.61
recording 201510140000040604.wav

resulted in

{ THAT PERSON NOW I WOULD LIKE TO }

in live mode, but in

{ THAT PERMITTED TO NOW I WOULDN'T LIKE TO }

in batch mode.

In the same call,

recording 201510140000220858.wav

has

{ I LIKE TO TALK ABOUT IS THAT THE IDEA }

vs.

{ I LIKE TO TALK ABOUT IN THE NOTE AND TO }

Yours,

DSO
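To make live-versus-batch discrepancies like the ones above easier to scan, the two hypotheses can be diffed at word level. This is a small sketch using Python's standard `difflib`; the function name `hypothesis_diff` is hypothetical, and the braces are stripped on the assumption that hypotheses are always printed as `{ ... }`.

```python
from difflib import SequenceMatcher


def hypothesis_diff(live: str, batch: str) -> list:
    """Return the word-level edits that turn the live hypothesis into the batch one."""
    a = live.strip("{} ").split()
    b = batch.strip("{} ").split()
    edits = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", "insert"
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


for edit in hypothesis_diff("{ THAT PERSON NOW I WOULD LIKE TO }",
                            "{ THAT PERMITTED TO NOW I WOULDN'T LIKE TO }"):
    print(edit)
```

An empty list means both modes agreed on the utterance.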

David Suendermann-Oeft - 2015-10-13

assigned_to: Alexei V. Ivanov --> Patrick Lange

Patrick L. Lange - 2015-10-14

I will check whether the three files are the same: 1. the Halef recording, 2. my recording of what I send, 3. the recording of the Kaldi server.

However, this is not possible at the moment. I think we only store the latest recording of 2) and 3).
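Once all three recordings are available, the comparison itself is simple. The sketch below assumes that files ending in .wav are uncompressed PCM WAV files (Python's `wave` module raises `wave.Error` for other encodings such as mu-law) and that any other file is a headerless raw stream in the same sample format. The function names `pcm_payload` and `compare_recordings` are hypothetical.

```python
import hashlib
import wave


def pcm_payload(path: str) -> bytes:
    """Return only the audio samples: WAV headers are skipped, other files are read as-is."""
    if path.lower().endswith(".wav"):
        with wave.open(path, "rb") as wav:
            return wav.readframes(wav.getnframes())
    with open(path, "rb") as raw:
        return raw.read()


def compare_recordings(paths):
    """Return an MD5 digest per file and whether all payloads are identical."""
    digests = {p: hashlib.md5(pcm_payload(p)).hexdigest() for p in paths}
    return digests, len(set(digests.values())) == 1
```

Stripping the WAV header matters here because a byte-for-byte comparison of a .wav file against a .raw file would report a mismatch even when the audio is identical.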

- David Suendermann-Oeft - 2015-10-14
  
  Alex changed the Kaldi server to store historic recordings
  

David Suendermann-Oeft - 2015-10-14

Looking at a call Keelan pointed out:

922f4f06211353f2d5c494648ca22e1f@141.31.8.61
201510140819380887.wav

was recognized as

{ URN PLANT THE PLANT }

in Halef, but calling the command line tool, I am getting

$ kaldi 201510140819380887.wav
Establishing the connection took 849172561 nanoseconds
{ ACTUALLY I DIDN'T USE THE }
Recognition took 2640942734 nanoseconds

When I ran it again, I got yet another hypothesis:

$ kaldi 201510140819380887.wav
Establishing the connection took 914194415 nanoseconds
{ REGARDING AND DIDN'T USE THE }
Recognition took 2764824757 nanoseconds

Apparently, the hypotheses are not consistent across repeated runs on the same file.
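To quantify this, the output of repeated runs can be parsed and the distinct hypotheses counted. The sketch below works on captured output text in the format shown above; it does not call the `kaldi` command-line tool itself, and the names `parse_run` and `distinct_hypotheses` are hypothetical.

```python
import re
from collections import Counter

# A hypothesis is a line of the form "{ WORDS ... }".
HYPOTHESIS = re.compile(r"^\{\s*(.*?)\s*\}$", re.MULTILINE)
# Timing lines as printed by the tool, in nanoseconds.
TIMING = re.compile(
    r"^(Establishing the connection|Recognition) took (\d+) nanoseconds$",
    re.MULTILINE,
)


def parse_run(output: str) -> dict:
    """Extract the hypothesis and the timings (converted to seconds) from one run."""
    match = HYPOTHESIS.search(output)
    seconds = {label: int(ns) / 1e9 for label, ns in TIMING.findall(output)}
    return {"hypothesis": match.group(1) if match else None, "seconds": seconds}


def distinct_hypotheses(outputs) -> Counter:
    """Count how often each hypothesis occurred over several runs of the same file."""
    return Counter(parse_run(output)["hypothesis"] for output in outputs)
```

For a deterministic recognizer fed identical input, the counter should contain a single entry; more than one entry for the same file, as in the two runs above, points to nondeterminism somewhere between the client and the decoder.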

David Suendermann-Oeft - 2015-10-14

The EPIPHYTES example is even worse:

846b7507bbde434b243e3e8c260697a7@141.31.8.61
201510140838230289.wav

Halef recognized

{ EPIPHYTES }

while the command line tool returns

$ kaldi 201510140838230289.wav
Establishing the connection took 907055569 nanoseconds
{ EPIPHYTES CAN MAKE A LINE TO THE UM THE THE }
Recognition took 2472632539 nanoseconds
