#84 Cassandra recognition results are entirely off

1.0
open
None
2015-10-14
2015-08-25
No

We have set up Cassandra on the ITL dev machines (Cairo instance TJR199), but the recognition results are completely off. For example, in callId 2c816b2d8054e1e3e881d25425d916a9@141.31.8.61, I said "yes, this is correct", which was recognized as "the school". While such results are theoretically possible, the WER is clearly above 50% on average, suggesting there is a major issue.

Discussion

  • David Suendermann-Oeft

    I pulled the above mentioned recording from the server (201508250128420867.wav) and sent it through the Cassandra test web page and, guess what? The recognition hypothesis was

    { YES THIS IS CORRECT }

    So, there is a discrepancy between how Cairo calls Cassandra and how I am doing it through the test page.

     
  • David Suendermann-Oeft

    I created Ext 7751 which is pointing at Patrick's dev Halef instance TJR211/12. Unfortunately, the Kaldi hypotheses are currently not logged there, nor do the recorded audio utterances show up. @Patrick? At any rate, I called in, and this is what I got (utterance -> hypothesis)

    hello this is the david pizza service -> { WELL IS THE THE PROFESSOR IS }
    is that for delivery or take-out -> { IS THAT FOR DISAGREE ABOUT THE ABOUT }
    is this for delivery -> { IS THE FOR THE }
    can you please tell me your name -> { UNIVERSITY'S TELL YOUR NAME }
    what is your address -> { ONE IS TO ATTRACT }
    can you tell me what your address is please -> { CHANGED ONLY WHAT YOUR ADDRESSES FEES }
    now please tell me your phone number -> { APPEARS TELL YOUR PHONE NUMBER }
    now what do you want on your pizza's topping -> { WELL WHAT YOU WANT TO PROTEST TOPIC }
    which toppings would you like on your pizza -> { WHEN SHOPPING WHICH I LIKE GOING SHOPPING }
    allright we will deliver the pizza within thirty minutes to your home place -> { OLD LIKE TO GO TO THE PROFESSOR GIVES A WITH THE FEATHER TO MANAGE TO YOUR FRIENDS }
    we will deliver the pizza within thirty minutes to your home location -> { <unk> OR YOUR SELF WITH THE STUDYING IT'S TO ON THE TEACHING }

    Still not great, but seems a little better
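
To put a number on "a little better", the word error rate (WER) mentioned in the ticket description can be computed per utterance. The following is a minimal Python sketch based on word-level edit distance. The function name `word_error_rate` and the case folding are assumptions made for illustration; they are not part of Halef or Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Utterance/hypothesis pair taken from the list above.
    print(word_error_rate("is this for delivery", "IS THE FOR THE"))
```

For the pair "is this for delivery" -> { IS THE FOR THE }, this counts 2 substitutions over 4 reference words, i.e. a WER of 0.5.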

     
  • David Suendermann-Oeft

    Just sent the last recording to the web service and got the following recognition result:

    { WE WILL DELIVER YOUR FEET THAT WITHIN THIRTY MINUTES TO YOUR HOME LOCATED }

    This is much, much closer to the expected results. Why is there such a discrepancy?

    The file name on TJR211 is 201508260238080104.wav

    @Alex, please take a look ASAP.

     
  • David Suendermann-Oeft

    • assigned_to: David Suendermann-Oeft --> Alexei V. Ivanov
     
  • Alexei V. Ivanov

    > @Alex, please take a look ASAP.

    > which toppings would you like on your pizza -> { WHEN SHOPPING WHICH I LIKE GOING SHOPPING }
    > we will deliver the pizza within thirty minutes to your home location -> { <unk> OR YOUR SELF WITH THE STUDYING IT'S TO ON THE TEACHING }

    > Just sent the last recording to the web service and got the following recognition result:
    > { WE WILL DELIVER YOUR FEET THAT WITHIN THIRTY MINUTES TO YOUR HOME LOCATED }

    It definitely looks like a problem with the signal buffer ordering, i.e. the sequence of audio chunks sent to the recognizer gets shuffled. A very characteristic symptom of this is "toppings" => "SHOPPING WHICH I LIKE GOING SHOPPING".

    Dumping the contents of the buffer as the server sees it on its end would verify this hypothesis.

    An extra step could also be added in which the client and the server each compute an integral characteristic of some sort (e.g. a specific energy or an MD5 sum) of their respective buffers and exchange these values after recognition to verify the communication channel.
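
As a rough illustration of that verification step, the sketch below computes both characteristics for one buffer. It assumes the buffer holds headerless 16-bit little-endian PCM, which the ticket does not state, and `buffer_fingerprint` is a hypothetical helper name, not an existing Cassandra or Cairo function.

```python
import hashlib
import struct
from typing import Tuple


def buffer_fingerprint(pcm: bytes) -> Tuple[str, float]:
    """Return (MD5 hex digest, mean energy per sample) of a PCM buffer.

    Assumption: 16-bit little-endian signed samples. A trailing odd byte
    is ignored for the energy but still included in the digest.
    """
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: 2 * n])
    energy = sum(s * s for s in samples) / n if n else 0.0
    return hashlib.md5(pcm).hexdigest(), energy


if __name__ == "__main__":
    chunk_a = struct.pack("<4h", 100, -200, 300, -400)
    chunk_b = struct.pack("<4h", 5, 6, 7, 8)
    print(buffer_fingerprint(chunk_a + chunk_b))  # chunks in order
    print(buffer_fingerprint(chunk_b + chunk_a))  # same chunks, swapped
```

One design point worth noting: the energy is a sum over samples and therefore does not change when chunks are reordered, whereas the MD5 digest does. If shuffled chunks are the suspected cause, matching energies with differing digests would be consistent with that, while energy alone could not detect it.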

     
  • David Suendermann-Oeft

    Thanks, Alex. Patrick told me you are storing the audio files on the Cassandra server as well. Would you please check what these look like and how they differ from those stored on the Cairo server? Please send me some examples from the Cassandra server.

     
    • Alexei V. Ivanov

      Dear David, Patrick,

      Please find attached the buffer dumps that I obtained from the CASSANDRA server on TJR1001 in test sessions via test_bin_file.html while recognizing the file 201410291608300064.wav (also attached).

      I made 4 successive recognitions and, apparently:

      1. The system returned the same recognition result each time ({ WHEN I WANT TO STUDY });
      2. The dumps are identical to the original file. I tested this with 'diff ./201410291608300064.wav ./test.data_expX'.

      To extract the signal buffer dump on the server, do the following:

      1. on tjr1001, run 'rm -rf ~/halef-cassandra/CASSANDRA_STRM2/test.data';
      2. establish a new client-server connection and perform a recognition;
      3. take ~/halef-cassandra/CASSANDRA_STRM2/test.data from tjr1001 as the signal buffer dump.

      After that, the buffer can be compared to the client's version of the buffer.
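
The comparison itself could go one step beyond 'diff' and report where two buffers diverge, and whether one looks like a reordering of the other. The following Python sketch shows one way to do that. Both function names are hypothetical, and the chunk size has to be supplied by the caller because the size of the chunks sent to the recognizer is not given in this ticket.

```python
from collections import Counter
from typing import List, Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the buffers are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One buffer is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def split_chunks(buf: bytes, chunk_size: int) -> List[bytes]:
    """Cut a buffer into consecutive chunks of chunk_size bytes."""
    return [buf[i:i + chunk_size] for i in range(0, len(buf), chunk_size)]


def is_chunk_permutation(a: bytes, b: bytes, chunk_size: int) -> bool:
    """True if b holds exactly the chunks of a, but in a different order."""
    if a == b or len(a) != len(b):
        return False
    return Counter(split_chunks(a, chunk_size)) == Counter(split_chunks(b, chunk_size))


if __name__ == "__main__":
    client = b"AAAABBBBCCCC"
    server = b"BBBBAAAACCCC"
    print(first_difference(client, server))
    print(is_chunk_permutation(client, server, chunk_size=4))
```

In practice the two byte strings would be read from the client-side file and from the test.data dump, for example with `pathlib.Path(...).read_bytes()`.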

      May I ask you to perform a similar experiment with a live recognition from the Cairo server?

      Thank you,
      AI

      [tickets:#84] http://sourceforge.net/p/halef/tickets/84/ Cassandra
      recognition results are entirely off

      Status: open
      Milestone: 1.0
      Created: Tue Aug 25, 2015 11:23 PM UTC by David Suendermann-Oeft
      Last Updated: Wed Aug 26, 2015 10:33 AM UTC
      Owner: Alexei V. Ivanov

  • David Suendermann-Oeft

    Dear Alex,

    Would you please check the buffer dumps of Cassandra which were generated at the time of recognition? For example, for the recent audio file 201508251904000497.wav, which was generated as part of call ID 01da31a4a0360086fd3ea1f470257e63@141.31.8.61 at 2015-08-25 17:56:14 (database time)? At runtime, the hypothesis was { WHAT EPIPHYTES } while the test page produces { WHAT ABOUT THE EDUCATION } (I said "what about your address").

    Yours,

    DSO

     
  • Alexei V. Ivanov

    > Would you please check the buffer dumps of Cassandra which were generated at the time of recognition?

    That is not supported. The buffer dump file has the same name all the time. It always gets overwritten by the most recent recognition session.

     
  • David Suendermann-Oeft

    Dear Alex, please enable storing of past speech buffers as separate files on the Cassandra server. This is essential for troubleshooting of past calls.

     
  • David Suendermann-Oeft

    Current Kaldi test extensions are 7723, 7731, among others, for you to test.

     
  • Alexei V. Ivanov

    Dear All,

    > please enable storing of past speech buffers as separate files on the Cassandra server

    This functionality is now available for the server on TJR1001. A raw audio stream corresponding to each individual recognition is now stored in

    ~/halef-cassandra/CASSANDRA_STRM2/processed_streams/stream_{date}-{time}.raw

    > This is essential for troubleshooting of past calls.

    Sure, that is an essential feature for ASR-related troubleshooting.
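
To listen to a stored raw stream or to feed it to tools that expect a WAV file, it has to be wrapped in a WAV header first. The sketch below uses Python's standard `wave` module. The sample rate, sample width and channel count are assumptions (8 kHz, 16-bit, mono); the actual format of the `.raw` files is not stated in this ticket and must be checked. `raw_to_wav` is a hypothetical helper name.

```python
import wave


def raw_to_wav(raw_path: str, wav_path: str, sample_rate: int = 8000,
               sample_width: int = 2, channels: int = 1) -> int:
    """Wrap headerless PCM data in a WAV container; return the frame count.

    The defaults (8 kHz, 16-bit, mono) are assumptions, not values
    confirmed for the Cassandra server.
    """
    with open(raw_path, "rb") as raw_file:
        pcm = raw_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm) // (sample_width * channels)
```

If the converted file sounds too fast, too slow, or like noise, the assumed parameters are most likely wrong rather than the stored stream being corrupted.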

     
  • David Suendermann-Oeft

    Patrick,

    You resolved this bug last week; however, there still seem to be discrepancies between what Halef recognizes in live mode vs. batch mode. An example:

    CallId 1b813b4b19ff76c1bf57b977913eb8c6@141.31.8.61
    recording 201510140000040604.wav

    resulted in

    { THAT PERSON NOW I WOULD LIKE TO }

    in live mode, but in

    { THAT PERMITTED TO NOW I WOULDN'T LIKE TO }

    in batch mode.

    In the same call,

    recording 201510140000220858.wav

    has

    { I LIKE TO TALK ABOUT IS THAT THE IDEA }

    vs.

    { I LIKE TO TALK ABOUT IN THE NOTE AND TO }

    Yours,

    DSO

     
  • David Suendermann-Oeft

    • assigned_to: Alexei V. Ivanov --> Patrick Lange
     
  • Patrick L. Lange

    I will check whether these three files are identical: 1. the Halef recording, 2. my recording of what I send, 3. the recording made by the Kaldi server.

    However, this is not possible at the moment. I think we only store the latest recording of 2) and 3).

     
    • David Suendermann-Oeft

      Alex changed the Kaldi server to store historic recordings

       
  • David Suendermann-Oeft

    Looking at a call Keelan pointed out:

    922f4f06211353f2d5c494648ca22e1f@141.31.8.61
    201510140819380887.wav

    was recognized as

    { URN PLANT THE PLANT }

    in Halef, but calling the command line tool, I am getting

    $ kaldi 201510140819380887.wav
    Establishing the connection took 849172561 nanoseconds
    { ACTUALLY I DIDN'T USE THE }
    Recognition took 2640942734 nanoseconds

    When I ran it again, I got yet another hypothesis:

    $ kaldi 201510140819380887.wav
    Establishing the connection took 914194415 nanoseconds
    { REGARDING AND DIDN'T USE THE }
    Recognition took 2764824757 nanoseconds

    Apparently, hypotheses are not consistent.
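
A repeatability check like this can be scripted by collecting the tool's output over several runs and counting the distinct hypotheses. The Python sketch below only parses output in the format shown above (a hypothesis on its own line, wrapped in braces). How the output is captured is left open, since nothing about the command line tool beyond the transcripts above is known here; the function names are hypothetical.

```python
import re
from typing import List, Optional

# Matches a line consisting only of "{ ... }", as printed by the tool above.
HYPOTHESIS_LINE = re.compile(r"^\s*\{\s*(.*?)\s*\}\s*$")


def parse_hypothesis(output: str) -> Optional[str]:
    """Return the first brace-wrapped hypothesis in the output, if any."""
    for line in output.splitlines():
        match = HYPOTHESIS_LINE.match(line)
        if match:
            return match.group(1)
    return None


def distinct_hypotheses(outputs: List[str]) -> List[str]:
    """Distinct hypotheses across several runs, in order of first appearance."""
    seen: List[str] = []
    for output in outputs:
        hypothesis = parse_hypothesis(output)
        if hypothesis is not None and hypothesis not in seen:
            seen.append(hypothesis)
    return seen


if __name__ == "__main__":
    run_1 = ("Establishing the connection took 849172561 nanoseconds\n"
             "{ ACTUALLY I DIDN'T USE THE }\n"
             "Recognition took 2640942734 nanoseconds\n")
    run_2 = ("Establishing the connection took 914194415 nanoseconds\n"
             "{ REGARDING AND DIDN'T USE THE }\n"
             "Recognition took 2764824757 nanoseconds\n")
    print(distinct_hypotheses([run_1, run_2]))
```

A deterministic recognizer fed the same file should yield a list of length one; the two runs quoted above yield two entries.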

     
  • David Suendermann-Oeft

    The EPIPHYTES example is even worse:

    846b7507bbde434b243e3e8c260697a7@141.31.8.61
    201510140838230289.wav

    Halef recognized

    { EPIPHYTES }

    while the command line tool returns

    $ kaldi 201510140838230289.wav
    Establishing the connection took 907055569 nanoseconds
    { EPIPHYTES CAN MAKE A LINE TO THE UM THE THE }
    Recognition took 2472632539 nanoseconds

     
