Acoustic Recognition on 2 Speakers

  • Remus

    Remus - 2011-02-24

    Hello,

    So, I would like to test how my acoustic model (a CD one) performs on two
    different speakers.

    Here is what I did, but I am not sure it is the right approach:

    1. I had N recordings for speaker 1 and M recordings for speaker 2, together with their transcription files
    2. I ran SphinxAlign on the two sets of recordings and obtained two folders of wdseg/phseg files, one for each speaker
    3. I used a Perl script that, for each speaker folder, did the following (see the sketch after this list)
      3.1. for each wdseg file, it divided the total acoustic score by the number
      of frames, giving an average acoustic score per frame
      3.2. it then summed these per-file averages over all wdseg files in the
      folder and divided by the number of files
    4. I therefore obtained two scores, one for each speaker, which I compared.
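
    For illustration, here is a minimal Python sketch of the averaging in step 3.
    The wdseg column layout it assumes (start frame, end frame, per-frame score,
    total acoustic score, word) and the .wdseg file extension are assumptions, so
    adapt the parsing to whatever your aligner actually writes out:

        # Minimal sketch (assumed wdseg layout): average the per-frame acoustic
        # score over all wdseg files in a speaker folder, as in step 3 above.
        import os
        import sys

        def file_avg_score_per_frame(path):
            total_score = 0
            total_frames = 0
            with open(path) as f:
                for line in f:
                    fields = line.split()
                    # Skip the header and the trailing "Total score:" line; keep
                    # only rows whose first column is a (start) frame number.
                    if len(fields) < 5 or not fields[0].isdigit():
                        continue
                    sfrm, efrm = int(fields[0]), int(fields[1])
                    ascr = int(fields[3])  # total acoustic score of the segment
                    total_frames += efrm - sfrm + 1
                    total_score += ascr
            return total_score / total_frames if total_frames else 0.0

        def speaker_score(folder):
            # Step 3.2: average the per-file averages over the whole folder.
            per_file = [file_avg_score_per_frame(os.path.join(folder, name))
                        for name in os.listdir(folder) if name.endswith(".wdseg")]
            return sum(per_file) / len(per_file) if per_file else 0.0

        if __name__ == "__main__":
            # Usage: python speaker_score.py speaker1_wdseg_dir speaker2_wdseg_dir
            for folder in sys.argv[1:]:
                print(folder, speaker_score(folder))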

    The question is: is this a good measure of how an acoustic model reacts to
    the two speakers?

    If not, what other choices do I have?

    Thank you

     
  • Nickolay V. Shmyrev

    The question is: is this a good measure of how an acoustic model reacts to
    the two speakers?

    Sorry, an acoustic model is not something that reacts to speakers like a dog
    reacts to a fart.

    Maybe you wanted to measure something else; in that case, the first thing you
    need to do is figure out exactly what you want to measure. If you give a more
    detailed description, people will be able to suggest the right terms.

     
  • Remus

    Remus - 2011-02-24

    :) so, having two speakers, I want to know whether the acoustic model is good
    or not for both of them.

    If I run Sphinx decode, I get the WER, right? I want something similar, but
    based only on the acoustic model, without mixing in the language model.

    So, any input on that? Also, "Sorry, an acoustic model is not something that
    reacts to speakers like a dog reacts to a fart." is not a valid scientific
    answer.

    I would prefer an answer telling me why averaging the acoustic score doesn't
    work, and maybe explaining better what that acoustic score actually is.

    Thank you.

     
  • Nickolay V. Shmyrev

    :) so, having two speakers, I want to know whether the acoustic model is good
    or not for both of them.

    What is "good"? Models aren't good or bad as is, they aren't film heroes.
    Model can be good for some application for example for phonetic segmentation
    or for recognition. And you just need to measure the performance of the model
    on that application, not the performance of the model as is.

    If I run Sphinx decode, I get the WER, right? I want something similar, but
    based only on the acoustic model, without mixing in the language model. So,
    any input on that?

    If you want to abstract away from the language model, you can measure the
    error rate of a phonetic recognizer. This is a standard approach for measuring
    acoustic model quality, for example in TIMIT experiments.
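
    As an illustrative example (the phone strings here are made up): if the
    reference phone sequence for an utterance is HH AH L OW and the phonetic
    recognizer outputs HH AX L OW W, that is one substitution and one insertion
    against four reference phones, i.e. a phone error rate of (1 + 1) / 4 = 50%,
    computed exactly like WER but over phones instead of words.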

    Also, "Sorry, acostic model is not the thing that reacts to speakers like a
    dog reacts to a fart. " is not a valid scientific answer.

    So that is exactly the problem: if the question is not properly stated, you
    can't receive a proper answer.

    I would prefer an answer telling me why averaging the acoustic score doesn't
    work, and maybe explaining better what that acoustic score actually is.

    When you do forced alignment you are trying to fit the audio to a given
    transcription, which is not always a good idea. For example, if there is a
    mismatch between the ideal transcription and the real pronunciation, you will
    get misalignments and other problems. Automatic segmentation is still very
    error prone; you can inspect the phonetic labels, for example, to see that.

    Because of that, the score of the model on a segment is meaningless. It
    doesn't show how your model will behave when it is less restricted by the
    grammar. An average of meaningless scores is even more meaningless than the
    scores themselves.

     
  • Remus

    Remus - 2011-02-24

    OK, thanks a lot for explaining this. I understand that my question wasn't
    well chosen, but now everything is clear.

    Anyway, about this: "If you want to abstract away from the language model,
    you can measure the error rate of a phonetic recognizer. This is a standard
    approach for measuring acoustic model quality, for example in TIMIT
    experiments."

    Can you tell me how to measure this, or point me to some online reading about
    it for Sphinx3? I realise I have already taken up much of your time, so I
    apologise.

    Thank you

     
  • Nickolay V. Shmyrev

    You can find the documentation on setting up a phonetic recognizer here:

    http://cmusphinx.sourceforge.net/wiki/phonemerecognition

    To measure the error rate, you can simply compare the phonetic recognizer
    output with the reference phonetic transcription, the same way the word error
    rate is calculated.
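
    For illustration, here is a minimal Python sketch of that comparison, assuming
    the reference and hypothesis are whitespace-separated phone strings (the
    function names and example phone strings below are placeholders, not anything
    produced by Sphinx itself). It computes the edit distance between the two
    phone sequences and divides by the reference length, exactly as for WER:

        # Minimal sketch: phone error rate as Levenshtein distance over phone
        # sequences, computed the same way as word error rate, just on phones.
        def edit_distance(ref, hyp):
            # Standard dynamic-programming Levenshtein distance.
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution / match
            return d[len(ref)][len(hyp)]

        def phone_error_rate(reference, hypothesis):
            ref = reference.split()
            hyp = hypothesis.split()
            return edit_distance(ref, hyp) / len(ref)

        # Example with made-up phone strings:
        # phone_error_rate("HH AH L OW", "HH AX L OW W")  ->  0.5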

     
  • Remus

    Remus - 2011-02-24

    Ok, thank you. I guess this thread is closed.

     
