Building a Hebrew acoustic model

hagai sela
2012-07-13
2012-09-22
  • hagai sela

    hagai sela - 2012-07-13

    Hi,
    I want to build a Hebrew command and control application generator using
    Sphinx. All applications should use the same acoustic model with different
    grammars, which may contain any possible Hebrew words. I previously adapted
    the 8 kHz model that comes with Sphinx, but if I understand correctly it's
    better to build an acoustic model for Hebrew.
    I read this tutorial and I have a few questions:
    1. Can I use Hebrew characters in my dictionary and phoneset files? If not, what can I use to describe non-English consonants?
    2. Can I use right-to-left sentences in my transcription and dictionary files?
    3. As I understand it, I should also supply a Hebrew language model. I searched a little and didn't find any models. Does anybody know if and where I can find one?
    If not, I guess I'll have to use some Wikipedia dumps.

    Thanks,
    Hagai.

     
  • Nickolay V. Shmyrev

    1. Can I use Hebrew characters in my dictionary and phoneset files?

    In the dictionary, yes; in the phoneset it's not recommended.

    If not, what can I use to describe non-English consonants?

    You can encode them with alphanumeric sequences. For example, use tav for תֵ.
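
    A minimal sketch of that idea, assuming a hypothetical letter-to-phone
    table (the ASCII phone names and the mapping are illustrative, not an
    established Hebrew phoneset). The dictionary keeps the Hebrew spelling,
    while the phones on the right-hand side and in the phoneset stay
    alphanumeric:

```python
# Hypothetical letter-to-phone table: the ASCII names are illustrative only.
# A real pronunciation dictionary also needs the vowels that plain Hebrew
# spelling usually leaves out.
LETTER_TO_PHONE = {
    "\u05e9": "SH",  # shin
    "\u05dc": "L",   # lamed
    "\u05d5": "O",   # vav (used here as a vowel)
    "\u05dd": "M",   # final mem
    "\u05de": "M",   # mem
    "\u05ea": "T",   # tav
}


def dictionary_line(word):
    """Return one dictionary entry: the Hebrew word, then ASCII phone names."""
    phones = [LETTER_TO_PHONE[letter] for letter in word]
    return word + " " + " ".join(phones)


def phoneset(table):
    """Return the sorted phone names used by the table, plus SIL for silence."""
    return sorted(set(table.values()) | {"SIL"})


if __name__ == "__main__":
    print(dictionary_line("\u05e9\u05dc\u05d5\u05dd"))
    print("\n".join(phoneset(LETTER_TO_PHONE)))
```

    A letter-by-letter table like this is only a starting point: it
    transliterates rather than describes pronunciation.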

    2. Can I use right-to-left sentences in my transcription and dictionary
      files?

    Right-to-left is only how the file is displayed, not how the information is
    actually stored.
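
    This can be checked with a short sketch using only the Python standard
    library. The example word is an assumption chosen for illustration; the
    point is that index 0 of the string is the first letter read, regardless
    of where an editor draws it:

```python
import unicodedata

# shin, lamed, vav, final mem, written in logical (reading) order
word = "\u05e9\u05dc\u05d5\u05dd"

# Index 0 is the first letter a reader pronounces, even though a
# bidi-aware editor draws it at the right-hand edge of the word.
first = word[0]
print(unicodedata.name(first))           # HEBREW LETTER SHIN
print(unicodedata.bidirectional(first))  # R (right-to-left character class)

# The UTF-8 bytes on disk follow the same logical order.
encoded = word.encode("utf-8")
print(encoded[:2] == first.encode("utf-8"))  # True
```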

    3. As I understand it, I should also supply a Hebrew language model. I searched
      a little and didn't find any models. Does anybody know if and where I can find
      one?

    For command and control you do not need a language model, you can write a
    simple JSGF grammar.
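
    For an application generator, such a grammar could be produced from each
    application's phrase list. The sketch below is one possible approach, not
    part of Sphinx: `build_jsgf` and its parameters are hypothetical names,
    and every word in the phrases is assumed to exist in the pronunciation
    dictionary.

```python
def build_jsgf(grammar_name, rule_name, phrases):
    """Return a JSGF grammar whose public rule accepts any one phrase."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Placeholder English phrases; a Hebrew application would pass Hebrew words.
    print(build_jsgf("lights", "command", ["turn on", "turn off"]))
```

    Each application then gets its own grammar file while the acoustic model
    and dictionary stay shared.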

     
  • hagai sela

    hagai sela - 2012-07-15

    Hi,
    I don't think I can use JSGF grammars, since I don't know in advance what it
    will be. I am writing an application generator where every application may
    configure a different grammar.

     
  • hagai sela

    hagai sela - 2012-07-16

    I successfully extracted sentences from Wikipedia, but the Hebrew dictionary I
    downloaded doesn't seem good. It isn't really phonetic; it just transliterates
    the Hebrew letters into Latin letters.
    Anyway, as you said, I am not sure I need this, since my apps will all use
    JSGF grammars with a relatively small number of words.
    What I want to avoid is the need to re-train the system when a new group of
    words has been added. Will this be possible if I use many recording hours with
    a lot of possible words?

     
  • Nickolay V. Shmyrev

    What I want to avoid is the need to re-train the system when a new group of
    words has been added. Will this be possible if I use many recording hours with
    a lot of possible words?

    Yes

     
  • hagai sela

    hagai sela - 2012-07-18

    OK, Thanks.
    I have some data which I previously used to adapt the default 8 kHz model, and
    I am trying to use it to build a model just to get my feet wet. When I used it
    for adaptation I got no errors or warnings.
    Now I get some "Failed to align audio to trancript: final state of the search
    is not reached" errors. Is this because I don't have enough data? Shouldn't it
    be the same as in adaptation?

    This is my training folder:
    https://docs.google.com/open?id=0B91Vmp4A3YOuMDUyaGp6ZUJISVE

     
  • hagai sela

    hagai sela - 2012-07-23

    I re-ran using the trunk version and things seem a lot better. It seems that
    my previous run was configured for 16 kHz even though I followed the 8 kHz
    settings in the tutorial. Now I get fewer alignment errors.
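
    One way to rule out a mismatch in the recordings themselves is to read the
    sample rate from each file header. This is a minimal sketch that assumes
    the audio is stored as WAV files; it does not inspect the training
    configuration, which has to agree with the same rate.

```python
import wave


def sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def mismatched(paths, expected_hz=8000):
    """Return the paths whose sample rate differs from expected_hz."""
    return [path for path in paths if sample_rate(path) != expected_hz]


if __name__ == "__main__":
    import sys

    # Usage: python check_rate.py file1.wav file2.wav ...
    for path in mismatched(sys.argv[1:]):
        print(path, sample_rate(path))
```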

     
  • Nickolay V. Shmyrev

    Great

     
  • hagai sela

    hagai sela - 2012-07-23

    Another question - Is there a way to configure JSGF instead of using a
    language model in the sphinx_train.cfg file?

     
  • Nickolay V. Shmyrev

    Another question - Is there a way to configure JSGF instead of using a
    language model in the sphinx_train.cfg file?

    No

     
  • hagai sela

    hagai sela - 2012-07-23

    So where can I configure it?

     
  • Nickolay V. Shmyrev

    So where can I configure it?

    You cannot configure it there. To use JSGF instead of an LM you need to edit
    the psdecode.pl Perl script and specify the required decoder option there.
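
    The choice comes down to which option is passed to pocketsphinx_batch:
    `-jsgf` with a grammar file or `-lm` with a language model. The sketch
    below only illustrates that switch in Python; it is not the contents of
    psdecode.pl, and `decoder_args` and all the paths are hypothetical.

```python
def decoder_args(hmm, dictionary, ctl, hyp, grammar=None, lm=None):
    """Build a pocketsphinx_batch argument list using a grammar or an LM."""
    if (grammar is None) == (lm is None):
        raise ValueError("give exactly one of grammar or lm")
    args = [
        "pocketsphinx_batch",
        "-hmm", hmm,          # acoustic model directory
        "-dict", dictionary,  # pronunciation dictionary
        "-ctl", ctl,          # list of utterances to decode
        "-hyp", hyp,          # output file for the hypotheses
    ]
    if grammar is not None:
        args += ["-jsgf", grammar]
    else:
        args += ["-lm", lm]
    return args


if __name__ == "__main__":
    # All paths below are placeholders.
    print(" ".join(decoder_args("model", "he.dic", "test.fileids",
                                "test.hyp", grammar="commands.gram")))
```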

     
  • hagai sela

    hagai sela - 2012-07-24

    I managed to configure it. I added a DEC_CFG_GRAMMAR variable to
    sphinx_train.cfg, copied psdecode.pl to psdecodejsgf.pl, changed
    DEC_CFG_SCRIPT to psdecodejsgf.pl, and changed the pocketsphinx_batch command
    in psdecodejsgf.pl to run -jsgf => $ST::DEC_CFG_GRAMMAR instead of the -lm
    line.
    With the default parameters I am getting 3.8% WER and 23.8% sentence error
    rate on the training data (about 55 minutes of audio). Any pointers on which
    parameters I can change to get better results? (I am also in the process of
    collecting more audio).
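
    For reference, the two figures measure different things, which is why the
    sentence error rate can be much higher than the WER: a single wrong word
    marks the whole sentence as wrong. A minimal sketch of the usual
    definitions follows (word-level edit distance over reference words, and
    the share of sentences with at least one error). The sample sentences are
    invented for illustration and the code is not the scoring script used by
    the training tools.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def error_rates(pairs):
    """Return (WER, SER) for a list of (reference, hypothesis) strings."""
    errors = words = wrong_sentences = sentences = 0
    for ref, hyp in pairs:
        ref_words, hyp_words = ref.split(), hyp.split()
        distance = edit_distance(ref_words, hyp_words)
        errors += distance
        words += len(ref_words)
        wrong_sentences += distance > 0
        sentences += 1
    return errors / words, wrong_sentences / sentences


if __name__ == "__main__":
    pairs = [
        ("turn on the light", "turn on the light"),
        ("turn off the light", "turn of the light"),
    ]
    wer, ser = error_rates(pairs)
    print(f"WER {wer:.1%}, SER {ser:.1%}")  # WER 12.5%, SER 50.0%
```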

     
  • Nickolay V. Shmyrev

    Any pointers on which parameters I can change to get better results?

    Size of the data

     
