MLF loading error!

2013-07-08
2013-07-16
  • Dae Lim Choi

    Dae Lim Choi - 2013-07-08

    Hello.
    I am Dae Lim Choi, from Korea.

    I am very interested in the Bavieca speech recognition toolkit and would like to build a Korean acoustic model with it.

    I am trying out the training procedure, but unfortunately I encountered the following error message during feature extraction:
    ~~~ mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name.

    This is my master label file:
    "/female/fcb1jkh00s200/set200001.fea"
    gU
    gjvl_gwa
    da_Um_gwa
    gat_Un
    gjvl_ron_Ul
    vd_Ul
    su
    iS_vS_da
    "/female/fcb1jkh00s200/set200002.fea"
    gU
    sa_ram_Un
    i_ze
    jv_gi
    wa_sv
    gU_rvn
    gvl
    sal
    il_Un
    vbs_Ul
    gv_je_jo
    "/female/fcb1jkh00s200/set200003.fea"
    gU_ga
    na_rUl
    dol_a_bo_mjv
    zo_joN_hi
    ib_Ul
    jvl_vS_da
    ....

    Could you please tell me why this error happens?

    I am also wondering what the difference is between the master label file and the MLF segments.

    I would be very grateful if you could describe the training recipe (for example, the directory structure, the MLF file, the MLF segment files, etc.).

    Thanks in advance for your help.

     
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-08

    Hello Dae Lim Choi,

    Your MLF seems fine. Please check that there is no whitespace after the .fea" in the first line, just the end of line. The code that parses the MLF is not very tolerant of formatting variations. If that does not solve the issue, please send me your MLF and I will take a closer look.
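
    The whitespace check described above can be automated. The following is a minimal sketch, not part of Bavieca: the helper name linesWithTrailingWhitespace is hypothetical, and it only assumes that the MLF is a plain text file read line by line. It reports every line that ends in a space, a tab, or a carriage return (the latter is left behind by std::getline when a file has Windows line endings).

    ```cpp
    #include <cstddef>
    #include <istream>
    #include <string>
    #include <vector>

    // Hypothetical helper (not a Bavieca function): returns the 1-based
    // numbers of all lines that end in a space, tab, or carriage return.
    std::vector<std::size_t> linesWithTrailingWhitespace(std::istream &in) {
        std::vector<std::size_t> bad;
        std::string line;
        std::size_t lineNumber = 0;
        while (std::getline(in, line)) {
            ++lineNumber;
            if (line.empty()) {
                continue;  // nothing to check on an empty line
            }
            const char last = line[line.size() - 1];
            if (last == ' ' || last == '\t' || last == '\r') {
                bad.push_back(lineNumber);
            }
        }
        return bad;
    }
    ```

    To use it, open the MLF with a std::ifstream in binary mode (so that carriage returns are not translated on any platform) and pass the stream to the function. An empty result means that no line has trailing whitespace.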

    The MLF contains all the data used for training. The MLF segments are just the parts into which the original MLF is divided. The segments are needed for parallel processing, as typically each core processes a different MLF segment, and they should all be the same size. For example, if you have 1000 utterances for training, the "global" MLF should contain them all, and each MLF segment should contain 250 if you are training on a 4-core machine.

    Please take a look at the WSJ training scripts in the repository, that should help you a lot.

    Dani

     
  • Dae Lim Choi

    Dae Lim Choi - 2013-07-10

    Dear Daniel,
    Thank you very much for your kind explanations.

    Although there is no whitespace after the .fea" in my MLF, the same error still occurs.

    Please find my MLF attached.

    The global MLF was split into 4 MLF segments, matching the number of CPU cores.

    I have another question.
    Should each MLF segment contain exactly the same number of utterances?

    Thank you so much again for all your help.

    Dae Lim.

     

    Last edit: Dae Lim Choi 2013-07-10
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-11

    Hello Dae Lim,

    I took a closer look at your MLF and it is fine. That made me reread your original mail and realize that you are having this issue during feature extraction, which is done by the "param" tool. That tool does not take an MLF as input, but a batch file. Please make sure you have the latest version of the toolkit from the git repository. Also, please take a look at the generic training and feature extraction scripts that come with it.

    Regarding your question about the MLF segments: ideally all the segments should contain the same amount of speech, rather than the same number of utterances. So if you have four hours of speech, you can create four master label files of one hour each. The idea behind this is that, for each reestimation iteration during training, we want all cores to finish processing their MLF at more or less the same time. This maximizes CPU usage.
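
    One possible way to balance segments by amount of speech is sketched below. It is an illustration under stated assumptions, not the procedure used by the Bavieca scripts: the Utterance struct and the balanceBySpeech function are hypothetical, and the duration of each utterance (in seconds) is assumed to be known already, for example from the length of its feature file. The sketch uses a simple greedy heuristic: utterances are taken from longest to shortest and each one goes to the segment that currently holds the least speech. The result is usually well balanced but not guaranteed to be optimal.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical record: one utterance and its duration in seconds.
    struct Utterance {
        std::string featureFile;
        double seconds;
    };

    // Orders utterances from longest to shortest.
    static bool longerFirst(const Utterance &a, const Utterance &b) {
        return a.seconds > b.seconds;
    }

    // Greedy balancing: assign each utterance, longest first, to the
    // segment that currently contains the least speech.
    std::vector<std::vector<Utterance> > balanceBySpeech(
            std::vector<Utterance> utterances, std::size_t numSegments) {
        assert(numSegments > 0);
        std::vector<std::vector<Utterance> > segments(numSegments);
        std::vector<double> load(numSegments, 0.0);  // seconds per segment
        std::stable_sort(utterances.begin(), utterances.end(), longerFirst);
        for (std::size_t i = 0; i < utterances.size(); ++i) {
            const std::size_t lightest = static_cast<std::size_t>(
                std::min_element(load.begin(), load.end()) - load.begin());
            segments[lightest].push_back(utterances[i]);
            load[lightest] += utterances[i].seconds;
        }
        return segments;
    }
    ```

    Each inner vector would then be written out as one MLF segment, with numSegments set to the number of cores. The segments may end up with different numbers of utterances, which is fine as long as the total amount of speech is similar.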

    Hope this helps.

    Dani

     
  • Dae Lim Choi

    Dae Lim Choi - 2013-07-11

    Thank you so much, Daniel.

    I'm so sorry to bother you again.

    Feature extraction completed successfully, but the next step, "initialize the HMM parameters to the global distribution of the data (flat-start)", fails with an error.

    This is the error I got:

    hmminitializer (version: 0014, author: Daniel Bolanos)

    hmminitializer -fea /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/fea
    -cfg /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/features.cfg
    -pho /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/lexicon/phoneset.txt
    -lex /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/lexicon/lexicon.txt
    -mlf /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt -met flatStart
    -mod /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/AM/init/models00.bin

    Error: load ../common/estimation/MLFFile.cpp 74 loading MLF at line /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name
    load ../common/estimation/MLFFile.cpp 74 loading MLF at line /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name

    I have also attached the lexicon file so that you can take a look at this problem.

    I would appreciate it if you could let me know of any problems you find.

    Dae Lim

     

    Last edit: Dae Lim Choi 2013-07-11
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-11

    Dae Lim,

    I created a mini app to try to reproduce the problem that you have. I created a phoneset for that purpose using your lexicon file (it is enclosed).

    This is what the app looks like:

    PhoneSet phoneset1("phoneset.txt");
    phoneset1.load();

    LexiconManager lexiconManager1("lexicon.txt",&phoneset1);
    lexiconManager1.load();

    MLFFile mlf(&lexiconManager1,"mlfAll.txt",MODE_READ);
    mlf.load();

    The app works well and the MLF is loaded. Using the debugger I can see that it loads 41666 utterances. I cannot reproduce the issue that you are having. Please take a look at what is going on inside the mlf.load() method (MLFFile.cpp). That is where the error you see is coming from. Put a couple of printf calls in that method so we can see what is going on.
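
    When adding such printf calls, it can help to print each line in a form that makes hidden bytes visible. The following is a small sketch of a helper for that purpose. The name escapeNonPrintable is hypothetical, it is not part of Bavieca, and it makes no assumption about how MLFFile.cpp reads the file. It is meant for whatever string holds the line that fails to parse. Bytes outside printable ASCII (for example a carriage return or the bytes of a byte-order mark) are written as \xHH. Whether any such byte is the cause of the error in this thread is not known.

    ```cpp
    #include <cstdio>
    #include <string>

    // Hypothetical debugging helper: returns a copy of `line` in which
    // every byte outside printable ASCII is replaced by \xHH.
    std::string escapeNonPrintable(const std::string &line) {
        std::string out;
        char buf[8];
        for (std::string::size_type i = 0; i < line.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(line[i]);
            if (c >= 0x20 && c < 0x7f) {
                out += static_cast<char>(c);  // printable, keep as is
            } else {
                std::snprintf(buf, sizeof(buf), "\\x%02X",
                              static_cast<unsigned>(c));
                out += buf;
            }
        }
        return out;
    }
    ```

    A call such as printf("[%s]\n", escapeNonPrintable(line).c_str()); placed next to the failing check prints the line between brackets, so trailing or leading hidden characters show up immediately.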

    Dani

     
  • Dae Lim Choi

    Dae Lim Choi - 2013-07-16

    Hello, Daniel.

    I'm still experiencing the MLF and lexicon loading error.

    First of all,
    I would like to know which Linux environment you use to compile and install Bavieca (distribution and version, gcc compiler version, etc.).

    Could you please let me know this information in detail?

    I think it would be helpful to find the cause of the problem.

    Thank you very much for your help.

    Dae Lim

     
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-16

    Hello Dae Lim,

    I'm sorry you are still dealing with that problem. Did you try adding a couple of printf calls to see what is going on?

    The toolkit should run on pretty much any version of Linux. I have successfully used it with:

    gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
    and
    gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)
    and also on CentOS.

    What version of Linux/gcc are you using?

    Daniel

     

    Last edit: Daniel Bolanos 2013-07-16
