Hello.
I am Dae Lim Choi, from Korea.
I am very interested in the Bavieca speech recognition toolkit, and I would like to build a Korean acoustic model with it.
I am testing the training procedure, but unfortunately I encountered the following error message during feature extraction.
~~~ mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name.
This is my master label file:
"/female/fcb1jkh00s200/set200001.fea"
gU
gjvl_gwa
da_Um_gwa
gat_Un
gjvl_ron_Ul
vd_Ul
su
iS_vS_da
"/female/fcb1jkh00s200/set200002.fea"
gU
sa_ram_Un
i_ze
jv_gi
wa_sv
gU_rvn
gvl
sal
il_Un
vbs_Ul
gv_je_jo
"/female/fcb1jkh00s200/set200003.fea"
gU_ga
na_rUl
dol_a_bo_mjv
zo_joN_hi
ib_Ul
jvl_vS_da
....
Could you please tell me why this error happens?
I also wonder what the difference is between the master label file and the MLF segments.
I would be very pleased if you could describe the training recipe (for example, the directory structure, the MLF file, the MLF segment files, etc.).
Thanks in advance for your help.
Hello Dae Lim Choi,
Your MLF seems fine. Please check that there are no whitespace characters after the .fea" in the first line, just the end of line; the code that parses the MLF is not very tolerant of formatting variations. If that does not solve the issue, please send me your MLF and I will take a closer look.
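Invisible trailing characters (a stray space, a tab, or a Windows carriage return) are a common cause of this kind of strict-parser failure. As an illustration only, a small check along these lines could flag the offending lines (the helper name is my own, not part of Bavieca):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Return the 1-based numbers of lines that end in a space, a tab, or a
// carriage return (CRLF line endings), which a strict MLF parser may reject.
std::vector<int> findSuspectLines(std::istream &mlf) {
    std::vector<int> bad;
    std::string line;
    int lineNumber = 0;
    while (std::getline(mlf, line)) {
        ++lineNumber;
        if (!line.empty()) {
            char last = line[line.size() - 1];
            if (last == ' ' || last == '\t' || last == '\r')
                bad.push_back(lineNumber);
        }
    }
    return bad;
}
```

Running it over mlfAll.txt and printing the returned line numbers would quickly reveal whether the file carries trailing blanks or Windows line endings.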
The MLF contains all the data used for training; the MLF segments are just the parts into which the original MLF is divided. The segments are needed for parallel processing, since typically each core processes a different MLF segment, so the segments should be about the same size. For example, if you have 1000 utterances for training, the "global" MLF should contain them all, and each MLF segment should contain 250 if you are training on a 4-core machine.
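The splitting itself is mechanical. As a sketch (assuming the MLF layout shown above, where each utterance starts with a quoted .fea path; the function name is my own), the global MLF can be divided into equal-count segments like this:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split an MLF into nSegments pieces of roughly equal utterance count.
// Each utterance starts at a quoted feature-file line such as "/a/b.fea".
std::vector<std::string> splitMLF(std::istream &mlf, int nSegments) {
    std::vector<std::string> utterances;  // one complete utterance per entry
    std::string line;
    while (std::getline(mlf, line)) {
        if (!line.empty() && line[0] == '"')
            utterances.push_back(std::string());  // a new utterance begins
        if (!utterances.empty())
            utterances.back() += line + "\n";
    }
    std::vector<std::string> segments(nSegments);
    if (utterances.empty())
        return segments;
    std::size_t perSegment =
        (utterances.size() + nSegments - 1) / nSegments;  // ceiling division
    for (std::size_t i = 0; i < utterances.size(); ++i)
        segments[i / perSegment] += utterances[i];
    return segments;
}
```

Writing each element of the returned vector to its own file would give the per-core MLF segments.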
Please take a look at the WSJ training scripts in the repository; they should help you a lot.
Dani
Dear Daniel,
Thank you very much for your kind explanations.
Although there are no whitespace characters after the .fea" in my MLF, the same error still occurs.
Please find my MLF attached.
The global MLF was split into 4 MLF segments according to the number of CPU cores.
I have another question.
Should each MLF segment contain exactly the same number of utterances?
Thank you so much again for your all help.
Dae Lim.
Last edit: Dae Lim Choi 2013-07-10
Hello Dae Lim,
I took a closer look at your MLF and it is fine. That made me reread your original mail and realize that you are having this issue during feature extraction, which is the "param" tool. That tool does not take an MLF as input, but a batch file. Please make sure you have the latest version of the toolkit from the git repository. Also, please take a look at the generic training and feature extraction scripts that come with it.
Regarding your question about the MLF segments: ideally, all the segments should contain the same amount of speech, rather than the same number of utterances. So if you have four hours of speech, you can create four master label files of one hour each. The idea behind this is that, for each reestimation iteration during training, we want all cores to finish processing their MLF at roughly the same time; this maximizes CPU usage.
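One simple way to approximate this balancing (a sketch of my own, not a Bavieca utility) is a greedy assignment: walk the utterances and always give the next one to the segment that currently holds the least speech. The durations could come, for instance, from the frame counts of the .fea files:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy load balancing: assign each utterance to the currently lightest
// segment so every core ends up with about the same amount of speech.
// durations[i] is the length of utterance i (e.g. seconds or frame count).
std::vector<int> assignToSegments(const std::vector<double> &durations,
                                  int nSegments) {
    std::vector<int> segmentOf(durations.size());
    std::vector<double> load(nSegments, 0.0);  // speech assigned so far
    for (std::size_t i = 0; i < durations.size(); ++i) {
        int lightest = 0;
        for (int s = 1; s < nSegments; ++s)
            if (load[s] < load[lightest])
                lightest = s;
        segmentOf[i] = lightest;
        load[lightest] += durations[i];
    }
    return segmentOf;
}
```

With durations {3, 1, 1, 1} and two segments, for example, the long utterance ends up alone while the three short ones share the other segment, so both cores process three units of speech.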
Hope this helps,
Dani
Thank you so much, Daniel.
I'm so sorry to bother you again.
Feature extraction completed successfully, but the next step, "initialize the HMM parameters to the global distribution of the data (flat-start)", gave an error.
This is the error I got:
hmminitializer (version: 0014, author: Daniel Bolanos)
hmminitializer -fea /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/fea
-cfg /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/features.cfg
-pho /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/lexicon/phoneset.txt
-lex /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/config/lexicon/lexicon.txt
-mlf /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt -met flatStart
-mod /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/AM/init/models00.bin
Error: load ../common/estimation/MLFFile.cpp 74 loading MLF at line /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name
load ../common/estimation/MLFFile.cpp 74 loading MLF at line /home/dlchoi/bavieca-code/tasks/dict01/scripts/train/mlf/mlfAll.txt at line 1, unexpected lexical unit found or wrong format of feature file name
My lexicon file is also attached so you can take a look at this problem.
I would appreciate it if you could let me know of any problems you find.
Dae Lim
Last edit: Dae Lim Choi 2013-07-11
Dae Lim,
I created a mini app to try to reproduce the problem you are having. For that purpose I created a phoneset from your lexicon file (it is enclosed).
This is what the app looks like:
PhoneSet phoneset1("phoneset.txt");
phoneset1.load();
LexiconManager lexiconManager1("lexicon.txt",&phoneset1);
lexiconManager1.load();
MLFFile mlf(&lexiconManager1,"mlfAll.txt",MODE_READ);
mlf.load();
The app works well and the MLF is loaded; using the debugger I can see that it loads 41666 utterances. I cannot reproduce the issue you are having. Please take a look at what is going on inside the mlf.load() method (MLFFile.cpp); that is where the error you see comes from. Put a couple of printf calls in that method so we can see what is going on.
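For what it's worth, the diagnostics could look something like the sketch below: classify each line the way a strict parser might (either a quoted .fea filename or a bare lexical unit) and print a verdict per line, so the first unexpected line is obvious. This follows my reading of the format shown above; it is not the actual MLFFile.cpp code:

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Print a diagnostic for every MLF line and return the 1-based number of the
// first line that is neither a quoted ".fea" filename nor a bare lexical
// unit, or 0 if the whole file looks well formed.
int diagnoseMLF(std::istream &mlf) {
    std::string line;
    int lineNumber = 0;
    while (std::getline(mlf, line)) {
        ++lineNumber;
        bool isFeaName = line.size() > 6 && line[0] == '"' &&
                         line.compare(line.size() - 5, 5, ".fea\"") == 0;
        bool isLexUnit = !line.empty() &&
                         line.find_first_of(" \t\r\"") == std::string::npos;
        std::printf("line %d: [%s] -> %s\n", lineNumber, line.c_str(),
                    isFeaName ? "feature file"
                              : isLexUnit ? "lexical unit" : "UNEXPECTED");
        if (!isFeaName && !isLexUnit)
            return lineNumber;
    }
    return 0;
}
```

A line carrying a stray carriage return, for instance, is neither a valid filename nor a bare lexical unit, so it would be reported immediately.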
Dani
Hello, Daniel.
I'm still experiencing the MLF & lexicon loading error.
First of all,
I would like to know which Linux environment you use to compile and install Bavieca (distribution and version, gcc compiler version, etc.).
Could you please let me know this information in detail?
I think it would be helpful in finding the cause of the problem.
Thank you very much for your help.
Dae Lim
Hello Dae Lim,
I'm sorry you are still dealing with that problem. Did you try putting in a couple of printf calls to see what is going on?
The toolkit should run on pretty much any version of Linux. I have successfully used it with:
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
and
gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)
and also on CentOS.
What version of Linux/gcc are you using?
Daniel
Last edit: Daniel Bolanos 2013-07-16