Hi,

I am using an Ubuntu 14.04 virtual machine with a Windows 8 host OS.
I prepared all the files to create an acoustic model for a subset of the Arabic language.
I started the training for that model (continuous model, 16 kHz audio, 25 hours of audio from 30 different speakers, a 1300-word vocabulary, 3000 senones, 16 densities), and it failed. I would like expert advice on why it failed, please.
Each Baum-Welch iteration also produces around 300 to 400 "Failed to align audio to trancript" errors. The training might have failed because of those errors.

Some details:
The training failed in module 30, and the last lines printed are the following:

Baum welch starting for iteration: 4 (1 of 1)
bw Log File
This step had 372 ERROR messages and 0 WARNING messages. Please check the log file for details.
completed
Normalization for iteration: 4
norm Log File
completed
Current Overall Likelihood Per Frame = -142.787005125471
Convergence Ratio = 0.149139295430558
Baum welch starting for iteration: 5 (1 of 1)
bw Log File
completed
Only 0 parts of 1 of Baum Welch were successfully completed
Parts 1 failed to run!
Training failed in iteration 5

In the log file for the last step we can see the same "Failed to align audio to trancript" errors. In the source code I can see where the error is raised, but it is not clear why.
If more details are needed to investigate, please do not hesitate to check the whole folder (etc + wav with all logs and source files) at the following link:
https://www.dropbox.com/s/3j027o2jq8d2vf2/error.tar.gz?dl=0
INFO: I also created another post because I wanted to try forced alignment to see if the results would be better, but I had problems doing it.
I think that forced alignment will get rid of those errors by removing the corresponding audio files from the training. Am I right?
Thanks a lot for your help,
Jamal.
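As a stop-gap while the root cause is being investigated, the failing utterances can be dropped from the training set before restarting. Here is a minimal sketch; the log path, the exact error-line format, and the file names are assumptions, so adjust them to your own SphinxTrain setup:

```python
import re

# Hypothetical paths -- adjust to your SphinxTrain project layout.
BW_LOG = "logdir/30.cd_hmm_untied/arabic.4-1.bw.log"
FILEIDS = "etc/arabic_train.fileids"
TRANSCRIPTION = "etc/arabic_train.transcription"

def failed_utterances(log_text):
    """Collect utterance ids mentioned in 'Failed to align' error lines.

    Assumes the error line carries the utterance id in parentheses, e.g.
        ERROR: ... Failed to align audio to trancript (speaker4/f001)
    The real format in your bw log may differ; adapt the regex if needed.
    """
    pattern = re.compile(r"Failed to align.*?\(([^)]+)\)")
    return {m.group(1) for m in pattern.finditer(log_text)}

def filter_training_files(fileids_lines, transcription_lines, failed):
    """Drop entries whose utterance id is in `failed`.

    SphinxTrain keeps the fileids and transcription files line-aligned,
    so both lists are filtered in lockstep to stay consistent.
    """
    kept_ids, kept_trans = [], []
    for fid, trans in zip(fileids_lines, transcription_lines):
        if fid.strip() not in failed:
            kept_ids.append(fid)
            kept_trans.append(trans)
    return kept_ids, kept_trans
```

The point of filtering both files together is that bw matches the Nth transcription line to the Nth fileids entry, so removing a line from only one of them would silently mis-pair every utterance after it.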
Yes, this is the most likely reason.
The data you provided is not enough to reproduce your problem; for that reason it is not easy to help you investigate the issue. You need to provide the full data.
Yes.
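For reference, SphinxTrain can do this pruning itself: when forced alignment is enabled in the configuration, the trainer runs an alignment pass first and excludes utterances that cannot be aligned. Assuming a stock etc/sphinx_train.cfg (verify the variable name in your own copy), the relevant setting is:

```perl
# etc/sphinx_train.cfg -- enable the force-alignment stage so that
# utterances that cannot be aligned are dropped from training
$CFG_FORCEDALIGN = 'yes';
```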
OK, thanks a lot Nickolay.
Today I will check some of the files that are producing errors and see what the source of the problems could be.
In parallel I will also work on forced alignment to see if it gives better results.
What do you mean by full data? I provided the etc + wav (as a tar file) + bwaccumdir + logdir + model_architecture + model_parameters + qmanager folders, which are all the folders I had.
OK, I understand now, Nickolay.
I checked the shared archive and found that I had removed most of the wav files (speaker4 to speaker34). That is why you could not reproduce it.
Sorry for that.
For now I will keep analyzing the different files to find the source of the problem, and I will get back to you if I cannot find anything.
Thanks,
After a quick analysis I noted that:
1) Some audio files did not exactly match their transcriptions.
2) Other audio files matched, but the last word was slightly cut off at the end, which could cause errors for the tool.
I chose to use forced alignment for now and will see the results.
I will go back and fix each audio file (150 to 200 files) to match its transcription if the forced-alignment results are not good, or if I need further improvements in accuracy.
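Clipped endings like those in point 2) can be screened for in bulk by comparing each file's duration against a rough lower bound derived from the transcript length. A small sketch; the 0.2 s-per-word threshold is an arbitrary heuristic, not a calibrated value, so flagged files still need listening to confirm:

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

def suspiciously_short(path, transcript, min_seconds_per_word=0.2):
    """Flag files whose duration seems too short for their transcript.

    A file shorter than n_words * min_seconds_per_word is a candidate
    for having its last word cut off (or a wrong transcription).
    """
    n_words = len(transcript.split())
    return wav_duration_seconds(path) < n_words * min_seconds_per_word
```

Running this over the fileids/transcription pairs would narrow the 150 to 200 manual checks down to the files most likely to be truncated.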
Thanks for your support,