Arabic speech recognition using Sphinx4

Forum: Help

Creator: Sny

Created: 2015-07-30

Updated: 2015-08-01

Sny - 2015-07-30

Hello everyone,

First, I used Audacity to record some Arabic digits and saved them in WAV format.

I am following these two tutorials:

http://cmusphinx.sourceforge.net/wiki/tutorialam

http://www.speech.cs.cmu.edu/sphinx/tutorial.html

According to the first tutorial, my recording files must be in MS WAV format.

1. My first question: how can I do this using Audacity?

Second, I prepared the transcription file, the phonetic dictionary, the phoneset file, and a text file listing the names of the recordings.

Then I downloaded SphinxTrain, SphinxBase, and Sphinx3.

I started to follow the training steps in the second tutorial.

First, I compiled SphinxTrain using:

1. cd SphinxTrain
2. ./configure
3. make

Then I started to train my data:

perl scripts_pl/setup_tutorial.pl my data folder name

After that I opened the file etc/sphinx_train.cfg and changed the audio file extension from sph to wav and the audio file type from nist to mswav.

Then I generated the features using:

cd SphinxTrainTutorial/my data folder name

perl scripts_pl/make_feats.pl -ctl etc/my data folder name_train.fileids

Finally, I generated the acoustic models:

perl scripts_pl/RunAll.pl

At this step, many warnings appeared:

WARNING: phone "E " has an extra white spaces

WARNING: phone "F " has an extra white spaces

WARNING: phone "A" has an extra white spaces

.
.
.
.
Found 13 words using 22 phones

Warning: this phone occurs in the dictionary , but not in the phone list
Warning: this phone <aa> occurs in the dictionary , but not in the phone list
Warning: this phone <ain> occurs in the dictionary , but not in the phone list
.
.
.
.
Warning: this phone occurs in the phone list, but not in the dictionary
Warning: this phone <aa> occurs in the phone list, but not in the dictionary
Warning: this phone <ain> occurs in the phone list, but not in the dictionary.
.
.
.
.
WARNING: CTL file missing newline at the end of file
Warning: this phone occurs in the phone list, but not in the transcription
Warning: this phone <aa> occurs in the phone list, but not in the transcription
Warning: this phone <ain> occurs in the phone list, but not in the transcription

Even though all phones match across all files, the above warnings appeared.
My question: what was my mistake?

The attached files are the transcription file, the phonetic dictionary, the phoneset file, and a text file listing the names of the recordings.
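
The "occurs in the dictionary, but not in the phone list" warnings (and the reverse) amount to a set comparison, which can be reproduced outside SphinxTrain. Below is a minimal Python sketch of that comparison. It assumes a phone list with one phone per line and a dictionary with lines of the form "WORD PH1 PH2 ..."; the function names and the sample data are made up for illustration and are not the contents of the attached aasr files. Whitespace is stripped before comparing, so only genuine mismatches are reported.

```python
def read_phones(lines):
    """Return the set of phone symbols, one per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def dictionary_phones(lines):
    """Return every phone used in a dictionary of 'WORD PH1 PH2 ...' lines."""
    phones = set()
    for line in lines:
        fields = line.split()
        # The first field is the word; the remaining fields are its phones.
        phones.update(fields[1:])
    return phones


def compare(phone_lines, dict_lines):
    """Return (phones only in the dictionary, phones only in the phone list)."""
    listed = read_phones(phone_lines)
    used = dictionary_phones(dict_lines)
    return sorted(used - listed), sorted(listed - used)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    phone_list = ["AA", "B", "SIL"]
    dictionary = ["WORD1 AA B", "WORD2 B T"]
    missing, unused = compare(phone_list, dictionary)
    print("in dictionary, not in phone list:", missing)
    print("in phone list, not in dictionary:", unused)
```

With real files, the lines of the main dictionary and the filler dictionary would probably both need to be passed as dict_lines, since silence and noise phones usually appear only in the filler dictionary.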

Last edit: Sny 2015-07-31

aasr.dic

aasr.filler

aasr.html

aasr.phone

aasr_test.fileids

aasr_train.fileids

aasr_train.transcription


Nickolay V. Shmyrev - 2015-07-30

My question what was my mistake?

You are using outdated tutorial.

You are using outdated code.

You assume that if you paste text files into a message, your readers will get the originals. You do not know that it is better to share files in an archive instead of pasting them.

You do not read output messages. It might be surprising for you, but you can read the output and act on it. It describes the problems in plain English.
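
As an illustration of acting on those messages: the "extra white spaces" and "missing newline at the end of file" warnings quoted in the first post can be checked for directly. This is a minimal Python sketch, assuming plain text files with one entry per line. The function names are made up for this example and are not part of SphinxTrain, and the sample strings are invented.

```python
def lines_with_trailing_whitespace(text):
    """Return 1-based numbers of lines that end in whitespace.

    Spaces, tabs, and carriage returns (from Windows line endings)
    all count as trailing whitespace here.
    """
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line != line.rstrip():
            bad.append(number)
    return bad


def ends_with_newline(text):
    """Return True if the text ends with a newline character."""
    return text.endswith("\n")


if __name__ == "__main__":
    # Invented samples: two phones with a trailing space, and a file
    # list whose last line has no newline.
    sample_phones = "E \nF \nA\n"
    sample_ctl = "rec_001\nrec_002"
    print(lines_with_trailing_whitespace(sample_phones))  # [1, 2]
    print(ends_with_newline(sample_ctl))                  # False
```

To run this on a real file, the text could be read with open(path, newline="").read() so that carriage returns are preserved; the path is whatever the phone list or fileids file is called in your setup.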

- Sny - 2015-07-31
  
  I attached the text files as you suggested.
  I do not understand what you meant by reading the output messages.
  Could you tell me how I can save my recordings in MS WAV format, and what the mistakes in my text files are?
  
  Which tutorial should I use?
  
  Last edit: Sny 2015-07-31
  

Mike Pelley - 2015-08-01

In Audacity, after you record the speech, do an "export audio" and select a file type of "WAV (Microsoft) signed 16 bit PCM"
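
To confirm that an exported file really is 16-bit PCM WAV, the header can be inspected with Python's standard wave module. The sketch below is only a rough check. The defaults of 1 channel and 16000 Hz are assumptions based on common CMUSphinx practice and are not stated in this thread, and the helper names are made up for this example.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bits per sample, sample rate) of a PCM WAV file.

    `source` may be a file path or a binary file object. wave.open
    raises wave.Error if the data is not a WAV format it can read.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches(source, channels=1, bits=16, rate=16000):
    """Return True if the file has the expected format (assumed defaults)."""
    return describe_wav(source) == (channels, bits, rate)


if __name__ == "__main__":
    # Build a tiny in-memory WAV so the example runs without any files.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)       # mono
        out.setsampwidth(2)       # 2 bytes = 16 bits per sample
        out.setframerate(16000)   # 16 kHz
        out.writeframes(b"\x00\x00" * 160)  # 10 ms of silence
    buffer.seek(0)
    print(describe_wav(buffer))   # (1, 16, 16000)
```

For a recording on disk, describe_wav can be called with the file's path instead of the in-memory buffer.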

- Sny - 2015-08-01
  
  That is what I did
  
