Hello,
I am following the tutorial page "Adapting the default acoustic model". When I run the step "Generating acoustic feature files", an .mfc file is created for each of my wav files. These files are in an unreadable binary format. Do they contain the features (MFCCs) of each of the sound signals?
I have learnt the following steps for creating MFCCs from a .wav file (from a textbook):
* wav file's sampling rate: 16000 Hz.
Framing --> Windowing --> Apply FFT --> Find the magnitude of the FFT --> Convert the FFT data into filter bank outputs --> Find the log base 10 --> Find the cosine transform
My implementation of this procedure creates a file with a number of floating-point values (positive and negative).
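For reference, the textbook steps listed above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not Sphinx's front end: the frame length (400 samples), hop (160 samples), FFT size (512), 26 mel filters, and 13 cepstra are assumed values, and all function names are hypothetical. A real front end may differ in details such as pre-emphasis, filter bank layout, log base, and DCT scaling, so the numbers should not be expected to match another tool's output exactly.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frames(signal, frame_len, hop):
    """Framing: split the signal into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    """Windowing: apply a Hamming window to one frame."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def magnitude_spectrum(frame, n_fft):
    """Magnitude of a naive DFT (O(n^2), for illustration only)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(padded)))
            for k in range(n_fft // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, expressed as (fractional) FFT bins
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo),
                                  (hi - k) / (hi - mid)))
                     for k in range(n_fft // 2 + 1)])
    return bank

def dct2(values, n_out):
    """Unscaled DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return one list of n_ceps coefficients per frame."""
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    result = []
    for frame in frames(signal, frame_len, hop):
        mag = magnitude_spectrum(hamming(frame), n_fft)
        energies = [sum(w * m for w, m in zip(filt, mag)) for filt in bank]
        # Floor the energies so that log10 never sees zero
        logs = [math.log10(max(e, 1e-10)) for e in energies]
        result.append(dct2(logs, n_ceps))
    return result
```

Calling `mfcc(samples)` on a list of float samples returns one row per frame, so the number of rows grows with the length of the recording.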
Is the procedure adopted in Sphinx4 the same as the above? If so, is it possible to get a readable (text) version of these .mfc files?
Thank you.
Yes. Run:
sphinx_cepview -f file.mfc
Thank you. I got the numbers.
However, I am finding that my numbers are very different from Sphinx's for the same wav file. I am going through my algorithm to work out how to get the same values as Sphinx's output.
I am going through the MFC files created by sphinx_fe from my wav files. I got the readable version using the sphinx_cepview program.
I see that the contents of each MFC file are organized as a 10-column matrix. The number of rows increases with the wav file size.
Can I get more details on the contents and structure of the MFC files?
Thank you.
https://cmusphinx.github.io/wiki/mfcformat/
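As I understand the format described at that page, an MFC file is a 4-byte integer header holding the number of 32-bit floats that follow, and then the floats themselves, frame after frame. The sketch below reads such a file on that assumption. The function name `read_mfc` is hypothetical, the value of 13 coefficients per frame is an assumed default, and the byte order is guessed by checking which interpretation of the header matches the file size. If I remember correctly, sphinx_cepview prints only the first 10 coefficients of each frame by default, which would explain a 10-column display, but please check that against the tool's help output.

```python
import os
import struct

def read_mfc(path, n_ceps=13):
    """Read a Sphinx-style .mfc file into a list of frames.

    Assumes: int32 header = number of float32 values that follow,
    and n_ceps values per frame (13 is an assumed default).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(4)
        payload = f.read()
    expected = (size - 4) // 4
    # Try both byte orders; keep the one whose header matches the file size.
    for endian in ("<", ">"):
        (count,) = struct.unpack(endian + "i", header)
        if count == expected:
            break
    else:
        raise ValueError("header does not match file size in either byte order")
    if count % n_ceps:
        raise ValueError("value count is not a multiple of n_ceps")
    values = struct.unpack("%s%df" % (endian, count), payload[:count * 4])
    return [list(values[i:i + n_ceps]) for i in range(0, count, n_ceps)]
```

Each returned row is one frame, so `len(read_mfc("file.mfc"))` gives the number of frames and grows with the duration of the audio.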
Hello,
I am using Sphinx4 for "Adapting the default acoustic model".
I have attached data.zip, which contains some voice files under a directory named "notworking". When I run sphinx_fe.exe (Windows x64, release build) on these files, the program crashes. I have attached "Sphinx_fe error.docx" to show the crash.
The command line I am using is:
"F:\CMUSphinx\sphinxbase\bin\Release\x64\sphinx_fe.exe" -argfile cmusphinx-en-us-5.2\feat.params -samprate 16000 -c VoiceFiles\UAS2.fileids -di .\VoiceFiles -do . -ei wav -eo mfc -mswav yes
I then converted the audio files using an online tool with the following parameters:
Sampling rate: 16000 Hz
Bit resolution: 16 bits
Audio channels: mono
After conversion, the error disappears and I am able to create the .mfc files successfully. However, when I analyze the two audio files (before and after conversion), the metadata output is the same. Since running the converter solved the problem, there must be some difference in the audio files' metadata, mustn't there? I have also attached MetaDataOutput.docx. Could somebody please look into this and clarify?
Thank you.
Balaji.
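One way to compare the two files more closely than a metadata viewer does is to list the RIFF chunks in each. The sketch below is only a diagnostic aid, and the function name `describe_wav` is hypothetical. It rests on an assumption I have not confirmed from the attachments: that the original files differ from the converted ones in their header layout (for example an extra chunk such as LIST before the data chunk, or a format tag other than 1 for plain PCM), which a tool expecting a plain 44-byte header could misread even though sampling rate, bit depth, and channel count look identical.

```python
import struct

def describe_wav(path):
    """Return (fmt_fields, chunks) for a RIFF/WAVE file.

    chunks is a list of (chunk_id, data_offset, data_size) tuples.
    """
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        fmt, chunks = None, []
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", head)
            offset = f.tell()
            chunks.append((chunk_id.decode("ascii", "replace"),
                           offset, chunk_size))
            if chunk_id == b"fmt ":
                names = ("format_tag", "channels", "sample_rate",
                         "byte_rate", "block_align", "bits_per_sample")
                fmt = dict(zip(names, struct.unpack("<HHIIHH", f.read(16))))
            # Chunk data is padded to an even number of bytes.
            f.seek(offset + chunk_size + (chunk_size & 1))
    return fmt, chunks
```

Running this on a file from before and after conversion and comparing the two results shows whether the data chunk starts at the same offset in both and whether the format tag is the same.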