I have around 500 telephone calls that I need to get transcribed. I checked the bandwidth and found it to be 8 kHz, so I am using the 8 kHz models. I am facing some problems.
1) First, with cmusphinx-en-us-8khz-5.2, which is a continuous model:
Right out of the box, this model gets some parts of the transcript right, but most of it is completely wrong. So I thought of adapting the model. I started by adding one call to the adaptation data. I split the call into chunks of 10 s and created the transcription to go along with it. I followed the tutorial for model adaptation and did the following:-
Then I pointed the hmm to the new adapted directory. Now the generated transcript is completely wrong. I tested on the same file I trained the model on. It transcribes something about war and all, whereas the audio had no mention of it. Can you tell me if I am doing something wrong or if my understanding is wrong? Since these are patient calls it would be difficult for me to share them here, but as a sample I have the following type of audio in my training data:
and so your pharmacy where you get it filled out that they can make duplicate labels if necessary so that you can have a prescription label on everything and then your sharps container should be (chunk43) https://drive.google.com/file/d/1zLx8Tl5EN3aYtLwFSIW01HI3Rm3S8whr/view?usp=sharing (sample train file in Google Drive).
Please let me know what I am doing wrong. As far as I know, I did not get any error during the model adaptation.
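For reference, the fixed 10-second chunking described above can be sketched with Python's standard `wave` module. This is only an illustration, not the procedure from the adaptation tutorial: the function name `split_wav` and the output naming scheme are made up here, and it assumes uncompressed PCM WAV input.

```python
import wave
from typing import List


def split_wav(path: str, out_prefix: str, chunk_seconds: float = 10.0) -> List[str]:
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper: writes out_prefix001.wav, out_prefix002.wav, ...
    and returns the list of files written.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            index += 1
            out_path = f"{out_prefix}{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Fixed-length cuts can fall in the middle of a word, so each chunk's transcription has to match exactly what is audible in that chunk.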
2) The second issue is when I try to use the ptm model cmusphinx-en-us-ptm-5.2. I get the following error:-
Any help is highly appreciated.
For 500 calls it is easier to use a commercial service.
Actually it is about 12 kHz, higher than 8. I also doubt your calls were recorded on the phone.
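A quick way to see what sampling rate a file declares is to read the WAV header, for example with Python's standard `wave` module. This is a minimal sketch; `describe_wav` is a name made up for illustration, and it assumes an uncompressed PCM WAV file. The header only gives the nominal sampling rate. Audio sampled at 8 kHz can contain frequencies only up to 4 kHz, and a file that was resampled or transcoded along the way can declare one rate while its content looks different, so the spectrum (e.g. in an audio editor) still needs to be inspected.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return the basic format fields declared in a PCM WAV header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": w.getnchannels(),
            "bits_per_sample": 8 * w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
            # Highest frequency the declared sampling rate can represent.
            "nyquist_hz": rate / 2,
        }
```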
Our tutorial recommends computing the word error rate instead.
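Word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the decoder output, divided by the number of reference words. A minimal Python sketch, assuming plain whitespace tokenization and case-insensitive matching (the function name `word_error_rate` is illustrative, not part of any CMUSphinx tool):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Measuring this on the same test set before and after adaptation shows whether adaptation helped, rather than relying on an impression of one transcript.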
First of all, you need to get audio of better quality. The sample you have provided is never going to work, simply because the audio is bad.
Then you can try more advanced toolkits like Kaldi, but they will probably not work out of the box. To get reasonable results you will have to train the model.
You forgot to clean up the feature_transform file from the previous model when you unpacked the new one. In any case, the ptm model is less accurate.
For 500 calls it is easier to use a commercial service.
--> Which commercial service are you talking about? IBM Watson, Google?
First of all, you need to get audio of better quality. The sample you have provided is never going to work, simply because the audio is bad.
--> Is it possible to clean up this audio in any way?
Last edit: Anirban mishra 2017-11-29
It is not possible to clean up already corrupted audio. Instead, you need to check the recording process and avoid all harmful steps like codec transcoding.