I have installed CMUSphinx as per the instructions at http://jrmeyer.github.io/asr/2016/01/08/Installing-CMU-Sphinx-on-Ubuntu.html
I then ran word_align.pl on a small audio file. There were a number of errors, although they could be down to my misunderstanding of the two files to pass to word_align.pl. Here are the errors:
Use of uninitialized value $hyp_uttid in concatenation (.) or string at word_align.pl line 53, <hyp> line 1.
Use of uninitialized value $hyp_uttid in hash element at word_align.pl line 53, <hyp> line 1.
Use of uninitialized value $ref_uttid in hash element at word_align.pl line 82, <ref> line 1.
Use of uninitialized value $hyp_uttid in concatenation (.) or string at word_align.pl line 157, <ref> line 1.
and the results:
Words: 51 Correct: 13 Errors: 44 Percent correct = 25.49% Error = 86.27% Accuracy = 13.73%
Insertions: 6 Deletions: 1 Substitutions: 37
TOTAL Words: 51 Correct: 13 Errors: 44
TOTAL Percent correct = 25.49% Error = 86.27% Accuracy = 13.73%
TOTAL Insertions: 6 Deletions: 1 Substitutions: 37
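As a sanity check on the figures above, here is a minimal sketch that recomputes the percentages from the raw counts. It assumes that errors are the sum of insertions, deletions and substitutions, and that accuracy is 100% minus the error rate; both assumptions are consistent with the numbers reported, but the function name and layout are illustrative and not taken from word_align.pl.

```python
def alignment_summary(words, correct, insertions, deletions, substitutions):
    """Recompute alignment percentages from raw counts (illustrative helper)."""
    # Assumption: every insertion, deletion and substitution counts as one error.
    errors = insertions + deletions + substitutions
    percent_correct = 100.0 * correct / words
    error_rate = 100.0 * errors / words
    # Assumption: accuracy is reported as 100% minus the error rate.
    accuracy = 100.0 - error_rate
    return {
        "errors": errors,
        "percent_correct": round(percent_correct, 2),
        "error_rate": round(error_rate, 2),
        "accuracy": round(accuracy, 2),
    }


summary = alignment_summary(words=51, correct=13,
                            insertions=6, deletions=1, substitutions=37)
print(summary)
```

Because insertions count as errors, the percent correct (25.49%) and the accuracy (13.73%) differ.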
This is not unexpected, though. As I need to transcribe many audio files for a single speaker, and having looked at the guidance on when training is or is not needed at https://cmusphinx.github.io/wiki/tutorialam/, am I to assume that I need to train?
There are over 200 audio files, with a total duration in excess of 77 hours.
Last edit: Peter 2018-03-09
Yes, you can train.
Thanks Nickolay :)
As the audio I need for training is already recorded, I took a 10-minute WAV file and ran it through webrtcvad ( https://github.com/wiseman/py-webrtcvad ) to split the audio into many small files.
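For anyone wanting to reproduce the splitting step, here is a minimal sketch of one way to do it. It is not the exact script used here. It assumes 16-bit mono PCM at a sample rate webrtcvad accepts (8000, 16000, 32000 or 48000 Hz) and 30 ms frames; the helper names, the aggressiveness value and the `max_silence_frames` threshold are all hypothetical choices.

```python
def frame_generator(pcm, sample_rate, frame_ms=30):
    """Yield consecutive fixed-size frames from 16-bit mono PCM bytes."""
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def split_segments(frames, is_speech, max_silence_frames=10):
    """Group frames into speech segments, closing one after a run of silence."""
    segments, current, silence = [], [], 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            current.append(frame)
            silence += 1
            if silence >= max_silence_frames:
                # Drop the trailing silent frames before storing the segment.
                segments.append(b"".join(current[:len(current) - silence]))
                current, silence = [], 0
    if current:
        segments.append(b"".join(current[:len(current) - silence]))
    return segments


def split_wav(path, aggressiveness=2):
    """Read a mono 16-bit WAV and return (sample_rate, list of PCM segments)."""
    import wave
    import webrtcvad  # third-party package: pip install webrtcvad

    with wave.open(path, "rb") as wav:
        assert wav.getnchannels() == 1 and wav.getsampwidth() == 2
        rate = wav.getframerate()
        pcm = wav.readframes(wav.getnframes())
    vad = webrtcvad.Vad(aggressiveness)
    frames = frame_generator(pcm, rate)
    return rate, split_segments(frames, lambda f: vad.is_speech(f, rate))
```

Each returned segment can then be written out as its own WAV file with the standard `wave` module. The grouping logic takes the speech detector as a parameter, so it can be tried out without webrtcvad installed.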
I am now going through each file (some are just noise or empty, so these are deleted), listening to it, then recording the transcript in a text file. As the transcripts that CMUSphinx needs for training are in a slightly different format, I have used the attached code to modify the .txt files.
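The attachment itself is not reproduced in this post, so the following is only a sketch of what such a conversion could look like, not the attached code. It assumes one .txt file per audio chunk, and produces the `<s> text </s> (file_id)` line format described in the acoustic model training tutorial linked above. The function names, the `my_db_train` prefix and the choice to use the file stem as the utterance ID are assumptions.

```python
from pathlib import Path


def to_transcription_line(file_id, text):
    """Format one utterance as '<s> text </s> (file_id)'."""
    cleaned = " ".join(text.split())  # collapse whitespace and newlines
    return f"<s> {cleaned} </s> ({file_id})"


def build_training_files(txt_dir, out_prefix):
    """Write <out_prefix>.transcription and <out_prefix>.fileids from *.txt files."""
    lines, file_ids = [], []
    for txt in sorted(Path(txt_dir).glob("*.txt")):
        text = txt.read_text(encoding="utf-8")
        if not text.strip():
            continue  # skip empty transcripts (noise-only chunks)
        file_ids.append(txt.stem)  # assumption: file stem is the utterance ID
        lines.append(to_transcription_line(txt.stem, text))
    Path(f"{out_prefix}.transcription").write_text(
        "\n".join(lines) + "\n", encoding="utf-8")
    Path(f"{out_prefix}.fileids").write_text(
        "\n".join(file_ids) + "\n", encoding="utf-8")
    return lines
```

For example, `build_training_files("chunks", "etc/my_db_train")` would write the two files next to each other. The sketch leaves letter case untouched, because the words in the transcripts need to match the dictionary entries.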
Last edit: Peter 2018-03-12
When creating the transcripts for each file, I'm assuming that I should write each word exactly as I hear it. Is that correct? Here is an example:
Spoken word --> because --> written word --> because
Spoken word --> 'cause --> written word --> 'cause
The example was the word 'because' spoken two different ways. I'm assuming that it's okay to put minor punctuation in the transcripts?
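One practical way to check a spelling such as 'cause is to see whether it has an entry in the pronunciation dictionary used for training, since every word in the transcripts needs one. Here is a minimal sketch of such a check. It assumes the usual dictionary layout of one `word PHONE PHONE ...` entry per line, with alternate pronunciations marked as `word(2)`; the function names and the sample pronunciations are illustrative only.

```python
import re


def dictionary_words(dict_lines):
    """Collect the words of a pronunciation dictionary, ignoring '(2)' variants."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcription_lines, dict_lines):
    """Return transcript words that have no dictionary entry, sorted."""
    known = dictionary_words(dict_lines)
    missing = set()
    for line in transcription_lines:
        line = re.sub(r"\([^()]*\)\s*$", "", line)  # drop trailing (file_id)
        for token in line.split():
            if token not in ("<s>", "</s>") and token not in known:
                missing.add(token)
    return sorted(missing)


# Illustrative dictionary entries; the phone strings are examples only.
sample_dict = [
    "because B IH K AO Z",
    "because(2) B IH K AH Z",
    "hello HH AH L OW",
]
sample_transcripts = ["<s> hello 'cause because </s> (chunk-01)"]
print(missing_words(sample_transcripts, sample_dict))
```

Any word the check reports would need either a dictionary entry of its own or a different spelling in the transcript.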