I have a WAV file with a transcript. I am trying to get phonemes with timestamps from this file, but the results aren't completely accurate. The transcript I have is 100% accurate, but I don't know how to use it for phoneme alignment.
I tried -keyphrase, -kws and -kws_threshold, but the decoder doesn't appear to use the transcript at all. The transcript only seems to be used when I disable -allphone, which turns off the phoneme search. I really need the phonemes, not just words.
I can convert the transcript to phonemes if necessary. Thanks for any help!
Here is the code:
const fs = require("fs");
const ps = require("pocketsphinx").ps;

// Paths to the acoustic model, dictionary and phoneme language model.
const model = "source/pocketsphinx-5prealpha/model/en-us/";
const config = new ps.Decoder.defaultConfig();
config.setString("-hmm", model + "en-us");
config.setString("-dict", model + "cmudict-en-us.dict");
// Phoneme decoding mode.
config.setString("-allphone", model + "en-us-phone.lm.bin");
// My attempt at passing the transcript to the decoder.
config.setString("-keyphrase", "this is a test");
config.setFloat("-kws_threshold", 1e-50);
const decoder = new ps.Decoder(config);

fs.readFile("test.wav", function(error, data) {
    if (error) throw error;
    // Decode the whole file as a single utterance.
    decoder.startUtt();
    decoder.processRaw(data, false, false);
    decoder.endUtt();
    console.log(decoder.hyp());
    // Print each segment with its start and end frame.
    // Note: calling iter.next() before the loop would skip the first segment.
    const iter = decoder.seg().iter();
    let seg;
    while ((seg = iter.next()) !== null) {
        console.log(seg.word, seg.startFrame, seg.endFrame);
    }
});
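For what it's worth, converting the transcript to phonemes could look roughly like the sketch below. It assumes the dictionary is plain text in the "word PH1 PH2 ..." format used by cmudict-en-us.dict, with alternate pronunciations written as "word(2)". The names parseDict, transcriptToPhonemes and framesToSeconds are hypothetical helpers of my own, not part of pocketsphinx, and the frame conversion assumes the default frame rate of 100 frames per second.

```javascript
// Hypothetical helper: parse CMUdict-style text ("word PH1 PH2 ...") into a Map.
function parseDict(text) {
    const dict = new Map();
    for (const line of text.split(/\r?\n/)) {
        const parts = line.trim().split(/\s+/);
        if (parts.length < 2) continue; // skip blank or malformed lines
        // Alternate pronunciations look like "word(2)"; strip the suffix
        // and keep only the first pronunciation seen for each word.
        const word = parts[0].replace(/\(\d+\)$/, "");
        if (!dict.has(word)) dict.set(word, parts.slice(1));
    }
    return dict;
}

// Hypothetical helper: map each transcript word to its phoneme list.
function transcriptToPhonemes(transcript, dict) {
    return transcript.toLowerCase().split(/\s+/).filter(Boolean).map(function(word) {
        const phones = dict.get(word);
        if (!phones) throw new Error("Word not in dictionary: " + word);
        return { word: word, phones: phones };
    });
}

// Convert a segment frame index to seconds.
// Assumption: the decoder runs at the default 100 frames per second.
function framesToSeconds(frame, frameRate = 100) {
    return frame / frameRate;
}
```

The dictionary text would come from something like fs.readFileSync(model + "cmudict-en-us.dict", "utf8"), and framesToSeconds could be applied to seg.startFrame and seg.endFrame in the loop above. This only produces the expected phoneme sequence. It does not solve the alignment itself.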
Last edit: MysteryPancake 2018-02-01
I decided to post this on Stack Overflow as well, in case I can get an answer there.
Last edit: MysteryPancake 2018-02-04