CMU Sphinx / Forums / Help: Aligning phonemes using transcript?

Aligning phonemes using transcript?

Forum: Help

Created: 2018-02-01

Updated: 2018-02-04

I have a WAV file with a transcript. I am trying to get phonemes with timestamps from this file, but the results aren't completely accurate. The transcript I have is 100% accurate, but I don't know how to use it for phoneme alignment.

I tried playing around with -keyphrase, -kws and -kws_threshold, but the decoder doesn't appear to use the transcript at all. It seems the transcript is only used when I disable -allphone, that is, when I don't search for phonemes. I really need the phonemes, not just words.

I can convert the transcript to phonemes if necessary. Thanks for any help!
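A minimal sketch of that conversion, assuming a CMUdict-style dictionary where each line is a word followed by its phonemes. The helper names parseDict and transcriptToPhones are hypothetical and not part of pocketsphinx, and the inline sample stands in for the real dictionary file:

```javascript
// Sketch: map a transcript to phonemes with a CMUdict-style dictionary.
// parseDict and transcriptToPhones are hypothetical helper names.

// Parse dictionary text where each line is "word PH1 PH2 ...".
// Alternate pronunciations such as "a(2)" are skipped, so the first entry wins.
function parseDict(text) {
    const dict = new Map();
    for (const line of text.split("\n")) {
        const parts = line.trim().split(/\s+/);
        if (parts.length < 2) continue; // blank or malformed line
        const word = parts[0].toLowerCase();
        if (/\(\d+\)$/.test(word)) continue; // alternate pronunciation
        if (!dict.has(word)) dict.set(word, parts.slice(1));
    }
    return dict;
}

// Look up every word of the transcript and concatenate the phonemes.
function transcriptToPhones(transcript, dict) {
    const phones = [];
    for (const word of transcript.toLowerCase().split(/\s+/).filter(Boolean)) {
        const pron = dict.get(word);
        if (!pron) throw new Error("Word not in dictionary: " + word);
        phones.push(...pron);
    }
    return phones;
}

// Tiny inline sample in the same layout as a CMUdict-style file (assumed data).
const sampleDict = parseDict([
    "a AH",
    "a(2) EY",
    "is IH Z",
    "test T EH S T",
    "this DH IH S"
].join("\n"));

console.log(transcriptToPhones("this is a test", sampleDict).join(" "));
// prints "DH IH S IH Z AH T EH S T"
```

This only picks the first listed pronunciation for each word, so it may not match how a word was actually spoken.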

Here is my decoder code:

const fs = require("fs");
const ps = require("pocketsphinx").ps;

const model = "source/pocketsphinx-5prealpha/model/en-us/";

// Acoustic model, pronunciation dictionary and phoneme language model.
const config = new ps.Decoder.defaultConfig();
config.setString("-hmm", model + "en-us");
config.setString("-dict", model + "cmudict-en-us.dict");
config.setString("-allphone", model + "en-us-phone.lm.bin");

// My attempt to pass in the transcript; it seems to have no effect with -allphone.
config.setString("-keyphrase", "this is a test");
config.setFloat("-kws_threshold", 1e-50);

const decoder = new ps.Decoder(config);

fs.readFile("test.wav", function(error, data) {
    if (error) throw error;
    // Decode the whole file as a single utterance.
    decoder.startUtt();
    decoder.processRaw(data, false, false);
    decoder.endUtt();
    console.log(decoder.hyp());
    // Print every segment with its start and end frame.
    // (An extra iter.next() before the loop used to skip the first segment.)
    const iter = decoder.seg().iter();
    let seg;
    while ((seg = iter.next()) !== null) {
        console.log(seg.word, seg.startFrame, seg.endFrame);
    }
});
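The segments above are reported in frames, not seconds. A rough sketch of turning them into timestamps, assuming the decoder runs at the default rate of 100 frames per second and that endFrame is inclusive (both are assumptions to check against the actual configuration). The segmentTimes helper is hypothetical:

```javascript
// Assumption: 100 frames per second, the usual default frame rate.
const FRAME_RATE = 100;

// Convert a frame index to seconds.
function frameToSeconds(frame, frameRate = FRAME_RATE) {
    return frame / frameRate;
}

// Convert one segment ({ word, startFrame, endFrame }) to start/end times.
// Assumption: endFrame is inclusive, so the segment ends after that frame.
function segmentTimes(seg, frameRate = FRAME_RATE) {
    return {
        word: seg.word,
        start: frameToSeconds(seg.startFrame, frameRate),
        end: frameToSeconds(seg.endFrame + 1, frameRate)
    };
}

// Example with a made-up segment, not real decoder output.
const example = segmentTimes({ word: "DH", startFrame: 12, endFrame: 17 });
console.log(example.word, example.start, example.end);
```

Inside the loop, segmentTimes(seg) could be logged instead of the raw frame numbers.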

Last edit: MysteryPancake 2018-02-01

MysteryPancake - 2018-02-04

I decided to post this on Stack Overflow as well, in case I can get an answer there.

Last edit: MysteryPancake 2018-02-04
