
Aligning phonemes using transcript?

Help
2018-02-01
2018-02-04
  • MysteryPancake

    MysteryPancake - 2018-02-01

    I have a WAV file with a transcript. I am trying to get phonemes with timestamps from this file, but the results aren't completely accurate. The transcript I have is 100% accurate, but I don't know how to use it for phoneme alignment.

    I tried playing around with -keyphrase, -kws and -kws_threshold, but the decoder doesn't appear to use the transcript at all while -allphone is set. The transcript only seems to be used if I disable -allphone, and then I get words instead of phonemes. I really need the phonemes, not just words.

    I can convert the script to phonemes if necessary. Thanks for any help!
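
    For reference, converting the transcript to phonemes could look roughly like the sketch below. It assumes a CMUdict-style text dictionary (one "word PH1 PH2 ..." entry per line, with alternates marked like "a(2)"), which is the format cmudict-en-us.dict uses. The helper names parseDict and transcriptToPhonemes and the inline sample entries are hypothetical, written only for illustration.

```javascript
// Parse CMUdict-style text ("word PH1 PH2 ...") into a Map of word -> phoneme array.
// Alternate pronunciations such as "a(2)" are skipped, so the first entry wins.
function parseDict(text) {
    const dict = new Map();
    for (const line of text.split("\n")) {
        const parts = line.trim().split(/\s+/);
        if (parts.length < 2 || /\(\d+\)$/.test(parts[0])) continue;
        if (!dict.has(parts[0])) dict.set(parts[0], parts.slice(1));
    }
    return dict;
}

// Convert a transcript into a flat phoneme list; words missing from the dictionary throw.
function transcriptToPhonemes(transcript, dict) {
    const words = transcript.toLowerCase().match(/[a-z']+/g) || [];
    return words.flatMap(function(word) {
        const phones = dict.get(word);
        if (!phones) throw new Error("Word not in dictionary: " + word);
        return phones;
    });
}

// Sample entries written in the dictionary's format (illustrative only).
const sampleDict = parseDict([
    "a AH",
    "a(2) EY",
    "is IH Z",
    "test T EH S T",
    "this DH IH S"
].join("\n"));

console.log(transcriptToPhonemes("this is a test", sampleDict).join(" "));
// prints "DH IH S IH Z AH T EH S T"
```

    With the real dictionary, the text passed to parseDict would come from something like fs.readFileSync(model + "cmudict-en-us.dict", "utf8") instead of the inline sample.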

    Here is the code I am currently running:

    const fs = require("fs");
    const ps = require("pocketsphinx").ps;
    
    const model = "source/pocketsphinx-5prealpha/model/en-us/";
    
    const config = new ps.Decoder.defaultConfig();
    config.setString("-hmm", model + "en-us");
    config.setString("-dict", model + "cmudict-en-us.dict");
    config.setString("-allphone", model + "en-us-phone.lm.bin");
    
    config.setString("-keyphrase", "this is a test");
    config.setFloat("-kws_threshold", 1e-50);
    
    const decoder = new ps.Decoder(config);
    
    fs.readFile("test.wav", function(error, data) {
        if (error) throw error;
        decoder.startUtt();
        decoder.processRaw(data, false, false);
        decoder.endUtt();
        console.log(decoder.hyp());
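        // Walk every segment. Calling iter.next() once before the loop
        // would silently skip the first segment.
        const iter = decoder.seg().iter();
        let seg;
        // != null stops on both null and undefined
        while ((seg = iter.next()) != null) {
            console.log(seg.word, seg.startFrame, seg.endFrame);
        }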
    });
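
    The loop above prints frame indices rather than times. A minimal sketch for turning them into seconds follows, assuming the decoder runs at the default frame rate of 100 frames per second (the -frate default) and that endFrame is inclusive. The helper name segmentToSeconds and the sample segment are hypothetical.

```javascript
// Assumed: PocketSphinx's default -frate of 100 frames per second.
const FRAMES_PER_SECOND = 100;

// Convert a segment's frame indices into start/end times in seconds.
function segmentToSeconds(seg, frameRate = FRAMES_PER_SECOND) {
    return {
        word: seg.word,
        start: seg.startFrame / frameRate,
        // endFrame is treated as inclusive, so the segment ends
        // where the next frame would begin.
        end: (seg.endFrame + 1) / frameRate
    };
}

// Made-up segment for illustration; real ones come from decoder.seg().iter().
console.log(segmentToSeconds({ word: "DH", startFrame: 12, endFrame: 17 }));
```

    If -frate is changed in the config, the same value would need to be passed as frameRate.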
    
     

    Last edit: MysteryPancake 2018-02-01

