
Low accuracy on en-us (Greek native speaker)

Thanasis
2020-09-07
2020-09-13
  • Thanasis

    Thanasis - 2020-09-07

    Hello,
    I'm playing around with pocketsphinx and adaptation (MLLR and MAP) as per your tutorials,
    and I get really low accuracy on my voice with the en-us model.

    I tried to troubleshoot, but I would love your opinion :)

    This is the output of ffprobe for my .wav files:
    "bitrate: 256 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s"
    and from "file" (man file, determine file type):
    "RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz"

    I follow this scenario every time:
    "around 5 minutes of data as the adaptation input"
    "around 2 minutes of different data to test and compare recognition with and w/o the adapted model/MLLR"

    My native language is Greek, and when using the Greek continuous model
    (downloaded from your site and created by Fotis Pantazoglou)
    I get around 70% accuracy with the model alone; after MLLR adaptation
    and testing, accuracy goes to ~90%, which is satisfactory.

    Now, with the en-us PTM model (included in pocketsphinx/model)
    or with the en-us continuous model (cmusphinx-cont-en-us-5.2, downloaded from your site),
    when I run the recognizer (either pocketsphinx_continuous or pocketsphinx_batch) I get poor results (lower than 50%),
    and adapting (MLLR or MAP) does not get me anything higher than ~60%.
    I am wondering what my options are to improve my accuracy.
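
    For context, this is roughly how an MLLR transform produced by adaptation gets plugged in
    at decode time; a minimal sketch assuming the pocketsphinx Python bindings, with every
    model path and file name a placeholder:

        # Sketch: decode one 16 kHz mono WAV with an MLLR transform applied.
        # Assumes the pocketsphinx Python bindings; all paths below are placeholders.
        from pocketsphinx.pocketsphinx import Decoder

        config = Decoder.default_config()
        config.set_string('-hmm', 'en-us')                 # acoustic model directory
        config.set_string('-lm', 'en-us.lm.bin')           # language model
        config.set_string('-dict', 'cmudict-en-us.dict')   # pronunciation dictionary
        config.set_string('-mllr', 'mllr_matrix')          # transform file from adaptation
        decoder = Decoder(config)

        decoder.start_utt()
        with open('test.wav', 'rb') as f:
            f.read(44)  # skip the RIFF/WAVE header; raw 16-bit PCM samples follow
            decoder.process_raw(f.read(), False, True)
        decoder.end_utt()
        hyp = decoder.hyp()
        print(hyp.hypstr if hyp is not None else '(no hypothesis)')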

    I am attaching the transcription and fileids files along with the audio (.wav files of myself reading the arctic example);
    do tell me if they got uploaded correctly.

    I am sure I have an accent and that does not help recognition,
    but I thought the adaptation could correct some of it
    (it does get the accuracy up by 5-10% as promised, so I'm not complaining).
    However, the recognition rate is still low and I don't know what to do...

    I would like to avoid training a model,
    and I would like to create something that is not based around my own voice,
    but more of a general tool that lets the user record, adapt, and test the adapted model
    and then work with that.

    Any help would be amazing,
    thanks in advance!

     
    • Nickolay V. Shmyrev

      pocketsphinx is very old technology; it doesn't provide enough accuracy by modern standards. You can try Vosk instead (https://github.com/alphacep/vosk-api) with the daanzu English model.
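
      For orientation, a minimal transcription sketch with the Vosk Python bindings; the model
      directory and WAV file names are placeholders, and the audio is assumed to already be
      16 kHz mono 16-bit PCM:

          # Sketch: transcribe a 16 kHz mono WAV with Vosk.
          import json
          import wave
          from vosk import Model, KaldiRecognizer

          model = Model("model")            # placeholder: unpacked model directory
          wf = wave.open("test.wav", "rb")  # placeholder: 16 kHz mono 16-bit PCM audio
          rec = KaldiRecognizer(model, wf.getframerate())

          while True:
              data = wf.readframes(4000)
              if len(data) == 0:
                  break
              rec.AcceptWaveform(data)

          print(json.loads(rec.FinalResult())["text"])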

       
  • Thanasis

    Thanasis - 2020-09-07

    OK, thank you Nickolay,
    I have already started looking into it.

    I skimmed the documentation and the C++ API a bit
    and found that it is quite a bit simpler than pocketsphinx.

    However, reading the documentation blindly never helps ;)
    Do you have any suggestions on where to start?
    With Sphinx there was a pretty clean and clear tutorial that helped me a lot.

    (I also feel that this conversation should be held elsewhere; should I open a new topic here?
    Or even better: where can I ask about Vosk?)

     
    • Nickolay V. Shmyrev

      where can I ask about Vosk?

      On GitHub or in a Telegram group.

       
