
Non-English language models for Pocketsphinx

Help
Halle
2010-11-11
2016-05-25
  • Halle

    Halle - 2010-11-11

    Hi Nickolay,

    Here's a simple one. I've gotten requests to support the following
    accents/languages with OpenEars:

    UK English
    French
    Spanish
    German

    I've just spent a half hour Googling and searching this site but I'm missing
    it somehow and can't quite find anything but the full-sized Sphinx models that
    are too big to be shipped with a device. Are there Pocketsphinx-compatible
    language models for these accents and languages, similar to the hub4wsj_sc_8k
    hmm that ships with the 0.6.1 package, and if so where? I'll keep an eye on
    whether they are licensed in a way that would make them a match for iPhone App
    development.

    Thanks very much,

    Halle

     
  • Nickolay V. Shmyrev

    Hello Halle

    There is a very easy way to build a device-compatible LM for the iPhone for a
    few dozen languages: build the language model from Wikipedia articles. You can
    download a Wikipedia dump, convert it to text with

    http://medialab.di.unipi.it/wiki/Wikipedia_Extractor

    and build an LM of 4000 words!

    One should definitely automate this thing!
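
A minimal sketch of how the vocabulary step of that idea could be automated, assuming the Wikipedia dump has already been converted to plain text, one sentence or paragraph per line. The function names `top_vocabulary` and `keep_in_vocabulary` are hypothetical helpers for illustration, not part of any CMU Sphinx tool, and the tokenizer is deliberately naive (letters plus internal apostrophes only).

```python
import re
from collections import Counter

# Naive tokenizer: runs of Unicode letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def top_vocabulary(lines, size=4000):
    """Return the `size` most frequent lowercased words found in `lines`."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return [word for word, _ in counts.most_common(size)]


def keep_in_vocabulary(lines, vocabulary):
    """Yield normalized lines whose words all belong to `vocabulary`.

    Lines containing any out-of-vocabulary word are dropped, so the
    remaining text only uses words the small model will know about.
    """
    vocab = set(vocabulary)
    for line in lines:
        words = WORD_RE.findall(line.lower())
        if words and all(word in vocab for word in words):
            yield " ".join(words)
```

The filtered lines can then be written to a corpus file and handed to whichever language-model toolkit is in use. Capping the vocabulary is what keeps the resulting LM and its dictionary small enough for a device.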

     
  • Halle

    Halle - 2010-11-12

    Wow! I will definitely be checking that out shortly. But I think the users
    have lm/dic data already and they want me to add new hmms. Are there sources
    for those?

     
  • Nickolay V. Shmyrev

    At least for German and Spanish, the Voxforge data is good enough.

     
  • Halle

    Halle - 2010-11-12

    Okeydoke, and my last probably-obvious question: do I have to do anything to
    the Voxforge data in order to make it work with Pocketsphinx, or is it a
    drop-in?

     
  • Nickolay V. Shmyrev

    You can download the data and train semi-continuous models for a mobile
    device, or use the pretrained models, though I'm not sure they will work for
    you:

    https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

     
  • Halle

    Halle - 2010-11-12

    Yeah, I started out by checking out
    https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
    but I think the models there are too large in file size to ship with a mobile
    app, unfortunately.

    So, in order to get started with the process of downloading data and training
    a semi-continuous model for a mobile device, where would I look to begin the
    learning process?

     
  • Nickolay V. Shmyrev

    "where would I look to begin the learning process?"

    The English Voxforge model includes a build.sh script that automates setting
    up the process. Look inside the model. You can use the same script for other
    languages.

     
  • Halle

    Halle - 2010-11-14

    Thanks very much, I've passed that on. I don't suppose there is anything as
    easy for them to use to create small command and control grammars for other
    languages as the CMU Language Tool, is there?

     
  • Nickolay V. Shmyrev

    Hello, I'm not sure what issue you have.

    The easiest way is to write a JSGF grammar. You can reuse the dictionary
    provided with the model.
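
A minimal sketch of what writing such a grammar could look like when generated from a list of command phrases, with a check that every word is present in the model's dictionary. It assumes the dictionary uses the usual Sphinx text layout (one `word PHONE PHONE ...` entry per line, alternate pronunciations marked as `word(2)`). The helper names `load_dictionary_words` and `build_jsgf` are hypothetical, and the phrases must use the same casing as the dictionary.

```python
def load_dictionary_words(lines):
    """Collect the words from Sphinx-style dictionary lines."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            # Drop an alternate-pronunciation marker such as "(2)".
            words.add(parts[0].split("(")[0])
    return words


def build_jsgf(grammar_name, rule_name, phrases, dictionary_words):
    """Return JSGF text with one public rule listing `phrases` as alternatives.

    Raises ValueError if a phrase uses a word missing from the dictionary,
    since the recognizer would have no pronunciation for it.
    """
    missing = sorted(
        {word for phrase in phrases for word in phrase.split()
         if word not in dictionary_words}
    )
    if missing:
        raise ValueError("words not in dictionary: " + ", ".join(missing))
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n\n"
        f"grammar {grammar_name};\n\n"
        f"public <{rule_name}> = {alternatives};\n"
    )
```

For example, `build_jsgf("commands", "command", ["hallo welt", "hallo"], words)` produces a grammar whose rule line reads `public <command> = hallo welt | hallo;`, provided both words are in `words`.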

     
  • Halle

    Halle - 2010-11-14

    That answered my question, thanks.

     
  • Swathi EP

    Swathi EP - 2010-12-27

    Hello Mr. Nickolay,

    I want to know how I can use Pocketsphinx to recognize Indian English. I have
    used Voxforge to upload a few wav files in Indian English, but I don't know
    how to use the processed Voxforge data in Pocketsphinx.

    Can you please tell me how to achieve Indian English recognition using
    Pocketsphinx?

    Thanks,
    Swathi

     
