Simple Artificial Language - What Steps?

2016-09-26
2016-09-27
  • Nikolay Panayotov

    Hello all!
    I am involved in an interesting research game project using a simple artificial language. We are looking into the possibility of using speech recognition.

    I am new to speech processing and have picked up CMU Sphinx's website and 'Speech and Language Processing' by Daniel Jurafsky and James Martin as a starting point for the theory. I was able to run the Sphinx4 tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4) in the Eclipse IDE, although it seems to work only with short sound files (around 1 second long); otherwise it causes out-of-memory errors for the Java heap space (some of the provided samples cause similar errors as well). Anyway, right now I am stuck on where to go from here and whether Sphinx is the right choice. I wanted to explain our project's goals to you and hopefully get some pointers.

    We are going to create a simplistic artificial language. This 'language' will actually only be a list of short nonsense words based on British English phonotactics: "mada" "muda" "taba" "pib" "peb" etc. I highlight 'British' because our users (or the speakers of our artificial language) will be predominantly (but not exclusively) Scottish nationals (of all ages - children to adults).
    Our software will simply prompt the user at certain moments to say something (a single word) and will recognize the words from the artificial language if the user speaks them. That is all. It might also be worth mentioning that this will be part of an online web-based game (so presumably CMU Sphinx will run on a server). The web game would be accessible through PCs and mobiles.
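
    One way to restrict a recognizer to a fixed word list like this is a grammar. Below is a minimal sketch, assuming the recognizer accepts a JSGF grammar (Sphinx4 and pocketsphinx both do). The helper name `build_jsgf` and the grammar name `words` are made up for illustration, and the nonsense words would still need entries in the pronunciation dictionary, which is not shown here.

```python
def build_jsgf(words, grammar_name="words"):
    """Return a JSGF grammar that accepts exactly one word from `words`.

    `build_jsgf` and `grammar_name` are hypothetical names for this sketch.
    """
    # Each utterance must be exactly one of the listed alternatives.
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <word> = {alternatives};\n"
    )


# Example word list taken from the post above.
grammar = build_jsgf(["mada", "muda", "taba", "pib", "peb"])
print(grammar)
```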

    Reading through Sphinx's website it seemed that adapting the provided English model to recognize the new artificial words as new commands is the way to go. But since I am so new to this technology I am not sure at all.

    Questions:
    1. Is Sphinx4 right for this project?
    2. Is it feasible or too much effort? Do we need lots of people/time to achieve this?
    3. Is pocketsphinx more appropriate as speed might be an issue (and seeing as I already have Java heap problems)?
    4. What strategy would you recommend to tackle this task? A rough step by step would be great!

    I am ready to learn! You don't have to explain each step in detail, but please point me to what areas I need to read up on in order to perform each step. Recommended resources or alternatives would be greatly appreciated!

    Thank you for your help!
    Nikolay

     
    • Nickolay V. Shmyrev

      Our software will simply prompt the user at certain moments to say something (a single word) and will recognize the words from the artificial language if the user speaks them. That is all.

      That's not all at all. If a user says something different, you have to detect that as well. So you need good confidence estimation, probably noise reduction, and so on.
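
      To make the confidence idea concrete, here is a minimal sketch of one possible accept/reject rule. It assumes the recognizer returns a log-score per candidate word plus a score for a catch-all "garbage" model; the function name, the `<garbage>` label, the scores, and the 0.8 threshold are all illustrative assumptions, not the output format of Sphinx or Kaldi.

```python
import math

GARBAGE = "<garbage>"  # hypothetical label for "anything not in the word list"


def accept_or_reject(log_scores, threshold=0.8):
    """Return (word, posterior); word is None when the utterance is rejected.

    log_scores: dict mapping each candidate (including GARBAGE) to a
    log-score. This input format is an assumption made for the sketch.
    """
    # Softmax over the candidates, shifted by the max for numerical stability.
    top = max(log_scores.values())
    exps = {w: math.exp(s - top) for w, s in log_scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    posterior = exps[best] / total
    # Reject when the garbage model wins or the best word is not confident.
    if best == GARBAGE or posterior < threshold:
        return None, posterior
    return best, posterior


clear_word, clear_conf = accept_or_reject(
    {"mada": -10.0, "muda": -14.0, GARBAGE: -15.0}
)
unclear_word, unclear_conf = accept_or_reject(
    {"mada": -10.0, "muda": -10.2, GARBAGE: -12.0}
)
```

      The threshold would have to be tuned on real recordings, since it trades false accepts against false rejects.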

      The web game would be accessible through PCs and mobiles.

      A PC is the worst choice because of the variety of crappy microphones. Mobile microphones are usually much better, but mobile data has its own specifics.

      1. Is Sphinx4 right for this project?

      It is better to use Kaldi with DNN models.

      2. Is it feasible or too much effort? Do we need lots of people/time to achieve this?

      If you want to recognize children reliably, it is a half-year project at least. You will have to collect a lot of data from children for reliable models.

      4. What strategy would you recommend to tackle this task? A rough step by step would be great!

      First build a test set to estimate speech recognition accuracy, then work out your sources for data collection (crowdsourcing, etc.). Build initial models and deploy a test system. Continue data collection and improve your models over time. Analyze accuracy problems to solve specific issues.
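
      The first step above, measuring accuracy on a test set, can be sketched as follows. This is a minimal example under assumptions: each test item is a (reference, hypothesis) pair, a reference of None means the speaker said something outside the word list, and a hypothesis of None means the system rejected the utterance. The function name and the sample data are made up for illustration.

```python
def evaluate(pairs):
    """Compute simple single-word recognition metrics.

    pairs: list of (reference, hypothesis); None as reference means an
    out-of-vocabulary utterance, None as hypothesis means "rejected".
    """
    in_vocab = [(r, h) for r, h in pairs if r is not None]
    out_vocab = [(r, h) for r, h in pairs if r is None]

    correct = sum(1 for r, h in in_vocab if h == r)
    false_rejects = sum(1 for _, h in in_vocab if h is None)
    false_accepts = sum(1 for _, h in out_vocab if h is not None)

    def rate(count, total):
        # Avoid division by zero when a category has no test items.
        return count / total if total else 0.0

    return {
        "word_accuracy": rate(correct, len(in_vocab)),
        "false_reject_rate": rate(false_rejects, len(in_vocab)),
        "false_accept_rate": rate(false_accepts, len(out_vocab)),
    }


# Made-up results, only to show the shape of the input.
results = evaluate([
    ("mada", "mada"),   # correct
    ("muda", "mada"),   # substitution
    ("pib", None),      # valid word wrongly rejected
    ("peb", "peb"),     # correct
    (None, None),       # other speech correctly rejected
    (None, "taba"),     # other speech wrongly accepted
])
print(results)
```

      Tracking the three numbers separately helps when analyzing accuracy problems, because confusions between similar words and failures to reject other speech usually call for different fixes.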

       
  • Nikolay Panayotov

    Hi, Nickolay! Thank you so much for your swift and insightful reply!

    So is your strategy assuming the 'adapting the existing CMU Sphinx English model' approach or is it better to create a new model with Kaldi?

    As this is a one-year project, half a year might be too much. Perhaps speech recognition is out of reach? I would like to restate the concept just to be clear. Our 'artificial language' (which is actually just a list of words 2-3 syllables long based on English phonotactics) will contain no more than 100 words. And yes, as you pointed out, we are concerned with whether the user says a word from our list OR something else. What I meant is that, to keep it simple, we don't really care exactly what else was said, as long as we know it does not confidently match any of our words, if that makes sense. As I am aware of my limitations as a complete beginner in this field, I am trying to keep everything as simple as possible. However, do you reckon it might be too big a task for a novice to get to grips with?

    And finally, could you recommend any further reading besides CMU Sphinx and Kaldi's websites? I feel like those sources already assume a significant degree of familiarity with the overall procedures.

    Thanks for the help!

     

    Last edit: Nikolay Panayotov 2016-09-27
