
Phoneme or Visemes recognition for animation

  • Artur

    Artur - 2011-12-26

    Hello, All

    My name is Artur, and I am currently working on my PhD in Computer Science
    at the University of Louisville. My interests include the application of
    data mining methods to social networking and viral marketing. I am very new
    to the speech recognition area and to Sphinx in particular, so excuse me if
    I ask overly basic questions :).

    Let me first explain what I am trying to do. I am developing an application
    where a user speaks into the microphone and a cartoon character repeats the
    words after them (preferably in real time). So it is basically a lip-sync
    application.

    Searching through this forum, it became clear to me that building a phoneme
    recognition system is a nontrivial task. But for my application I don't even
    need phonemes; I only need visemes (visual mouth shapes). Currently I have
    only 18 visemes (O, OO, R, FV, S, SH, EE, TH, L, ...), which I think should
    be enough.

    Could you please give me some advice on the best way to use Sphinx?
    1. Train it with 39 phonemes and map them to 18 visemes (a rough sketch of
       the mapping I mean is below).
    2. Train it directly with 18 visemes.
    3. Use Sphinx as it is, recognize words, and map them to visemes.
    4. Something else?
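
    To make options 1 and 3 concrete, here is the kind of phoneme-to-viseme
    lookup I have in mind (a rough Python sketch; the grouping and the REST
    default are only my own placeholders, not a standard mapping):

        # Illustrative mapping from CMU/ARPAbet phonemes to a few of the
        # 18 viseme classes above; the full table would cover all 39 phonemes.
        PHONEME_TO_VISEME = {
            "AO": "O",  "OW": "O",  "OY": "O",
            "UW": "OO", "UH": "OO", "W":  "OO",
            "R":  "R",  "ER": "R",
            "F":  "FV", "V":  "FV",
            "S":  "S",  "Z":  "S",
            "SH": "SH", "ZH": "SH", "CH": "SH", "JH": "SH",
            "IY": "EE", "IH": "EE",
            "TH": "TH", "DH": "TH",
            "L":  "L",
            # ... remaining phonemes go to the other viseme classes
        }

        def phonemes_to_visemes(phonemes, default="REST"):
            """Collapse a recognized phoneme sequence into viseme labels."""
            return [PHONEME_TO_VISEME.get(p, default) for p in phonemes]

        # e.g. phonemes_to_visemes(["HH", "AH", "L", "OW"]) -> ["REST", "REST", "L", "O"]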

    I also have these requirements for the application:
    1. Speaker independence
    2. Continuous speech
    3. Noise level adaptation
    4. Mobile platform, if possible

    And one more thing: speed is more important than accuracy.

    Thank you!

     
  • Nickolay V. Shmyrev

    Hello

    According to this paper:

    "Comparison of Phoneme and Viseme Based Acoustic Units for Speech Driven
    Realistic Lip Animation" by Bozkurt et al.
    http://staff.eng.bahcesehir.edu.tr/~cigdemeroglu/papers/international_conference_papers/C_07_3DTV_phoneme.pdf

    Visemes are only slightly better, but they are still worth using because of
    theoretical considerations: the fewer parameters you have to train, the
    better.

    I also recommend checking these papers:

    Real-time language independent lip synchronization method using a genetic
    algorithm by Goranka Zoric and Igor S. Pandzic
    http://www.fer.unizg.hr/images/50009013/sp06.pdf

    and

    Real-Time Continuous Phoneme Recognition System Using Class-Dependent Tied-
    Mixture HMM With HBT Structure for Speech-Driven Lip-Sync by Junho Park and
    Hanseok Ko
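
    If you go with phoneme units in pocketsphinx, phonetic decoding gives you
    time-stamped phones that you can feed straight into a phoneme-to-viseme
    table like the one you sketched. A minimal example, assuming the SWIG-based
    Python bindings and a build that supports the -allphone phonetic LM option
    (model paths are placeholders, and newer releases have a slightly different
    API):

        from pocketsphinx.pocketsphinx import Decoder

        config = Decoder.default_config()
        config.set_string('-hmm', '/path/to/en-us')                # acoustic model
        config.set_string('-allphone', '/path/to/en-us-phone.lm')  # phonetic LM
        config.set_float('-lw', 2.0)
        decoder = Decoder(config)

        decoder.start_utt()
        with open('utterance.raw', 'rb') as f:   # 16 kHz, 16-bit mono PCM
            while True:
                buf = f.read(1024)
                if not buf:
                    break
                decoder.process_raw(buf, False, False)
        decoder.end_utt()

        # Each segment is one phone with frame times (default 100 frames/s),
        # which is enough to drive the viseme timeline of the animation.
        for seg in decoder.seg():
            print(seg.word, seg.start_frame / 100.0, seg.end_frame / 100.0)

    For real-time use you would feed microphone buffers into process_raw in the
    same loop instead of reading from a file.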

     
  • Artur

    Artur - 2011-12-28

    Thank you for such a fast reply. I'll look into those papers.

     
