Which mandarin acoustic model to use for pocketsphinx?

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

This project can now be found here.


Forum: Help

Creator: puluzhe

Created: 2014-03-17

Updated: 2014-03-18

puluzhe - 2014-03-17

I know that for English, hub4wsj_sc_8k can achieve high recognition accuracy. But for Mandarin Chinese, I have no idea which model is the best one to use.

Searching across all branches, I found the following candidates:
1. tdt_sc_8k in the ps code branch: pocketsphinx\model\hmm\zh\tdt_sc_8k
2. mandarin_ptm3_notone_3s_8k.cd_ptm_5000 under /pocketsphinx-extra/model/hmm/zh
3. mandarin_sc3_notone_3s_8k.cd_semi_5000 under /pocketsphinx-extra/model/hmm/zh
and a potential fourth candidate:
4. converting the continuous model zh_broadcastnews_16k_ptm256_8000 (under files/Acoustic and Language Models/Mandarin Broadcast News acoustic models/) into a semi-continuous one.

For the 4th one, I'm not sure whether that is feasible. If it is, then is there a tool to do the conversion?

So for the above mentioned candidates, which one can achieve the best accuracy for ps?


Nickolay V. Shmyrev - 2014-03-17

> I know for English hub4wsj_sc_8k can achieve high recognition accuracy

The en-us semi-continuous model is significantly better than hub4wsj.

> If it is, then is there a tool to do the conversion?

No.

> So for the above mentioned candidates, which one can achieve the best accuracy for ps?

You can easily check this on a test database. However, I think a good Mandarin model will require training from scratch on a sufficient amount of data. We do not have a good Mandarin model right now.
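A minimal sketch of how such a comparison could be scored, assuming you already have one reference transcript per test recording plus the hypothesis each candidate model produced for it (collecting those from pocketsphinx is not shown). It computes an edit-distance error rate over the whole test set; scoring per character is a common choice for Mandarin, where word segmentation is ambiguous. The function names and sample sentences are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the ref tokens seen so far (minus one) and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref token r
                curr[j - 1] + 1,         # insert hyp token h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(pairs: Iterable[Tuple[str, str]], per_character: bool = False) -> float:
    """Total edits divided by total reference tokens over a test set.

    pairs: (reference, hypothesis) strings, one pair per recording.
    per_character=True scores characters (whitespace ignored), giving a CER;
    otherwise whitespace-separated words are scored, giving a WER.
    """
    errors = 0
    total = 0
    for ref, hyp in pairs:
        if per_character:
            ref_tokens = [c for c in ref if not c.isspace()]
            hyp_tokens = [c for c in hyp if not c.isspace()]
        else:
            ref_tokens = ref.split()
            hyp_tokens = hyp.split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical references and outputs of two candidate models on the same recordings
    references = ["今天 天气 很 好", "我 想 去 北京"]
    model_a = ["今天 天气 真 好", "我 想 去 北京"]
    model_b = ["今天 天 很 好", "我 去 北京"]
    for name, hyps in [("model_a", model_a), ("model_b", model_b)]:
        cer = error_rate(zip(references, hyps), per_character=True)
        print(f"{name}: CER = {cer:.1%}")
```

The candidate with the lowest error rate on your own recordings is the one to prefer; the numbers are only comparable when every model is decoded on the same test set.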


puluzhe - 2014-03-18

Unfortunately, I don't have such a test database.
Can you roughly deduce from the size of the training corpus which one might be the best?


Nickolay V. Shmyrev - 2014-03-18

> Unfortunately, I don't have such a test database.

You need to record one. It's actually the first thing to do when you start developing an ASR system.
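A minimal sketch of turning recorded test utterances into the control files a Sphinx-style test set uses, assuming the etc/ layout from the CMUSphinx acoustic model training tutorial: a .fileids list of audio paths without extensions, and a .transcription file with lines of the form `<s> words </s> (utterance_id)`. The helper names and the mandarin_test prefix are hypothetical; check the tutorial for the exact conventions your tools expect.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_test_lists(recordings: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Build the text of a .fileids file and a .transcription file.

    recordings: (fileid, transcript) pairs. A fileid is the audio path relative
    to the wav directory without extension, e.g. "speaker1/utt001" for
    wav/speaker1/utt001.wav. Transcripts should use the same word units as
    your pronunciation dictionary.
    """
    fileids = []
    transcriptions = []
    for fileid, transcript in recordings:
        fileids.append(fileid)
        # The id in parentheses is the last path component of the fileid
        utt_id = fileid.rsplit("/", 1)[-1]
        transcriptions.append(f"<s> {transcript.strip()} </s> ({utt_id})")
    return "\n".join(fileids) + "\n", "\n".join(transcriptions) + "\n"


def write_test_lists(recordings: Iterable[Tuple[str, str]], etc_dir: str,
                     name: str = "mandarin_test") -> None:
    """Write <name>.fileids and <name>.transcription into etc_dir (hypothetical naming)."""
    fileids_text, transcription_text = format_test_lists(recordings)
    out = Path(etc_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{name}.fileids").write_text(fileids_text, encoding="utf-8")
    (out / f"{name}.transcription").write_text(transcription_text, encoding="utf-8")
```

For example, after recording wav/speaker1/utt001.wav while reading "今天 天气 很 好", calling `write_test_lists([("speaker1/utt001", "今天 天气 很 好")], "etc")` would produce one entry in each file. A few dozen utterances from the speakers and conditions you actually target is usually enough to tell the candidate models apart.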

> Can you roughly deduce from the size of the training corpus which one might be the best?

http://cmusphinx.sourceforge.net/wiki/tutorialam

The en-us model was trained on 300 hours of speech.

