I know that for English, hub4wsj_sc_8k can achieve high recognition accuracy, but for Mandarin Chinese I have no idea which model is the best one to use.
Looking across all branches, I found the following candidates:
1. tdt_sc_8k in the ps code branch: pocketsphinx\model\hmm\zh\tdt_sc_8k
2. mandarin_ptm3_notone_3s_8k.cd_ptm_5000 under /pocketsphinx-extra/model/hmm/zh
3. mandarin_sc3_notone_3s_8k.cd_semi_5000 under /pocketsphinx-extra/model/hmm/zh
and a potential 4th candidate:
4. Convert the continuous model zh_broadcastnews_16k_ptm256_8000 (under files/Acoustic and Language Models/Mandarin Broadcast News acoustic models/) to a semi-continuous one.
For the 4th one, I'm not sure whether that is feasible. If it is, then is there a tool to do the conversion?
So for the above mentioned candidates, which one can achieve the best accuracy for ps?
I know for English hub4wsj_sc_8k can achieve high recognition accuracy
en-us semi is significantly better than hub4wsj
If it is, then is there a tool to do the conversion?
No
So for the above mentioned candidates, which one can achieve the best accuracy for ps?
You can easily test it on a test database. However, I think a good Mandarin model will require training from scratch on a sufficient amount of data. We do not have a good Mandarin model right now.
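For anyone who wants to run that comparison, here is a minimal sketch of how the scoring could look once each candidate model has decoded the same test recordings. It computes an error rate from the edit distance between reference and hypothesis token lists. The model names, the toneless pinyin tokens, and the function names are made up for illustration; the decoding step itself is not shown.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edit distance divided by total reference length, over token lists."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_tokens = sum(len(r) for r in references)
    return total_errors / total_tokens


# Hypothetical reference transcripts, already split into tokens
refs = [["jin", "tian", "tian", "qi", "hen", "hao"], ["da", "kai", "yin", "yue"]]

# Hypothetical decoder outputs for two candidate models (placeholder names)
hyps_by_model = {
    "model_a": [["jin", "tian", "tian", "qi", "hao"], ["da", "kai", "yin", "yue"]],
    "model_b": [["jin", "tian", "tian", "qi", "hen", "hao"], ["da", "gai", "yin", "le"]],
}

for name, hyps in hyps_by_model.items():
    print(name, round(error_rate(refs, hyps), 3))
```

The model with the lowest error rate on your own recordings is the one to prefer. For Mandarin it may be more meaningful to tokenize by character or syllable than by word, since word segmentation is itself ambiguous.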
Unfortunately, I don't have such a test database.
Can you roughly deduce from the size of the training corpus which one might be the best?
You need to record one; it's actually the first thing to do when you start developing an ASR system:
http://cmusphinx.sourceforge.net/wiki/tutorialam
The en-us model was trained on 300 hours of speech.
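Once a small test set has been recorded, the reference transcripts need to be loaded for scoring. Below is a rough sketch that assumes the transcription layout described in the linked tutorial, where each line holds the text between `<s>` and `</s>` followed by the utterance ID in parentheses. The function name and the sample utterance IDs are hypothetical; check the layout against your own files.

```python
import re

# Assumed line layout: "<s> some words </s> (utterance_id)"
LINE_RE = re.compile(r"^(?P<text>.*?)\s*\((?P<utt_id>[^()\s]+)\)\s*$")


def parse_transcription(lines):
    """Map utterance ID -> list of tokens, dropping the <s> and </s> markers."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized transcription line: {line!r}")
        words = [w for w in match.group("text").split() if w not in ("<s>", "</s>")]
        transcripts[match.group("utt_id")] = words
    return transcripts


# Hypothetical sample lines; in practice, pass an open file object instead
sample = [
    "<s> jin tian tian qi hen hao </s> (utt_001)",
    "<s> da kai yin yue </s> (utt_002)",
]
print(parse_transcription(sample))
```

Keying the transcripts by utterance ID makes it straightforward to pair each reference with the matching decoder output, whatever order the files were decoded in.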