Evaluating Chinese ASR

Forum: Speech Recognition Theory

Creator: dovark

Created: 2012-10-26

Updated: 2012-10-28

dovark - 2012-10-26

I am going through some ASR literature for Mandarin (e.g., "The 2009 IBM GALE Mandarin broadcast transcription system", http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5495639&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5495639). Sometimes performance is reported as CER (Character Error Rate), sometimes as WER (Word Error Rate). What is the difference between the two? Can one compare Chinese CER with English WER?

Secondly, one could also compute a syllable error rate (SER) for Chinese, I guess, given that it has only ~400 syllables (as opposed to ~15800 in English, http://semarch.linguistics.fas.nyu.edu/barker/Syllables/index.txt). Which of the three, WER, CER, or SER, is the most appropriate way to evaluate Chinese ASR?

Nickolay V. Shmyrev - 2012-10-26

Hi

Most Chinese text has no word boundaries, and words are usually just one or two characters long. So systems are simply trained on character streams, and the error rate is evaluated as CER, since the reference is not split into words either. Word segmentation is itself a nontrivial task.

CER is not exactly the same quantity as WER, but it is used in most evaluations, so this is just common practice.
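
To make the difference concrete, here is a minimal sketch of how both rates can be computed from the same edit-distance routine. This is not taken from any particular toolkit: the function names (edit_distance, error_rate, cer, wer) are made up for illustration, and the example sentence is segmented into words by hand, which is an assumption, since a real reference would usually not come segmented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate; whitespace is dropped so segmentation is ignored."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    return error_rate(ref_chars, hyp_chars)


def wer(ref, hyp):
    """Word error rate; assumes words are already separated by spaces."""
    return error_rate(ref.split(), hyp.split())


# Hand-segmented example (the segmentation is an assumption for illustration).
ref = "我 喜欢 语音 识别"
hyp = "我 喜欢 语言 识别"
print("WER:", wer(ref, hyp))  # one wrong word out of four
print("CER:", cer(ref, hyp))  # one wrong character out of seven
```

The same single recognition error counts as 1 of 4 words but 1 of 7 characters, so the two numbers differ even on identical output, and the WER additionally depends on how the text was segmented.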

dovark - 2012-10-27

Thanks. Would SER (syllable error rate) be a better/worse statistic than CER for continuous speech recognition? Or is SER equivalent to phoneme-error-rate in English?

Nickolay V. Shmyrev - 2012-10-28

Or is SER equivalent to phoneme-error-rate in English?

I think SER is not directly equivalent to Mandarin CER, because of the number of entries in the language model and the different probability distributions. We cannot compare 60k words, or symbols encoding words, in one language to the 1000 most common English syllables, given that they have very different distribution patterns.
