Chinese models trained on a huge amount of acoustic data


    yeelearn - 2018-09-17

    I want to know which data set was used to train the Chinese (Mandarin) model provided by CMU.

    When I train my own Chinese model, the recognition rate is very low.
    I suspect that either my data set is of poor quality or there is a problem with my training method.

     
    • Nickolay V. Shmyrev

      I want to know which data set was used to train the Chinese (Mandarin) model provided by CMU.

      https://catalog.ldc.upenn.edu/LDC98S73

      When I train my own Chinese model, the recognition rate is very low.
      I suspect that either my data set is of poor quality or there is a problem with my training method.

      If you train on AISHELL (http://www.openslr.org/33/) or AISHELL2, the accuracy should be much better.
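
For anyone preparing such a corpus for SphinxTrain, here is a minimal sketch of converting a transcript into the transcription and fileids lists. It assumes the transcript has one utterance per line in the form "UTTERANCE_ID word word ...", and that the trainer expects "<s> words </s> (utterance_id)". The function name and the sample line are made up for illustration; check both assumptions against your own data and setup.

```python
def transcript_to_sphinx(lines):
    """Convert 'UTT_ID w1 w2 ...' lines into SphinxTrain-style entries.

    Returns (transcription_lines, fileids). The fileids here are bare
    utterance IDs; real setups usually need a relative path prepended.
    """
    transcription, fileids = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and utterances with no text
        utt_id, words = parts[0], parts[1:]
        transcription.append(f"<s> {' '.join(words)} </s> ({utt_id})")
        fileids.append(utt_id)
    return transcription, fileids


# Illustrative, made-up sample line (not taken from any corpus).
sample = ["UTT0001 今天 天气 很 好", ""]
trans, ids = transcript_to_sphinx(sample)
print(trans[0])
print(ids[0])
```

Every word that appears in the transcription must also be present in the training dictionary, so a missing-word check is worth running before training.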

       
      • yeelearn

        yeelearn - 2018-09-17

        Thanks.

        I want to know what your noise dictionary contains, for example +LAUGH+ +LIPSMACK+ +COUGH+ ...

        Would you mind sharing your “sphinx_train.cfg” file with me, so that I can see where it differs from mine?

         
        • Nickolay V. Shmyrev

          I want to know what your noise dictionary contains, for example +LAUGH+ +LIPSMACK+ +COUGH+ ...

          The noisedict inside the model contains:

          <s>             SIL
          </s>            SIL
          <sil>           SIL
          ++laugh++       +LAUGH+
          ++lipsmack++    +LIPSMACK+
          ++cough++       +COUGH+
          ++breath++      +BREATHE+
          ++incomplete++  +GARBAGE+
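
To compare this with your own filler dictionary, a small sketch like the following may help. It parses the listing above into a word-to-phones mapping and collects the distinct filler phones. The helper name `parse_noisedict` is hypothetical and not part of any Sphinx tool.

```python
# The noisedict content quoted in this thread.
NOISEDICT = """\
<s>             SIL
</s>            SIL
<sil>           SIL
++laugh++       +LAUGH+
++lipsmack++    +LIPSMACK+
++cough++       +COUGH+
++breath++      +BREATHE+
++incomplete++  +GARBAGE+
"""


def parse_noisedict(text):
    """Map each filler word to its list of phones (whitespace separated)."""
    fillers = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # ignore blank or malformed lines
            fillers[parts[0]] = parts[1:]
    return fillers


fillers = parse_noisedict(NOISEDICT)
# Distinct phones used by the filler words.
filler_phones = sorted({p for phones in fillers.values() for p in phones})
print(filler_phones)
```

The same function can be run on your own filler file, and the two resulting dictionaries compared key by key.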
          

          Would you mind sharing your “sphinx_train.cfg” file with me, so that I can see where it differs from mine?

          The model was trained a long time ago, so the configuration file is lost. You can use the default one; it should give you the same results.
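
To see where a customized configuration departs from the default one, a line-by-line diff is usually enough. The sketch below uses Python's standard `difflib`. The two sample configurations and their values are purely illustrative and are not the settings used for the CMU model. In practice you would read both files from disk instead.

```python
import difflib


def changed_lines(default_text, mine_text):
    """Return only the added/removed lines between two config texts."""
    diff = difflib.unified_diff(
        default_text.splitlines(),
        mine_text.splitlines(),
        fromfile="default/sphinx_train.cfg",
        tofile="mine/sphinx_train.cfg",
        lineterm="",
        n=0,  # no context lines, only the differences
    )
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # drop the file headers
    ]


# Illustrative values only.
default_cfg = "$CFG_FINAL_NUM_DENSITIES = 8;\n$CFG_N_TIED_STATES = 200;\n"
my_cfg = "$CFG_FINAL_NUM_DENSITIES = 8;\n$CFG_N_TIED_STATES = 2000;\n"

for line in changed_lines(default_cfg, my_cfg):
    print(line)
```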

           
          • yeelearn

            yeelearn - 2018-09-18

            Ok, let me try again.
            Thank you very much.

             
