Broadcast News Transcription System

2009-07-27
2012-09-22
  • Elisa Todarello

    Elisa Todarello - 2009-07-27

    I've recently read some articles (such as http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.3132) on a transcription system that was developed at CMU in 1996-1997. I was curious to know what happened to that system. Was it abandoned, or has it been used in other projects? Is it available for download?
    I am currently trying to transcribe Italian broadcast news, and I'd like to know if trying to set up such a system is worth the effort.

    Thanks.

     
    • Nickolay V. Shmyrev

      > I was curious to know what happened to that system. Was it abandoned or has it been used in other projects?

      Most parts are available, I suppose (the sphinx3 engine, SRILM for rescoring, MLLR adaptation).

      > I'd like to know if trying to set up such a system is worth the effort.

      Well, if you have several years to develop a replacement, it's not worth trying to set up this one, of course.

       
    • Elisa Todarello

      Elisa Todarello - 2009-07-27

      What I meant is: is this the only way to get decent results in transcribing broadcast news? What I've done up until now is simply build one acoustic model from the Italian broadcast news, and I was wondering if I can ever get a good result using plain Sphinx-4.
      But I guess that if someone went to the trouble of setting up such an elaborate system, there is a good reason.
      What do you advise?

       
      • Nickolay V. Shmyrev

        > is this the only way to get decent results in transcribing broadcast news?

        Certainly not

        > I was wondering if I can ever get a good result using plain Sphinx-4.

        It depends on what you mean by "good".

        > What do you advise?

        Sorry, I don't get your problem yet

         
    • Elisa Todarello

      Elisa Todarello - 2009-07-28

      What I need to do is use Sphinx-4 to transcribe Italian broadcast news. So far, I have an acoustic model trained on 10 hours of audio from broadcast news. When I run the batch test, I get around WER=5% on the training corpus and WER=50% on a test corpus. So far so good.
      Then I used the very same model with the Transcriber demo on the same test corpus, but this time as a single audio file of about half an hour (the same one I had split for the batch test), and I get WER=70%. I guess this is because the audio file is not cut into pieces.
      Now, my questions are
      1) Is this worsening normal?
      2) If it is, what results can I hope to get with Transcriber? In the future, with a better AM, I'd like to get something around WER=20%. Is this even possible with Transcriber used on broadcast news?
      This is the reason why I was asking about the hub4 transcription system: its approach looks like the most common one for transcribing broadcast news. I didn't find any paper in which someone reports transcribing broadcast news without setting up a system like that.
      3) What improvement in WER can I get from a system like the hub4 one, compared with using the Transcriber demo on audio files of about half an hour?
      Thanks, I hope my explanation was clear this time.
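
      For readers who want to check WER figures like the ones quoted in this post, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name and the Italian sample sentences are made up for illustration; this is not the scoring code used by Sphinx-4's batch test.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sample: one deleted word and one substituted word.
    ref = "il governo ha approvato la legge"
    hyp = "il governo approvato una legge"
    print(f"WER = {word_error_rate(ref, hyp):.1%}")
```

      Note that WER can exceed 100% when the hypothesis contains many inserted words, which is one way a long, unsegmented file can score much worse than the same audio cut into utterances.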

       
      • Nickolay V. Shmyrev

        > Is this worsening normal?

        No, it looks like you made a mistake somewhere. What is the result of the sphinx3 test described in the tutorial? Did you enable MLLT?

        > If it is, what results can I hope to get with Transcriber? In the future, with a better AM, I'd like to get something around WER=20%.

        It depends on many things, but on very clean audio with a medium vocabulary (10k words) it looks feasible. For generic conditions, something like 30% looks more realistic.

        > What improvement in WER can I get from a system like the hub4 one, compared with using the Transcriber demo on audio files of about half an hour?

        You mean what can you get if you implement the things described in the article in the first post? The same relative improvement in WER as described there. For example, MLLR should give a relative improvement of around 20%. So instead of 30% WER you'll have 24% WER.
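
        The arithmetic behind that example can be sketched as follows. A relative improvement scales the baseline WER rather than subtracting percentage points from it. The function name is hypothetical, and the 20% figure is only the rough estimate given in the post above.

```python
def wer_after_relative_improvement(baseline_wer: float, relative_improvement: float) -> float:
    """Apply a relative WER reduction (0.20 means 20% relative) to a baseline WER."""
    if not 0.0 <= relative_improvement <= 1.0:
        raise ValueError("relative_improvement must be between 0 and 1")
    return baseline_wer * (1.0 - relative_improvement)


if __name__ == "__main__":
    baseline = 30.0  # baseline WER in percent, from the post
    improved = wer_after_relative_improvement(baseline, 0.20)
    # Relative: 30% * (1 - 0.20), which is different from 30% - 20 points.
    print(f"{baseline:.0f}% WER -> {improved:.0f}% WER")
```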

         
        • Elisa Todarello

          Elisa Todarello - 2009-07-28

          > No, it looks like you made a mistake somewhere. What is the result of the sphinx3 test described in the tutorial? Did you enable MLLT?

          I didn't run the sphinx3 test because I'm only using Sphinx-4. I didn't enable MLLT. Do you think this is due to some mistake during training? What can I post to help you understand?

           
          • Nickolay V. Shmyrev

            > I didn't run the sphinx3 test because I'm only using Sphinx-4.

            Please try. The tutorial suggests it because it helps to detect mistakes.

            > I didn't enable MLLT.

            It's better to enable it; it gives a significant improvement in accuracy.

            > Do you think this is due to some mistake during training?

            Probably so; probably your language model is not optimal. Anyhow, if your vocabulary is around 5000 words, the WER should be 20%, not 50%.

            > What can I post to help you understand?

            You can give me access to your training db, post it somewhere and give a link or something else.

             
            • Elisa Todarello

              Elisa Todarello - 2009-07-29

              Here's a link to a sample from my training db:
              http://www.gigasize.com/get.php?d=xv23hlfo2tc

              As soon as I get results from the tutorial test, I will post them.

               
              • Nickolay V. Shmyrev

                Thanks, but it would be nice to look at the whole training folder; this data alone obviously doesn't tell me anything. You don't need to use all 5 hours. Just prepare the model with the data you already shared, and let's compare and tune the results on this small db.

                 
                • Elisa Todarello

                  Elisa Todarello - 2009-07-30

                  I'm sorry, but my office shuts down today, so I don't have time to build the model, and I will not have access to the data until September.
                  I will write again when I'm back at work.
                  Thanks a lot for helping me!

                   
    • Elisa Todarello

      Elisa Todarello - 2009-07-28

      And thanks for your other answers, they're very helpful.

       