Accuracy expectations w/ unrestricted grammar

2011-02-27
2012-09-22
  • Justin Beckwith

    Justin Beckwith - 2011-02-27

    Greetings! I am attempting to write an application that will perform
    transcription of various videos. Since the videos could come from anywhere, I
    am using the HUB4 model that pocketsphinx seems to use by default, and an
    unrestricted grammar. Just to get things started, I am using the
    pocketsphinx_continuous app with the -infile option on the latest builds to get
    an idea of what accuracy I can get. With fairly high-quality audio (PCM, 16 kHz,
    mono), I am getting an accuracy of less than 60%.

    This leaves me with a few questions:
    - What type of accuracy can I expect in this scenario?
    - How can I improve the accuracy (assuming unrestricted grammar and untrained model)?
    - Is pocketsphinx or sphinx4 better suited for this use case?
    - Is there a model that may be better suited for my use case?

    Any help would be appreciated. Thanks!

  • Nickolay V. Shmyrev

    Hello

    and an unrestricted grammar.

    Decoding is always restricted in some way, for example by a language model or
    a grammar. To learn more about how CMUSphinx decoders work, please read the
    tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorial
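    To make the point about restriction concrete, here is a minimal Python sketch
    of an unsmoothed maximum-likelihood bigram model. It is a toy, far simpler than
    the back-off models pocketsphinx actually loads, and the function names are
    illustrative, not part of any CMUSphinx API. In this toy, any word sequence
    not seen in the training text gets zero probability; real back-off models keep
    some probability for unseen sequences of known words, but a word outside the
    vocabulary still cannot be recognized at all.

```python
from collections import Counter
from typing import Iterable, Tuple

# Illustrative toy model (history counts, bigram counts), not the CMUSphinx LM format.
BigramModel = Tuple[Counter, Counter]


def train_bigram(sentences: Iterable[str]) -> BigramModel:
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(words[:-1])                # every word that precedes another
        bigrams.update(zip(words[:-1], words[1:]))
    return histories, bigrams


def bigram_prob(model: BigramModel, prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen events."""
    histories, bigrams = model
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


def sentence_prob(model: BigramModel, sentence: str) -> float:
    """Product of bigram probabilities over the padded sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        prob *= bigram_prob(model, prev, word)
    return prob


if __name__ == "__main__":
    lm = train_bigram(["the cat sat", "the dog sat"])
    print(sentence_prob(lm, "the cat sat"))   # 0.5
    print(sentence_prob(lm, "the bird sat"))  # 0.0 ("bird" was never seen)
```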

    The default language model used by pocketsphinx (hub4.5000.DMP) is not very
    good: it has a vocabulary of just 5000 words. It makes sense to build your own
    language model from existing video transcripts, for example from closed
    captions.
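    As a possible first step toward such a language model, the minimal Python
    sketch below turns closed captions in the SubRip (.srt) format into a
    lowercase, one-line-per-caption text corpus, and measures the
    out-of-vocabulary (OOV) rate of that corpus against a given word list, which
    is one way to see what a 5000-word vocabulary misses. Assumptions: captions
    are SRT, only English letters and apostrophes are kept, and numbers are simply
    dropped rather than spelled out. The function names are hypothetical. The
    resulting text would still have to go through a language model toolkit (for
    example CMUCLMTK or the online lmtool); check that tool's documentation for
    its exact input format.

```python
import re
from typing import Iterable, List, Set

# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")
TAG = re.compile(r"<[^>]+>")          # inline formatting such as <i>...</i>
NOT_WORD = re.compile(r"[^a-z' ]+")   # anything except a-z, apostrophes, spaces


def srt_to_corpus(srt_text: str) -> List[str]:
    """Return one normalized line of caption text per SRT cue."""
    corpus: List[str] = []
    for cue in re.split(r"\n\s*\n", srt_text.strip()):
        text_lines = []
        for line in cue.splitlines():
            line = line.strip()
            # Skip the cue number and the timing line; keep the caption text.
            if not line or line.isdigit() or TIMING.match(line):
                continue
            text_lines.append(line)
        text = TAG.sub(" ", " ".join(text_lines)).lower()
        words = NOT_WORD.sub(" ", text).split()   # digits and punctuation dropped
        if words:
            corpus.append(" ".join(words))
    return corpus


def oov_rate(corpus: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of running words in the corpus that are not in the vocabulary."""
    words = [w for line in corpus for w in line.split()]
    if not words:
        return 0.0
    return sum(1 for w in words if w not in vocabulary) / len(words)


if __name__ == "__main__":
    captions = """1
00:00:01,000 --> 00:00:03,000
<i>Hello there!</i>

2
00:00:03,500 --> 00:00:05,000
It's a well-known
fact.
"""
    corpus = srt_to_corpus(captions)
    print(corpus)  # ['hello there', "it's a well known fact"]
    print(round(oov_rate(corpus, {"hello", "there", "a", "fact"}), 3))  # 0.429
```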

    The default acoustic model of pocketsphinx is quite good; it makes sense to try
    it instead of hub4.

    • What type of accuracy can I expect in this scenario?

    Accuracy depends on many factors, including, for example, the decoder used to
    extract the audio track. Overall, with the default models, 60% is the expected
    accuracy. Further improvements require you to adapt the models and to
    implement custom components.
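    Figures like "60% accuracy" are usually word-level scores against a
    hand-made reference transcript. A common definition, assumed here, is word
    accuracy = 1 - WER, where WER is the minimum number of word substitutions,
    deletions and insertions separating the hypothesis from the reference,
    divided by the number of reference words. The minimal Python sketch below
    computes it with the standard edit-distance dynamic program. It is not the
    scoring script shipped with CMUSphinx, so its numbers may differ slightly from
    those tools (for example in text normalization).

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level edit distance)."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    # prev[j] = errors aligning the reference words seen so far with hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion: ref word missing
                cur[j - 1] + 1,                        # insertion: extra hyp word
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER; can go below zero if the hypothesis has many insertions."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n_ref


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat hat"
    print(word_errors(ref, hyp))              # 2 (one substitution, one insertion)
    print(round(word_accuracy(ref, hyp), 3))  # 0.667
```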

    • How can I improve the accuracy (assuming unrestricted grammar and
      untrained model)?

    I suggest you set up a prototype and then improve the components one by one.
    Try to build a better language model, adapt the acoustic model, and implement
    further improvements. For example, transcribing video often requires a
    specialized music filtering component.
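    One way to structure such a prototype, so that components can be replaced one
    at a time, is to make every stage a plain function with the same signature and
    chain them. The minimal Python sketch below assumes 16 kHz, 16-bit mono
    samples held in a list of ints. `energy_gate` and `run_pipeline` are
    hypothetical names; `energy_gate` is only a placeholder that drops quiet 10 ms
    frames and is not a music filter (separating speech from music would need a
    trained classifier). A recognizer such as pocketsphinx would consume the final
    output; it is not called here.

```python
from typing import Callable, List, Sequence

Samples = List[int]                  # assumed: 16-bit PCM, 16 kHz, mono
Stage = Callable[[Samples], Samples]


def energy_gate(frame_len: int = 160, threshold: float = 500.0) -> Stage:
    """Placeholder stage: drop frames (160 samples = 10 ms) whose RMS is below threshold."""
    def stage(samples: Samples) -> Samples:
        kept: Samples = []
        for start in range(0, len(samples), frame_len):
            frame = samples[start:start + frame_len]
            rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
            if rms >= threshold:
                kept.extend(frame)
        return kept
    return stage


def run_pipeline(samples: Samples, stages: Sequence[Stage]) -> Samples:
    """Apply each stage in order; swap or add stages to compare components."""
    for stage in stages:
        samples = stage(samples)
    return samples


if __name__ == "__main__":
    silence = [0] * 1600   # 100 ms of silence
    loud = [1000] * 1600   # 100 ms of constant-amplitude signal
    kept = run_pipeline(silence + loud, [energy_gate()])
    print(len(kept))       # 1600
```

    Because every stage shares one signature, an improved filter can be dropped
    into the `stages` list and scored against the previous version on the same
    audio.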

    • Is pocketsphinx or sphinx4 better suited for this use case?

    For server-based applications it's better to use sphinx4. For more details see
    http://cmusphinx.sourceforge.net/wiki/versions