Hi!
I'm interested in adding appropriate punctuation to transcribed ASR output for English. I used the post-processing framework that is part of Sphinx, with the Gutenberg language model: https://cmusphinx.github.io/2012/08/postprocessing-framework/
I'm wondering whether there's an update to this language model or this branch that I can use for better results.
When I tried this on a passage from the Gutenberg text corpus, it appears that after some initial phrases, commas are getting added in between every word. Any idea why this might be happening, or any pointers on what I can do to improve the accuracy here?
Any help regarding this would be super awesome!
Thank you very much!
https://github.com/ottokart/punctuator2 is much more accurate.
Thank you very much!!