Hello
I am currently using IBM Watson and Google Speech, and I am looking to develop my own STT service tailored to my domain and customers. The main reasons are to cut cost and, in the long run, to build a domain-specific STT.
To restate: I want to build an STT cloud service specific to my domain that can be used from a desktop application (Windows, Linux and Mac) and a mobile app (Android and iOS).
I require continuous speech recognition of conversations with speaker identification, time stamps, automatic grammar correction, smart formatting, etc.
I have been collecting a large volume of training data (audio files and matching transcripts).
Question 1
Do I need to install Sphinx4 or Pocketsphinx on the server so that the mobile and desktop apps can send recognition requests to it? (See the sketch after this list for the client/server setup I have in mind.)
Question 2
If I do need one of them: if I train an acoustic model (AM) and language model (LM), can the same models be used with both Sphinx4 and Pocketsphinx?
Question 3
Is it possible to handle continuous speech recognition of conversations with speaker identification, time stamps, automatic grammar correction, smart formatting, etc. in Sphinx4 and Pocketsphinx?
Question 4
Is it possible to get an error rate lower than IBM's or Google's using Sphinx?
Can somebody help me by answering the above?
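To make Question 1 concrete, this is the setup I have in mind: recognition runs entirely on my server, and each desktop or mobile client only uploads audio and reads back the transcript. A minimal sketch of the client side, assuming a hypothetical self-hosted endpoint at https://stt.example.com/v1/recognize that accepts a WAV body and returns JSON (the URL, file name and response format are placeholders, not a real API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SttUploadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical self-hosted STT endpoint; path and response format are placeholders.
        URI endpoint = URI.create("https://stt.example.com/v1/recognize");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "audio/wav")
                // Upload a recorded conversation; the file name is just an example.
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("conversation.wav")))
                .build();

        // The server, where the recognizer actually runs, replies with the transcript
        // (e.g. JSON with words, time stamps and speaker labels).
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```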
It is not possible; you want a top company's technology for free.
Thanks, Nickolay V. Shmyrev, for your reply.
So, based on your answer, CMU Sphinx can't be developed to match Google or IBM Watson.
I didn't want to use the top companies' technology because of data security and also cost. We process more than 8000 hours of audio per day; that would cost a lot if I went with these companies.
That's why I was looking for an alternative.
Then cut some of your requirements - Google-like accuracy is probably not that important, and neither is speaker identification. An engine like Kaldi should then give you good results.
Cost-wise, yes: if you want to process 8000 hours per day, you can save a lot with a custom engine.
Q1 & Q2: No, you don't have to use Sphinx. For the large training data you have collected, you may try other toolkits with deep learning methods.
Q3: You are talking about a few different speech tasks besides STT. It is possible, but you may have to collect and annotate speech data differently for these tasks, and hire experienced speech scientists/engineers to work on these projects (see the sketch below for what a plain decoder already gives you).
Q4: Google's and IBM's speech models are built for the open domain, so it is possible to build/optimize a domain-specific STT system with equal or even better performance.
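For what it's worth, a plain decoder already gives you per-word time stamps; speaker identification, grammar correction and smart formatting are separate components you would have to build on top. A minimal Sphinx4 sketch, assuming the stock US English models bundled with the sphinx4-data artifact and a 16 kHz, 16-bit mono WAV file (the file name is just an example):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;

public class TimestampSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Stock US English models shipped with the sphinx4-data artifact.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // Expects 16 kHz, 16-bit, mono audio; the file name is just an example.
        InputStream stream = new FileInputStream("conversation.wav");
        recognizer.startRecognition(stream);

        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println("Hypothesis: " + result.getHypothesis());
            // Each WordResult pairs a recognized word with its time frame in the audio.
            for (WordResult word : result.getWords()) {
                System.out.println(word);
            }
        }
        recognizer.stopRecognition();
    }
}
```

None of this gives you speaker labels or formatting; those would need additional models and post-processing, which is the extra data collection and annotation mentioned above.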
Thanks, Jake, for your reply.
You mentioned:
Q1 & Q2: No, you don't have to use Sphinx. With the large amount of training data you have collected, you may try other toolkits with deep-learning methods.
Which one do you suggest?
Q3: You are talking about a few different speech tasks other than STT. It is possible, but you may have to collect and annotate speech data differently for these tasks, and hire experienced speech scientists/engineers to work on these projects.
I am talking about features similar to those of IBM Watson STT. We have been using IBM STT for more than a year now, and we feel that cost-wise it is not going to be the right way to go.
I am planning to hire a few.
Currently I am checking which is the right technology to use.
Q4: Google's and IBM's speech models are built for the open domain, so it is possible to build/optimize a domain-specific STT system with equal or even better performance.
Thanks
As Nickolay suggested, Kaldi should help you build a good engine. If you need help, please email me directly.