Hello
I am currently using IBM Watson and Google Speech, and I am looking to develop my own STT service tailored to my domain and customers. The main reasons are to cut cost and, in the long run, to build a domain-specific STT.
To restate: I want to build an STT cloud service specific to my domain that can be used from a desktop application (Windows, Linux and Mac) and a mobile app (Android and iOS).
I require continuous speech recognition of conversations with speaker identification, time stamps, automatic grammar correction, smart formatting, etc.
I have been collecting a large volume of training data (audio files and matching transcripts).
Question 1
Do I need to install Sphinx4 or Pocketsphinx on the server so that the mobile and desktop apps can send recognition requests to it? (See the sketch after this list for the client/server setup I have in mind.)
Question 2
If I do need one of them: if I train an acoustic model (AM) and language model (LM), can the same models be used with both Sphinx4 and Pocketsphinx?
Question 3
Is it possible to handle continuous speech recognition of conversations with speaker identification, time stamps, automatic grammar correction, smart formatting, etc. in Sphinx4 and Pocketsphinx?
Question 4
Is it possible to get an error rate lower than IBM's or Google's using Sphinx?
Can somebody help me by answering the above?
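To make Question 1 concrete, this is the setup I have in mind: recognition runs entirely on my server, and each desktop or mobile client only uploads audio and reads back the transcript. A minimal sketch of the client side, assuming a hypothetical self-hosted endpoint at https://stt.example.com/v1/recognize that accepts a WAV body and returns JSON (the URL, file name and response format are placeholders, not a real API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SttUploadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical self-hosted STT endpoint; path and response format are placeholders.
        URI endpoint = URI.create("https://stt.example.com/v1/recognize");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "audio/wav")
                // Upload a recorded conversation; the file name is just an example.
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("conversation.wav")))
                .build();

        // The server, where the recognizer actually runs, replies with the transcript
        // (e.g. JSON with words, time stamps and speaker labels).
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```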
It is not possible; you want a top company's technology for free.
Thanks, Nickolay V. Shmyrev, for your reply.
So, based on your answer, CMU Sphinx can't be developed to match Google or IBM Watson.
I didn't want to use the top companies' technology because of data security and also cost. We process more than 8000 hours of audio per day; that would cost a lot if I went with these companies.
That's why I was looking for an alternative.
Then cut some of your requirements - Google-like accuracy is probably not that important, and neither is speaker identification. An engine like Kaldi should then give you good results.
Cost-wise, yes: if you want to process 8000 hours per day, you can save a lot with a custom engine.
Q1 & Q2: No, you don't have to use Sphinx. For the large training data you have collected, you may try other toolkits with deep learning methods.
Q3: You are talking about a few different speech tasks besides STT. It is possible, but you may have to collect and annotate speech data differently for these tasks, and hire experienced speech scientists/engineers to work on these projects (see the sketch below for what a plain decoder already gives you).
Q4: Google's and IBM's speech models are built for the open domain, so it is possible to build/optimize a domain-specific STT system with equal or even better performance.
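For what it's worth, a plain decoder already gives you per-word time stamps; speaker identification, grammar correction and smart formatting are separate components you would have to build on top. A minimal Sphinx4 sketch, assuming the stock US English models bundled with the sphinx4-data artifact and a 16 kHz, 16-bit mono WAV file (the file name is just an example):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;

public class TimestampSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Stock US English models shipped with the sphinx4-data artifact.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // Expects 16 kHz, 16-bit, mono audio; the file name is just an example.
        InputStream stream = new FileInputStream("conversation.wav");
        recognizer.startRecognition(stream);

        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println("Hypothesis: " + result.getHypothesis());
            // Each WordResult pairs a recognized word with its time frame in the audio.
            for (WordResult word : result.getWords()) {
                System.out.println(word);
            }
        }
        recognizer.stopRecognition();
    }
}
```

None of this gives you speaker labels or formatting; those would need additional models and post-processing, which is the extra data collection and annotation mentioned above.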
Thanks, Jake, for your reply.
You mentioned:
Q1 & Q2: No, you don't have to use Sphinx. With the large amount of training data you have collected, you may try other toolkits with deep-learning methods.
Which one do you suggest?
Q3: You are talking about a few different speech tasks other than STT. It is possible, but you may have to collect and annotate speech data differently for these tasks, and hire experienced speech scientists/engineers to work on these projects.
I am talking about features similar to those of IBM Watson STT. We have been using IBM STT for more than a year now, and we feel that cost-wise it is not going to be the right way to go.
I am planning to hire a few.
Currently I am checking which is the right technology to use.
Q4: Google's and IBM's speech models are built for the open domain, so it is possible to build/optimize a domain-specific STT system with equal or even better performance.
Thanks
As Nickolay suggested, Kaldi should help you build a good engine. If you need help, please email me directly.