I am working with large data for speech recognition and I'd like to have both speed and accuracy. I have a server running pocketsphinx with a large-vocabulary language model. An Android app records audio and sends it to the server for recognition, and the output is displayed on the phone. But this is slow over limited internet connections, since I am sending the whole audio file.
What I'd like to do instead is use pocketsphinx on Android (or part of it) to extract MFCCs and send those to the server instead of the whole audio file.
How should I go about this? Does pocketsphinx accept MFCCs as input and give recognized text as output?
MFCCs in uncompressed form do not take much less space than compressed audio. You either need to compress the MFCCs with a clever algorithm or transfer compressed audio. The first approach is not common and might require research; a further disadvantage is that you cannot change the feature extraction method later.
You can instead transfer Opus-encoded audio and extract the MFCCs on the server. This is easier to implement and will use even less bandwidth than raw MFCCs.
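A quick back-of-envelope check of that bandwidth claim. The frame rate, coefficient count, and Opus bitrate below are typical defaults for a speech front end, not measured values from this setup:

```python
# Typical MFCC front end (assumed defaults): 100 frames per second,
# 13 cepstral coefficients per frame, each a 4-byte float.
frames_per_sec = 100
coeffs_per_frame = 13
bytes_per_coeff = 4

# Raw MFCC stream in bits per second.
mfcc_bps = frames_per_sec * coeffs_per_frame * bytes_per_coeff * 8
print(mfcc_bps)  # 41600 bits/s, i.e. ~41.6 kbps

# Opus delivers good-quality speech at roughly 16 kbps (a common
# voice-mode setting), so raw MFCCs cost over twice the bandwidth.
opus_bps = 16_000
print(mfcc_bps / opus_bps)  # 2.6
```

So uncompressed features are actually *worse* than a decent speech codec, which is why extracting MFCCs on the server is usually the better trade-off.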
In case you still want to transfer MFCCs, it is perfectly possible: all the necessary calls are present in the pocketsphinx API, you just need to start writing the code.
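If you do go the MFCC-transfer route, you still need a wire format for the feature frames. A minimal sketch of one option, simple linear quantization to 16-bit integers, which halves the size of a 32-bit-float representation; the function names, the fixed scale factor, and the quantization scheme are all my own illustrative choices, not part of pocketsphinx (real systems often use delta coding or smarter quantizers, and the server would still have to convert the unpacked floats into whatever feature type its decoder expects):

```python
import struct

def pack_mfcc_frames(frames, scale=100):
    """Quantize each float coefficient to a signed 16-bit integer
    (client side). Lossy: precision is limited to 1/scale."""
    packed = bytearray()
    for frame in frames:
        for c in frame:
            q = max(-32768, min(32767, int(round(c * scale))))
            packed += struct.pack("<h", q)
    return bytes(packed)

def unpack_mfcc_frames(data, n_coeffs=13, scale=100):
    """Inverse of pack_mfcc_frames (server side): recover float
    frames of n_coeffs coefficients each."""
    vals = struct.unpack("<%dh" % (len(data) // 2), data)
    return [[v / scale for v in vals[i:i + n_coeffs]]
            for i in range(0, len(vals), n_coeffs)]
```

With this scheme a 13-coefficient frame shrinks from 52 bytes as float32 to 26 bytes on the wire, roughly halving the feature bandwidth at the cost of some precision, which is exactly the kind of "clever compression" trade-off mentioned above.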