Hi all,
I am interested in experimenting with distributed recognition. As far as I understand, it means that recording and feature extraction are performed on a client, and the features are sent to a server that does the actual recognition.
Have you heard of any free API that does this? Is there a standard for this kind of system? Is it part of a VoIP standard? Are there any known experiments with HTK or Sphinx? Any pointers would be greatly appreciated!
Cheers,
Yes, Motorola and IBM did a bunch of work on this around the turn of the century. There are some ETSI standards that were published. Search for "ES 202 050", "ES 202 211", and "ES 202 212" on http://pda.etsi.org/pda/
With respect to VoIP in particular, there is an RFC: http://www.rfc-editor.org/rfc/rfc4060.txt
See also RFC 3557, which I think says mostly the same stuff as RFC 4060.
Sphinx doesn't implement this stuff. It should not be difficult to do so based on the standards, though.
Oh my God, I really appreciate it. You have helped me a lot, and I will try it later. I like the Sphinx group; you are so kind, David!
Hi Chris,
> Have you heard about any free API doing it?
Yep. You could try to use sphinx4 together with the cajo library. This gives you a free distributed speech recognition system with minimal effort. The crucial point is probably the selection of the proper interface: I would suggest doing the feature extraction on the client side and using cajo to distribute only the feature vectors.
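To make the suggested split concrete, here is a minimal sketch of where the interface could sit. It is written in plain Python rather than against the sphinx4 or cajo APIs, and everything in it is an assumption for illustration: the names `frame_signal` and `log_energy_features`, the 25 ms window with a 10 ms shift at 16 kHz, and the use of a single log-energy value per frame as a stand-in for real cepstral features. The point is only that the client turns raw samples into small per-frame vectors, and those vectors are the only thing that would cross the network.

```python
import math

def frame_signal(samples, frame_len=400, frame_shift=160):
    """Split samples into overlapping frames.

    The defaults (400 and 160 samples) correspond to an assumed
    25 ms window and 10 ms shift at a 16 kHz sampling rate.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift
    return frames

def log_energy_features(samples, frame_len=400, frame_shift=160):
    """Client side: reduce raw audio to one small feature vector per frame.

    A real front end would emit cepstral coefficients; a single
    log-energy value per frame is used here only as a placeholder.
    """
    features = []
    for frame in frame_signal(samples, frame_len, frame_shift):
        energy = sum(x * x for x in frame) / len(frame)
        features.append([math.log(energy + 1e-10)])  # small floor avoids log(0)
    return features

if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz as dummy input.
    audio = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    feats = log_energy_features(audio)
    print(len(audio), "samples ->", len(feats), "feature vectors")
```

Whatever transport is used (cajo or anything else), the server-side decoder would then consume the list returned by `log_energy_features`, or its real-front-end equivalent, instead of raw audio.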
Cheers,
Holger
Hi Holger,
I am very interested in trying out your proposal to split Sphinx 4 into a feature extraction module on the client and decoding on the server.
As far as I understand, with the current Sphinx 4 design CAJO would make a separate remote call for each frame. That would be very heavy on bandwidth because of the additional information CAJO needs to complete each remote call, right? We could also pack all frames into one big data object, but then the server would have to wait until the last frame arrives before it could start decoding.
We could also send each frame over a TCP or UDP socket. What do you think would be the advantage of using CAJO?
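A minimal sketch of the plain-socket option, assuming each frame is a fixed-size vector of 13 32-bit floats (an assumption for illustration, not something Sphinx 4 prescribes) and using the hypothetical helper names `pack_frame` and `FrameReader`. Each frame is sent as soon as it is computed, so the server could start decoding before the utterance ends. A TCP stream has no message boundaries, so the reader buffers incoming bytes and hands back complete frames as they arrive.

```python
import socket
import struct

NUM_COEFFS = 13                               # assumed feature dimension
FRAME = struct.Struct("!%df" % NUM_COEFFS)    # network byte order, 32-bit floats

def pack_frame(vector):
    """Serialize one feature vector to FRAME.size bytes (52 for 13 floats)."""
    return FRAME.pack(*vector)

class FrameReader:
    """Reassemble fixed-size frames from an arbitrarily chunked byte stream."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add received bytes and return the list of frames now complete."""
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= FRAME.size:
            frames.append(list(FRAME.unpack_from(self._buf)))
            del self._buf[:FRAME.size]
        return frames

if __name__ == "__main__":
    # A connected socket pair stands in for the client and the server.
    client, server = socket.socketpair()
    reader = FrameReader()
    for i in range(3):
        client.sendall(pack_frame([float(i)] * NUM_COEFFS))
    client.close()
    received = []
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        # Frames could be passed to the decoder here, as they arrive.
        received.extend(reader.feed(chunk))
    server.close()
    print(len(received), "frames received")
```

Under these assumptions the payload is 52 bytes per frame, or about 5.2 kB/s at an assumed 100 frames per second, before TCP/IP overhead. With UDP, each datagram would carry one or more packed frames and no reassembly buffer would be needed, but lost or reordered datagrams would have to be handled, for example with a sequence number per datagram.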
Thanks a lot for your help.
Sylvain
Hi Holger,
Thank you very much. In fact, I want to use sphinx3, and I do not know much about Java. Is there a comparable library written in C or C++? I will try cajo first anyway. You have helped me a lot.
Best wishes,
Chris