Hello all,
I am the treasurer of a small association. Twice a year we organise a big sale, and I would like to make our lives easier during these sales by using Sphinx. However, as I have no experience with speech recognition, I would first like to ask whether it is feasible at all.
We have to type in a large quantity of numbers. I would like to write a small program and use Sphinx to get these numbers into the computer. The requirements are:
- speech input by microphone (I guess the microphones have to be rather good ones). We can speak a little more slowly than normal, but not too slowly. There is always some surrounding noise.
- an error rate of about 4 words in 1000
Can I achieve this? And could I increase accuracy or speed by restricting the vocabulary to the numbers 0-20 and 5 additional commands?
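A restricted vocabulary like this is usually expressed as a grammar. The following is only a minimal sketch of how such a grammar could be generated in JSGF, a grammar format that the Sphinx decoders can read. The English number words, the five command words, and the names `build_jsgf` and `sale` are assumptions made for illustration; a German setup would need German words, and every word would also have to appear in the pronunciation dictionary.

```python
# Sketch: generate a small JSGF grammar for the numbers 0-20 plus 5 commands.
# The command words below are hypothetical placeholders, not from the thread.

NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]

COMMANDS = ["number", "point", "next", "undo", "stop"]  # assumed commands


def build_jsgf(name, numbers, commands):
    """Return JSGF text that accepts any sequence of the given words."""
    lines = [
        "#JSGF V1.0;",
        f"grammar {name};",
        "<number> = " + " | ".join(numbers) + ";",
        "<command> = " + " | ".join(commands) + ";",
        # One or more numbers or commands per utterance.
        "public <utterance> = (<number> | <command>)+;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_jsgf("sale", NUMBER_WORDS, COMMANDS))
```

With only 26 words to choose from, the recognizer has far fewer ways to go wrong than with a general dictation vocabulary, which is why such a restriction would be expected to help accuracy.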
Dear Irene
Thanks for your interest in CMUSphinx.
You need to give some context for that. I'm not fully sure what your current application is, so it is hard to give advice. If you want to recognize speech in noise, you might use noise-cancelling headsets; there are pretty good ones from VXI, for example.
An error rate of about 4 words in 1000 is possible in a clean environment, slightly harder in noise.
Overall, it is always easier to type than to speak, unless you have some requirements you didn't describe. Typing and/or label scanning are much less error-prone, much faster, and reliable in noise. To suggest how to design your system, I need to understand more details: what kind of environment is it, and what kind of noise? What speaker accents do you expect? Will there be a single speaker or many speakers? Will you allow speaker pretraining?
If you want to chat on this, let me know.
Hi Nickolay,
thanks for your fast reply! Of course I can go a little bit further into details:
It is a kind of flea market for children's clothing where every vendor gets a number and puts it on the label of every item for sale. The sales team arranges the items so that similar ones are grouped together. So if you look for trousers of a certain size, you only have to look in one place. I don't know if this kind of flea market is also known elsewhere...
First of all, I am not talking about the cash desks; perhaps this could be misunderstood. We are in the "back office", calculating how much each vendor has sold and how much we have to pay him. We work in teams of two: one takes a label and reads the vendor number and price to the second, who enters the numbers into the computer. Basically all we say is something like "Number 1-3-6 2 point 5", meaning that vendor number 136 has sold something for 2,50.
Normally we need about 3-4 hours to put in all the numbers, which is tedious work. But if everybody could put in the numbers by speech, we would be twice as fast... By the way, scanning does not work, as most of the labels are handwritten.
By "noise" I meant the voices and murmuring around us. Nothing really loud, but of course it is not dead silent.
Speaker training would be difficult, as the teams are changing (except for me...). But accents shouldn't be a problem, I guess. I'm from Germany, and not from a region where dialect is spoken.
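To make the spoken format above concrete ("Number 1-3-6 2 point 5" for vendor 136 and a price of 2,50), here is a rough sketch of how a recognized word sequence might be turned into a vendor number and a price. It is not part of Sphinx. The function name `parse_entry`, the English number words, and the rule that the whole-currency part of the price is spoken as a single word from 0 to 20 are all assumptions made for this example.

```python
from decimal import Decimal

# Assumed English vocabulary for 0-20; a German system would use German words.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]
WORD_TO_INT = {word: value for value, word in enumerate(NUMBER_WORDS)}


def parse_entry(words):
    """Parse e.g. ['number','one','three','six','two','point','five'].

    Assumed layout: 'number', vendor digits, one word for the whole part
    of the price, 'point', then one or two digits for the fraction.
    Returns (vendor_number, price) or raises ValueError.
    """
    if not words or words[0] != "number" or words.count("point") != 1:
        raise ValueError("expected 'number ... point ...'")
    split = words.index("point")
    try:
        head = [WORD_TO_INT[w] for w in words[1:split]]
        tail = [WORD_TO_INT[w] for w in words[split + 1:]]
    except KeyError as exc:
        raise ValueError(f"unknown word: {exc}") from None
    if len(head) < 2 or not 1 <= len(tail) <= 2:
        raise ValueError("missing vendor number or price digits")
    *vendor_digits, whole = head  # last word before 'point' is the price
    if any(d > 9 for d in vendor_digits + tail):
        raise ValueError("vendor and fraction must be spoken digit by digit")
    vendor = int("".join(str(d) for d in vendor_digits))
    fraction = "".join(str(d) for d in tail)
    price = Decimal(f"{whole}.{fraction}").quantize(Decimal("0.01"))
    return vendor, price


if __name__ == "__main__":
    print(parse_entry("number one three six two point five".split()))
```

Rejecting anything that does not fit the expected pattern gives the operator a chance to repeat the entry instead of silently booking a wrong amount.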
I'm sorry, I don't think it would be economically viable to develop such software. It's too much effort for now. Noise is a problem. Accuracy is a problem too. And you only need this twice a year.
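To put the accuracy concern in numbers, here is a back-of-the-envelope sketch of what the requested error rate of 4 words in 1000 would mean per label. It assumes that word errors are independent and that one entry is about 7 spoken words (as in "Number 1-3-6 2 point 5"); both are simplifying assumptions, and `entry_error_rate` is just an illustrative name.

```python
def entry_error_rate(word_error_rate, words_per_entry):
    """Probability that an entry contains at least one misrecognized word,
    assuming each word fails independently with the same probability."""
    return 1.0 - (1.0 - word_error_rate) ** words_per_entry


if __name__ == "__main__":
    per_entry = entry_error_rate(4 / 1000, 7)
    print(f"share of entries with an error: {per_entry:.2%}")
    print(f"expected bad entries per 1000 labels: {per_entry * 1000:.0f}")
```

Under these assumptions, a little under 3% of the entries would contain at least one wrong word even if the target word error rate were met, so some form of checking would still be needed.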