Hi everyone,
I'm a student in Taiwan. For my work, I have to use Kaldi. I'm working on a project about realtime and offline recognition with a DNN. Does anybody know whether offline speech recognition needs a lot of storage? I'd like to know whether I should bring a hard drive or not.
Thank you for your help.
Mao-Chang
I'm not sure exactly what you mean when you use the terms "off-line" and
"real-time" here - perhaps you could clarify what these terms mean to you.
And perhaps instead of "bring" you mean "buy"? You basically need a Linux
system and you need some familiarity with UNIX in order to do this. The
hard drive space requirement depends on how much data you want to train on,
but for small databases, a few tens of gigabytes may be enough. For large
databases, to run DNNs you'd need a cluster of computers with GPUs, and I'm
guessing from your question that that is not something you have.
Sorry! Maybe I should clarify how my project works. I will carry a system with me. The system collects sound and recognizes it with Kaldi in real time. The DNN in Kaldi has already been fully trained, so the DNN on the system only needs to run recognition. The data the system collects won't be sent to a server to compute the result. I wonder whether this process needs a lot of storage or not.
Some of your message was helpful to me.
I hope this is clear enough.
Mao-Chang
Hm. So you mean real-time decoding on a device without being connected to
a server.
It doesn't require very much storage, except that required to store the DNN
(a few megabytes, maybe), and to store the audio if you want to store this
for logging purposes.
Building usable real-time systems is challenging.
I suggest as a starting point you look at the program
online2-wav-nnet2-latgen-faster
and the corresponding example scripts in egs/*/s5/local/online/run_nnet2.sh
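
For a rough sense of scale, the two storage costs mentioned above (the DNN itself and any logged audio) can be estimated with simple arithmetic. The sketch below is only illustrative: the sample rate, sample width, and layer sizes are assumptions chosen for the example, not values taken from Kaldi or from this thread.

```python
# Back-of-the-envelope storage estimate for on-device decoding.
# All numbers below are illustrative assumptions, not Kaldi defaults.

def audio_bytes(seconds, sample_rate_hz=16000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio, ignoring the small WAV header."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def dnn_bytes(layer_sizes, bytes_per_param=4):
    """Size of a fully connected network stored as 32-bit floats.

    layer_sizes lists the unit count of each layer, input first.
    Each pair of adjacent layers contributes a weight matrix and a bias vector.
    """
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    return params * bytes_per_param


MB = 1000 * 1000

# One hour of logged audio at the assumed 16 kHz, 16-bit, mono format.
one_hour_audio = audio_bytes(3600)

# Hypothetical small network: 440 inputs, two hidden layers, 2000 outputs.
small_dnn = dnn_bytes([440, 512, 512, 2000])

print(f"1 hour of logged audio: {one_hour_audio / MB:.1f} MB")
print(f"hypothetical small DNN: {small_dnn / MB:.1f} MB")
```

Under these assumptions the network comes to a few megabytes, while logged audio grows by roughly a hundred megabytes per hour, so logging is the part that would decide whether extra storage is worth carrying.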
Thanks a lot! After a discussion with my group members, I have another question. The DNN model won't require much storage. But what about the output of the DNN for all the training data, which has a big vocabulary, so that it can be compared with the test data for recognition? I am asking whether the output of a DNN trained on big data needs a lot of storage.
Mao-Chang
It won't need a lot of storage, no. You should probably read a bit of
background about speech recognition, e.g. the HTK Book.
Dan
On Sun, Oct 12, 2014 at 10:34 PM, Mao-Chang kevin79577@users.sf.net wrote:
I asked someone who uses HTK, and they told me that it needs less than 50 MB to store all the models for a data set. So does Kaldi also need about 50 MB?
Mao-Chang
It depends on so many things that the question doesn't even make sense.
Before asking any more questions, please do some reading and try to
understand at least a little about how speech recognition works.
Dan
On Tue, Oct 14, 2014 at 12:36 PM, Mao-Chang kevin79577@users.sf.net wrote: