From: Dimitris V. <dva...@gm...> - 2014-10-24 06:44:56
Hi Dan,

Regarding your poll for new stuff in the Kaldi toolkit:

1. DNN-based, robust VAD, integrated with the online decoders.
2. More decoder optimisations for speed (RTF):
   - I have seen some benchmarks <http://on-demand.gputechconf.com/gtc/2014/video/S4732-deep-learning-networks-automatic-speech-recognition.mp4> where Python/Theano achieves a 3x speedup over Kaldi (for DNNs), both on CPU and GPU.
   - Perhaps make use of techniques like caching to save on computation.
   - Optimisations for online decoding.
3. Integrate RNN LMs deeper into the decoder, i.e. incorporate RNNs during the 1st pass of the decoder (there is an interesting paper by Microsoft here <http://www.google.gr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CCUQFjAA&url=http://research.microsoft.com/pubs/210168/rnnFirstPass.pdf&ei=7_BJVKnpBYLCPKH2gbAG&usg=AFQjCNHbMC8rdUZUskZ9oPIKHrW093KuZQ&sig2=F9RBm2R0oTCrcFR8-BNJCA&bvm=bv.77880786,d.ZWU>).

Thanks,
Dimitris