You'd want to break it up into smaller pieces, like 15 segments, as the decoder doesn't handle too-long segments, but that's trivial. On Tue, Sep 11, 2018 at 3:43 AM shashi shashi1020@users.sourceforge.net wrote: Sorry Dan.. I'll make my question clear to you. Using Kaldi, is it possible to translate a stream of 6-8 hours of audio into text?
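A minimal sketch of the splitting step mentioned in the reply above, assuming fixed-length pieces. The 15-second length is an assumption for illustration (the post only says "like 15 segments"), and `make_segments` and the utterance-ID scheme are hypothetical names. The output lines follow the layout of a Kaldi `segments` file (`<utterance-id> <recording-id> <start> <end>`, times in seconds).

```python
import math


def make_segments(recording_id, total_seconds, segment_seconds=15.0):
    """Split one long recording into fixed-length pieces.

    Returns lines in the layout of a Kaldi `segments` file:
    <utterance-id> <recording-id> <start-seconds> <end-seconds>
    The 15 s default is an illustrative assumption, not a Kaldi setting.
    """
    if total_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / segment_seconds)
    lines = []
    for index in range(count):
        start = index * segment_seconds
        # The last piece may be shorter than the others.
        end = min(start + segment_seconds, total_seconds)
        utt_id = f"{recording_id}-{index:06d}"
        lines.append(f"{utt_id} {recording_id} {start:.2f} {end:.2f}")
    return lines


if __name__ == "__main__":
    # A 40-second recording yields two full pieces and one short one.
    for line in make_segments("rec1", 40.0):
        print(line)
```

Cutting at fixed offsets can split a word in two; cutting at silences instead would avoid that, at the cost of needing a voice-activity step first.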
See kaldi-asr.org/forums.html for how to ask questions, but your questions are very unclear. On Mon, Sep 10, 2018 at 7:12 AM shashi shashi1020@users.sourceforge.net wrote: Hello, can anyone describe the limitations of Kaldi usage, such as: how does it work on real-time data? Can we stream data for 6-8 hours? Is speech recognition based on a time window?
Cool. I know that in China they have various dedicated lists. If you were to start one for Kaldi researchers in India it might be helpful. If you do, try to set it up so the archives are searchable (like kaldi-help) so that people can find it from Google. Dan On Wed, Jul 4, 2018 at 2:34 PM, rohit kodali rohitgowtham@users.sourceforge.net wrote: Hi Dan, I actively follow the Kaldi forums. I don't think we have anything specifically for Indian or any other tonal languages, but if we have something for it...
Rohit: thanks for responding. For your guys' info, kaldi-help is the primary location for these discussions, see kaldi-asr.org/forums.html. There may be forums for Indian users of Kaldi too, which I am not aware of, and these, if they exist, would be very suitable for new users like Vaibhav. On Wed, Jul 4, 2018 at 4:34 AM, rohit kodali rohitgowtham@users.sourceforge.net wrote: Add more data from more speakers and get good phonetic coverage On Wed, 4 Jul 2018, 1:33 pm Vaibhav, ervaibhavkumar@users.sourceforge.net...
Kaldi doesn't live on SourceForge anymore. There isn't a script in steps/, but you can easily figure out how to write one if you understand Kaldi I/O mechanisms, with reference to the existing scripts. On Tue, Jul 3, 2018 at 6:00 AM, Vaibhav ervaibhavkumar@users.sourceforge.net wrote: Hi, I wanted to know where I can find the script to extract pitch features only. I know there are scripts for MFCC + pitch and PLP + pitch features, but I am unable to find the script to extract only pitch features...
BTW, in case anyone is getting these forum emails, please know that this forum, like...
It looks to me like the issue was that for some reason the riff_chunk_size specified...
Everything looks right in what you described. Possibly there was a mismatch in a...
Possibly it is trying to do a split where validation-set speakers are distinct from...
trunk: minor fix to last trunk commit RE cu-dev...
sandbox/nnet3: merge changes from trunk: also a...
trunk: modifying cu-device.cc to work around wh...
Karel, could you please fix this? I think a comment explaining what the "r=1" thing...
What you are encountering is instability; it is a common problem in neural network...
I haven't looked into tuning that particular setup. You could just use all-defaults....
I don't think his issue is coming from his archive that starts with "gunzip". I think...
Those objective function improvements are too large- they should be around 10. It...
sandbox/nnet3: added some test code
Yenda, so does the slurm.pl support those new-style options? I didn't realize that;...
It's probably some local issue, maybe path or directory or permission-related, but...
I don't think I have ever run the setup with the fbank features-- there is really...
Thanks, committing the fix. Dan On Tue, Jul 14, 2015 at 5:35 AM, Morino mozno@users.sf.net...
trunk: small fix to get_lda_block.sh (thanks: m...
You don't need to use kaldi's lattice format, you can use standard FST tools such...
sandbox/nnet3: extensions to test routines and ...
You should probably use a grammar (G.fst) that only allows one label. See openfst.org...
sandbox/nnet3: various code cleanup and documen...
Show us the complete linking line (i.e. the command that the Makefile executed)....
It's the per-transition-id occupation counts. They are rarely needed -- e.g. I think...
For dialogue, what you need is speaker diarization, not just speaker identification....
Yes, what I meant is, how did you create the ARPA-format LM? Also, make sure your...
hamming and hanning are standard functions that you could look at online. The "povey"...
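For reference, a small sketch of the three window shapes named above. The Hamming and Hann (hanning) formulas are the standard ones. The "povey" form shown here, a Hann window raised to the power 0.85, is how I recall it being defined in Kaldi's feature-window code; treat that exponent as an assumption to verify against the source. The function name `window` is hypothetical.

```python
import math


def window(kind, n):
    """Return an n-point analysis window as a list of floats.

    kind: "hamming", "hanning", or "povey".
    The povey exponent (0.85) is an assumption recalled from Kaldi's
    source, not something stated in the post above.
    """
    if n < 2:
        raise ValueError("need at least two points")
    a = 2.0 * math.pi / (n - 1)
    values = []
    for i in range(n):
        c = math.cos(a * i)
        if kind == "hamming":
            values.append(0.54 - 0.46 * c)
        elif kind == "hanning":
            values.append(0.5 - 0.5 * c)
        elif kind == "povey":
            # Hann window raised to a power; goes to zero at the edges
            # like Hann, but is slightly broader in the middle.
            values.append((0.5 - 0.5 * c) ** 0.85)
        else:
            raise ValueError(f"unknown window kind: {kind}")
    return values


if __name__ == "__main__":
    for kind in ("hamming", "hanning", "povey"):
        print(kind, [round(v, 3) for v in window(kind, 5)])
```

The practical difference is at the frame edges: Hamming leaves a small nonzero value there, while the Hann and povey shapes taper fully to zero.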
Probably the issue is that some of your features have a different dimension than...
sandbox/nnet3: further code cleanup and more do...
You could try setting the num_leaves to the cube of the number of phones in your...
Firstly, if the model was trained for 16kHz data you cannot recognize 8kHz data,...
sandbox/nnet3: clarifications, name changes, mi...
How did you create the LM? Also, you might want to get it in a debugger and figure...
Look at steps/get_ctm.sh Dan On Sun, Jul 12, 2015 at 3:11 AM, Konstantinos Themelis...
sandbox/nnet3: renaming and minor bug fixes; ad...
sandbox/nnet3: cosmetic and documentation changes
One easy way to do this is simply to set up the neural network with two affine components...
sandbox/nnet3: documentation changes
Hm. There is also a script called steps/nnet2/train_multilang2.sh which you may find...
Sorry I don't recall the exact meaning and don't have time to check, you'll have...
Those are not the stats from nbest-to-linear that you should be looking at, they...
Yes, if you want a block structure then BlockAffineComponentPreconditioned would...
sandbox/nnet3: mostly minor changes, adding a l...
If the lattice changes are caused by the determinization function, does that mean if I get the...
If you put the string --delta-order=0 into a file exp/mono/delta_opts (or whatever...
I'm trying to get the lattice during online decoding by using SingleUtteranceNnet2Decoder:...
It looks like the delta-opts are not being passed in to the decoding script- most...
Hm. I suspect that that directory does exist but you did not see it somehow. Or possibly...
sandbox/nnet3: various bug fixes, get tests wor...
This completely depends on what decoding pipeline you are using. If it's one that...
The form "--delta-opts "--delta-order=0"" would be used on the command line of train_deltas.sh,...
I found a tutorial on iVectors here http://www1.icsi.berkeley.edu/Speech/presentations/AFRL_ICSI_visit2_JFA_tutorial_icsitalk.pdf...
Regarding setting a ReLU activation component instead of pnorm, from what you mentioned...
You could write the MLLR transforms using the ark,scp format and split the .scp file....
Yes, they are only for initialization. Dan On Tue, Jul 7, 2015 at 9:22 AM, Yan Yin...
Various minor fixes, plus adding (prototype) gl...
I think you may not understand what MLLR is. You could do what you say by creating...
Your edited post said: "Also, I have tried to use the SRILM tools instead to build...
You might find the SRILM tools to be easier to use-- have a look for a setup that...
sandbox/nnet3: various extra tests, debugging c...
It's for diagnostics and various "fixing" procedures (nnet-fix) where we detect that...
ARM is usually for small embedded platforms. The decoding method used in Kaldi is...
If you are concerned about the accuracy of the transcriptions, use steps/cleanup/find_bad_utts.sh...
The shift is 10 ms and the window size is 250 ms by default. These are parameters...
Just a follow-up on this: I realized it doesn't make sense to have the input be a...
sandbox/nnet3: extension to optimization code t...
sandbox/nnet3: test code and bug fixes for opti...
It's OK, there is no real difference between help and discuss. We are currently using...
fMLLR is a feature-space adaptation method and can't be applied to the model (at...
sandbox/nnet3: Committing some optimization cod...
sandbox/nnet3: committing some more work (nnet-...
sandbox/nnet3: committing some more progress (c...
I think some kind of misconception may be behind this question. Whatever you want,...
There isn't a mechanism, and making one would be complicated. Right now I am putting...
BTW, I am showing the text-form FST with words instead of integer symbols. The RM...
When you say "scoring", I think what you are really talking about is decoding. Scoring...
I'm trying to figure out how to use the Bottleneck features... Why do we need the...
Most of these FSTs are too large to visualize in a picture- there may be an fstdraw...
fstprint On Wed, Jul 1, 2015 at 12:46 AM, blculiwei blculiwei@users.sf.net wrote:...
The answer is similar to the answer I previously gave to your question about the...
That paper is very old and I doubt the results are applicable to modern systems....
sandbox/nnet3: adding functionality for pretty-...
Some words may have alternative pronunciations, and the alignment process can choose...
See scripts like steps/align.sh and steps/align_fmllr.sh. I'm not sure where this...
LSTM is not currently supported in nnet2. In 2 or 3 months it should be available...
sandbox/nnet3: various bug fixes and refactoring.
copy-feats input.ark ark,t:- | less On Sun, Jun 28, 2015 at 4:13 AM, atuk atuk123@users.sf.net...
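The `ark,t:-` specifier in the command above writes the features as text to standard output. As a rough sketch, assuming the usual text layout of an utterance ID followed by a bracketed matrix with one frame per line, such output could be read back in Python as follows. `parse_text_ark` is a hypothetical helper, not part of Kaldi.

```python
def parse_text_ark(text):
    """Parse text-format feature matrices into {utt_id: list of rows}.

    Assumed layout (what `copy-feats ... ark,t:-` typically prints):
        utt1  [
          1 2 3
          4 5 6 ]
    """
    feats = {}
    utt_id = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if utt_id is None:
            # Header line: "<utt-id> [" opens a new matrix.
            utt_id = tokens[0]
            tokens = tokens[1:]
            if tokens and tokens[0] == "[":
                tokens = tokens[1:]
            rows = []
            if not tokens:
                continue
        # A trailing "]" closes the current matrix.
        closing = tokens[-1] == "]"
        if closing:
            tokens = tokens[:-1]
        if tokens:
            rows.append([float(t) for t in tokens])
        if closing:
            feats[utt_id] = rows
            utt_id = None
    return feats


if __name__ == "__main__":
    sample = "utt1  [\n  1 2 3 \n  4 5 6 ]\nutt2  [\n  7 8 9 ]\n"
    for utt, matrix in parse_text_ark(sample).items():
        print(utt, len(matrix), "frames x", len(matrix[0]), "dims")
```

This is only meant for inspecting small files; for anything large, the binary archives and Kaldi's own tools are the better route.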
sandbox/nnet3: fixing conceptual bug in how com...
That paper seems to be addressing a different problem- namely, how to make use of...
There is a way to get the frame-by-frame acoustic scores, but it is slightly complicated....
When I use Kaldi for GMM online decoding, I found that when I use an ASUS notebook to record,...