Hello, I'm using Sphinx 3.0.6 for my project. While working with the engine I noticed that s3_max_frames defaults to 15000, which means 150 seconds of audio input, right? So I changed it to 30000, and now I can decode audio files under 5 minutes long. But if I make it larger, say 50000, I get an error and my application exits. Why is this happening? What happens when I change the value of s3_max_frames, and is there a limit on it? As a side question: using live decode with the value set to 50000, I get the error "Bad lw2 argument (294127368) to lm_tg_score" at around frame 32769 (not exact). Why does this also happen?
Actually, it's not recommended to decode big chunks of audio at once. The algorithm will lag on such a large amount of data; for example, here you have a signed integer overflow. Of course you could make every int wider, but that won't help you much.
Split your long recording into chunks and decode each one separately. Refer to the sphinx3_decode code or use it directly. Similar comment:
https://sourceforge.net/forum/message.php?msg_id=4349959
Hmm... if I split a file, say a 10-minute file into five 2-minute files, a problem could occur if the split falls in the middle of a spoken word: the first few syllables would end up at the end of one file and the rest at the start of the next, giving a wrong recognition. Or will sphinx3_continuous (as mentioned in the link you posted) handle that for me?
Also, aside from sphinx3_decode, I used sphinx3_livedecode, slightly modified in how it records samples: instead of getting the samples from waveIn, I read the sample data directly from the wave file and feed it to the ld_process_raw function. That also works fine on small files, but I get an error when I use my bigger files.

Thanks for the reply.
cont_ad will track silence regions and split your utterance during silence; sphinx3_continuous does exactly that. Nobody can speak for 10 minutes without pauses. cont_ad_read should return 0 if all the samples were silence, so you can check for that, and if it stays 0 for several iterations you can finish decoding the current utterance. The rest will be handled later.

So you should probably look at the main_continuous.c source. Neither livedecode nor livepretend is designed to work with long input.
Thanks, I've just tried sphinx3_continuous and it works correctly. Since livedecode can't do the job, I'll shift to using sphinx3_continuous now.

Thanks again, I'll take a look at the source code now. I can read C, though not that well; I'm just P/Invoking it from C#.