I want to experiment with the data collected by the VoxForge project.
However, some of the recordings seem to be very quiet or noisy.
Is there a DSP tool (preferably command-line, so I can use it in
a script) that can do a crude filtering of utterances that are too
quiet? I don't know much about DSP, but I guess some
crude heuristic could be used, maybe something like the average energy
in the central part of the utterance?
Are there more sophisticated tools available to assess recording quality
in terms of noise, sound level, and signal-to-noise ratio?
Thank you!
There is no ready-to-use tool for that. You can, for example, output the SNR
estimate from the sphinx4 VAD, and you can do many other things with various
external tools. It would be nice to include database cleanup infrastructure in
the core sphinxtrain process.
However, my experience with database training shows that it's better to keep
noisy and even incorrect prompts in the database. It gives better accuracy
in the end. It's counter-intuitive, but it has been confirmed many times. So
you need to be careful about filtering.
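If you still want to try the crude heuristic from the question (average
energy in the central part of the utterance), a minimal sketch in Python
could look like the one below. It is only an illustration, not an existing
tool: the function names, the 50% central window, and the -40 dBFS cutoff
are all assumptions, and it assumes 16-bit PCM WAV input. It uses only the
standard library.

```python
import array
import math
import sys
import wave


def central_rms_dbfs(samples, keep=0.5, full_scale=32768.0):
    """RMS level, in dB relative to full scale, of the central part of a signal.

    `keep` is the fraction of samples (centered in the utterance) to measure;
    the leading and trailing parts, which are often silence, are ignored.
    """
    n = len(samples)
    if n == 0:
        return float("-inf")
    span = max(1, int(n * keep))
    start = (n - span) // 2
    middle = samples[start:start + span]
    mean_square = sum(s * s for s in middle) / len(middle)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def read_pcm16(path):
    """Read a 16-bit PCM WAV file into an array of signed integers.

    Multi-channel files are left interleaved, so the level is averaged
    over all channels.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


if __name__ == "__main__":
    threshold_db = -40.0  # hypothetical cutoff; tune it on your own data
    for path in sys.argv[1:]:
        level = central_rms_dbfs(read_pcm16(path))
        verdict = "quiet" if level < threshold_db else "ok"
        print(f"{path}\t{level:.1f} dBFS\t{verdict}")
```

Saved under a name of your choice, it can be run over a list of WAV files
from a shell script, and the tab-separated output can then be sorted or
filtered. Given the caveat above, it is probably safer to use the numbers to
inspect the quietest recordings by hand than to drop them automatically.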
Now that I think about the issue again, I realize that you are right.
The problematic recordings may improve the generalization power
and robustness of the model in real-world conditions.
Thanks!