Hi,
I'm using Sphinx4 to analyze audio files with a highly restrictive grammar. I'm not getting terribly useful results, however. I wonder if any of you might be able to have a quick look at the data and configuration file and suggest how to improve the performance. Any assistance would be greatly appreciated.
Here's a bit of background:
The files I'm analyzing are responses from a psychological experiment investigating reading speed. Subjects were presented with a screen full of letters, and were asked to read them back as quickly as possible. As a result, the audio files contain just a few letters -- J, H, F, U and V -- repeated several times in a random order.
I'm trying to extract both the sequence of letters that were read, and the timings of the utterances.
The program I wrote is based on the HelloWorld demo and the Wav transcription demo. Since the set of possible utterances is so small, though, I wrote a simple new grammar definition as well.
Below are links to relevant files and a complete copy of the configuration file I'm using. If you need anything else, or would prefer these in a different format, I'll gladly oblige.
If any of you have any bright ideas, please send them my way!
Many thanks,
Dave Benson
Here's the grammar definition:
http://www.mediafire.com/file/vyvm0grn8ny/letters.gram
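For anyone who would rather not download the file: a grammar for this task would look roughly like the JSGF sketch below. This is only an illustration; the grammar and rule names are placeholders, and the actual letters.gram linked above may differ in detail.

```jsgf
#JSGF V1.0;

grammar letters;

// Any number of repetitions of the five target letters, in any order.
// The tokens must match entries in the recognizer's dictionary.
public <letters> = ( J | H | F | U | V )+;
```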
Here's the main class of the application:
http://www.mediafire.com/file/jzwt9dvurbw/WordTimings.java
Here's a sample audio file:
http://www.mediafire.com/file/7baafegaaaz/test_file.wav
Here's an Audacity project showing the audio file and the resulting analysis:
http://www.mediafire.com/file/jq3ndhkcna7/test_file_audacity_project.zip
Here's the config file:
http://www.mediafire.com/file/zchocuucgcw/wordtimings_letters.config.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sphinx-4 configuration file (based on the an4 demo) -->
<config>

    <!-- (excerpt: the opening <component> tags were lost in the paste;
         see the linked file for the full configuration) -->

    <!-- <property name="silenceInsertionProbability" value="0.1"/> -->
    <property name="languageWeight" value="8"/>

    <!-- <property name="wordReplacement" value="<sil>"/>
         <property name="allowMissingWords" value="false"/> -->
    <property name="unitManager" value="unitManager"/>

    <!--
    <component name="microphone"
               type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="msecPerRead" value="10"/>
        <property name="closeBetweenUtterances" value="false"/>
    </component>
    -->
</config>
As a secondary question, in some segments Sphinx4 fails to recognize any words at all. Do you know what settings I might be able to change to avoid this?
(00:55:25) nshm: d_h_benson: <property name="sampleRate" value="44100"/> <- this will not work for sure
(00:55:33) nshm: samplerate must be 16000
(00:55:48) nshm: but you can use AudioStreamDataSource that will do conversion for you
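Following up on nshm's point: if you'd rather resample the files up front instead of relying on AudioStreamDataSource to convert on the fly, plain javax.sound.sampled can downsample 44.1 kHz WAV data to the 16 kHz, 16-bit mono PCM that the standard Sphinx4 acoustic models expect, at least on recent JVMs. A sketch (the class and method names here are mine, not from the project):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;

public class Resample {

    // Target format expected by the default Sphinx4 acoustic models:
    // 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat TARGET =
            new AudioFormat(16000f, 16, 1, true, false);

    /** Wraps the input stream in a converting stream at 16 kHz. */
    static AudioInputStream resampleTo16k(AudioInputStream in) {
        if (!AudioSystem.isConversionSupported(TARGET, in.getFormat())) {
            throw new IllegalArgumentException(
                    "conversion not supported from: " + in.getFormat());
        }
        return AudioSystem.getAudioInputStream(TARGET, in);
    }

    public static void main(String[] args) throws Exception {
        // One second of silent 44.1 kHz mono audio built in memory,
        // standing in for one of the experiment's WAV files.
        AudioFormat src = new AudioFormat(44100f, 16, 1, true, false);
        byte[] silence = new byte[44100 * 2]; // 2 bytes per 16-bit frame
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(silence), src, 44100);

        AudioInputStream out = resampleTo16k(in);
        System.out.println("output rate: " + out.getFormat().getSampleRate());
    }
}
```

You could then write the converted stream back out with AudioSystem.write and feed the resulting 16 kHz file to the recognizer, leaving the config's front end untouched.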