All-
As my previous post mentioned, I have been having problems converting the transcribe config files to work with text data.
I decided to run some tests using the an4 dataset, specifically the audio file:
an4_clstk/fbbh/cen8-fbbh-b.raw
I ran this through the following config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Sphinx-4 Configuration file
-->
<!-- ******** -->
<!-- biship configuration file -->
<!-- ******** -->
<config>
<!-- ******** -->
<!-- frequently tuned properties -->
<!-- ******** -->
<property name="absoluteBeamWidth" value="1000"/>
<property name="relativeBeamWidth" value="1E-80"/>
<property name="absoluteWordBeamWidth" value="500"/>
<property name="relativeWordBeamWidth" value="1E-60"/>
<property name="wordInsertionProbability" value="1E-16"/>
<property name="languageWeight" value="7.0"/>
<property name="silenceInsertionProbability" value=".1"/>
<property name="frontend" value="mfcFrontEnd"/>
<property name="recognizer" value="recognizer"/>
<property name="showCreations" value="false"/>
</config>
And I got a perfect result. However, when I run 12345.wav, the file included with the WavFile demo, through this config, I get "what you eat or five", which is not so good. I am partly worried about the audio formats I am using, so I used some software to convert the an4 data to a WAV file. When I run that through my Java code (see the routines below), it reports the following audio format:
File is in :PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian
This file STILL PRODUCES PERFECT RESULTS despite the fact that the format is all wrong. However, when I attempt to convert it to the right format using the following (see the Java code below):
AudioFormat.Encoding.PCM_SIGNED,
16000,
16,
1,
2,
2,
16000,
false );
which should down-convert it to match my config.xml file, I get no results whatsoever.
So, any ideas as to what is going on here? Why am I able to work with a 44.1 kHz file and not with a 16 kHz file, when I specified 16 kHz in the config.xml file? Do all Sphinx inputs need to be mono, and how do I set the bytesPerRead property (how is it calculated)?
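For reference, the byte arithmetic behind the two formats mentioned above can be sketched with the standard javax.sound.sampled.AudioFormat class. This is only a sketch: the class name FormatMath and its helper methods are made up for illustration, and it says nothing about how Sphinx itself interprets bytesPerRead. It only shows how many bytes a frame, a second, or a given number of milliseconds of audio occupies in each format.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper class, not part of Sphinx or the Java Sound API.
class FormatMath {
    // Bytes in one frame: one sample for every channel.
    static int bytesPerFrame(AudioFormat format) {
        return format.getFrameSize();
    }

    // Bytes needed to hold one second of audio in this format.
    static int bytesPerSecond(AudioFormat format) {
        return Math.round(format.getFrameSize() * format.getFrameRate());
    }

    // Bytes needed to hold the given number of milliseconds of audio.
    static int bytesForMillis(AudioFormat format, int millis) {
        long frames = Math.round(format.getFrameRate() * millis / 1000.0);
        return (int) (frames * format.getFrameSize());
    }

    public static void main(String[] args) {
        // The format reported for the converted an4 file.
        AudioFormat reported = new AudioFormat(44100f, 16, 2, true, false);
        // The format the conversion was aiming for.
        AudioFormat wanted = new AudioFormat(16000f, 16, 1, true, false);

        for (AudioFormat f : new AudioFormat[] {reported, wanted}) {
            System.out.println(f);
            System.out.println("  bytes per frame:  " + bytesPerFrame(f));
            System.out.println("  bytes per second: " + bytesPerSecond(f));
            System.out.println("  bytes per 100 ms: " + bytesForMillis(f, 100));
        }
    }
}
```

Under these assumptions, the 44.1 kHz stereo file carries 4 bytes per frame and the 16 kHz mono target carries 2, so the same byte count covers very different amounts of audio in the two formats.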
Very confused,
James
Here is all of my Java code:
private AudioInputStream convertFormat(AudioInputStream audioInputStream) {
AudioFormat audioFormat = audioInputStream.getFormat();
System.out.println( "Play input audio format=" + audioFormat );
// Result result = recognizer.recognize();
// String resultText = result.getBestResultNoFiller();
// System.out.println("You said: " + resultText + "\n");
//
I see now that Java can't down-convert my file to 16 kHz, but I still don't understand how the file actually works with the setup in the config.xml file.
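One way to confirm whether the Java Sound API can perform a given conversion is to ask it before converting. The following is a minimal sketch, not a fix: the class ConversionCheck and the helper tryConvert are hypothetical names, the input is a buffer of silence standing in for a real file, and the answer depends on which format conversion providers are installed in the JVM running it.

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper class for illustration only.
class ConversionCheck {
    // Returns the converted stream, or null when the installed
    // Java Sound providers cannot perform the conversion.
    static AudioInputStream tryConvert(AudioInputStream source, AudioFormat target) {
        AudioFormat sourceFormat = source.getFormat();
        if (sourceFormat.matches(target)) {
            return source; // already in the requested format
        }
        if (!AudioSystem.isConversionSupported(target, sourceFormat)) {
            return null; // no provider offers this conversion
        }
        return AudioSystem.getAudioInputStream(target, source);
    }

    public static void main(String[] args) {
        AudioFormat cdStereo = new AudioFormat(44100f, 16, 2, true, false);
        AudioFormat target16kMono = new AudioFormat(16000f, 16, 1, true, false);

        // One second of silence (44100 frames of 4 bytes) in place of real audio.
        byte[] silence = new byte[44100 * 4];
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(silence), cdStereo, 44100);

        AudioInputStream out = tryConvert(in, target16kMono);
        if (out == null) {
            System.out.println("Conversion not supported: " + cdStereo + " -> " + target16kMono);
        } else {
            System.out.println("Converted stream format: " + out.getFormat());
        }
    }
}
```

If the helper returns null, the conversion would have to be done outside Java Sound (for example with the same external software used to produce the WAV file) before the audio is handed to the recognizer.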
My goal is to pass a converted MP3 file to Sphinx and have it spit out some meaningful text. Currently I am testing it with WAV files and getting 0 words correct in a 10-second file using hub4. I must be doing something wrong?
Please Help!!!!