Hi,
Let me preface this by saying that I have tried to read everything on the website and as much of the forums as I can before posting my problem.
I have an MP3 file that I would like to transcribe.
I convert the MP3 file to WAV using mp3.jar (the code from a JavaWorld article on how to do this) and jl1.0.jar (another open-source Java MP3 library). That all seems to go well; I get a WAV file with:
WAVE (.wav) file, byte length: 789322, data format: PCM_SIGNED 16000.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian, frame length: 197313
Good, right? It's 16 kHz, 16-bit, little-endian, so everything should be fine.
So I am then using the following config file, pinched from the Transcriber demo and modified as the README describes (changing the grammar, linguist, and language model):
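As a quick sanity check on the converter's report, the sketch below rebuilds the same format with the standard javax.sound.sampled.AudioFormat class and derives the frame size and duration from it. The class name FormatCheck and its helpers are hypothetical; the numbers are copied from the report line above. It makes explicit that the 4 bytes/frame come from two channels of 16-bit samples, i.e. the file is stereo.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper: re-derives the numbers in the converter's WAV report. */
public class FormatCheck {
    // Values copied from the report: 16000.0 Hz, frame length 197313.
    static final float SAMPLE_RATE = 16000.0f;
    static final long FRAME_LENGTH = 197313L;

    static AudioFormat reportedFormat() {
        // 16-bit, 2 channels (stereo), signed = true (PCM_SIGNED), bigEndian = false (little-endian)
        return new AudioFormat(SAMPLE_RATE, 16, 2, true, false);
    }

    static double durationSeconds(AudioFormat format, long frames) {
        // One frame holds one sample per channel, so duration = frames / frame rate.
        return frames / (double) format.getFrameRate();
    }

    public static void main(String[] args) {
        AudioFormat f = reportedFormat();
        System.out.println("channels   = " + f.getChannels());   // 2, i.e. stereo
        System.out.println("frame size = " + f.getFrameSize());  // 2 channels x 2 bytes = 4
        System.out.printf("duration   = %.2f s%n", durationSeconds(f, FRAME_LENGTH));
        System.out.println("PCM bytes  = " + FRAME_LENGTH * f.getFrameSize());
    }
}
```

It reports 2 channels, a 4-byte frame, about 12.33 s of audio, and 789252 bytes of sample data, slightly less than the reported byte length of 789322 (the difference is presumably the WAV header).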
<?xml version="1.0" encoding="UTF-8"?>
<!--
   Sphinx-4 Configuration file
-->
<!-- ******** -->
<!-- an4 configuration file -->
<!-- ******** -->
<config>
    <!-- ... preceding components omitted ... -->
    <component name="searchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="lexTreeLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>
</config>
As you can see, I am using the HUB4 models (and dictionaries) for decoding. Also note where I specify the WAV file format: I think that is right, but I am less sure about bytesPerRead (a recent post suggested that my value is correct, I think). And does it matter that the file is stereo?
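One way to take the stereo question out of the picture is to downmix to mono before decoding, assuming the decoder and acoustic models expect single-channel audio. Below is a minimal sketch for interleaved 16-bit signed little-endian PCM (the format in the report above) that averages the left and right samples of each frame. The class StereoToMono and its downmix method are hypothetical, not part of Sphinx-4 or of the MP3 libraries mentioned above, and the sketch works on raw sample bytes only (no WAV header handling).

```java
/**
 * Hypothetical helper: downmixes interleaved 16-bit signed little-endian
 * stereo PCM to mono by averaging the two channels of each frame.
 */
public class StereoToMono {
    static byte[] downmix(byte[] stereo) {
        if (stereo.length % 4 != 0) {
            throw new IllegalArgumentException("16-bit stereo data must be a multiple of 4 bytes");
        }
        byte[] mono = new byte[stereo.length / 2];
        for (int in = 0, out = 0; in < stereo.length; in += 4, out += 2) {
            // Little-endian: low byte first (read unsigned), high byte carries the sign.
            int left = (stereo[in] & 0xFF) | (stereo[in + 1] << 8);
            int right = (stereo[in + 2] & 0xFF) | (stereo[in + 3] << 8);
            int mixed = (left + right) / 2;        // the average always fits in 16 bits
            mono[out] = (byte) mixed;              // low byte
            mono[out + 1] = (byte) (mixed >> 8);   // high byte
        }
        return mono;
    }

    public static void main(String[] args) {
        // One frame: left = 1000 (0x03E8), right = -2000 (0xF830), little-endian.
        byte[] frame = {(byte) 0xE8, 0x03, 0x30, (byte) 0xF8};
        byte[] mono = downmix(frame);
        int sample = (mono[0] & 0xFF) | (mono[1] << 8);
        System.out.println(sample); // -500
    }
}
```

The mono bytes could then be wrapped in an AudioInputStream with a one-channel AudioFormat (16000 Hz, 16-bit, signed, little-endian) and written out with AudioSystem.write, after which the frame size becomes 2 bytes instead of 4.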
So my problem is that on a 10-second WAV file, the runtime is extraordinary (over 20 minutes so far on a 3.2 GHz dual-core machine with 4 GB of RAM), and the transcription is really terrible.
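For scale, taking the audio length from the frame count in the converter's report (197313 frames at 16000 Hz, about 12.3 s) and the 20-plus minutes elapsed so far, the decoder is running roughly 100 times slower than real time:

```latex
\mathrm{RTF} = \frac{t_{\text{decode}}}{t_{\text{audio}}}
  \geq \frac{20 \times 60\ \text{s}}{197313 / 16000\ \text{s}}
  \approx \frac{1200}{12.33}
  \approx 97
```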
Does anyone have a config file for transcribing speech that works reasonably well and that I could look at? This seems like a fairly common use case; maybe we could add it to the wiki.
Thanks much in advance, and apologies for my incompetence.
James