Hi,
I want to build a Hebrew command and control application generator using Sphinx. All applications should use the same acoustic model with different grammars, which may contain any Hebrew word. I used adaptation of the 8 kHz model that comes with Sphinx before, but if I understand correctly it's better to build an acoustic model for Hebrew.
I read this tutorial and I have a few questions:
1. Can I use Hebrew characters in my dictionary and phoneset files? If not, what can I use to describe non-English consonants?
2. Can I use right-to-left sentences in my transcription and dictionary files?
3. As I understand it, I should also supply a Hebrew language model. I searched a little and didn't find any models. Does anybody know if and where I can find one? If not, I guess I'll have to do some Wikipedia dumps.
Thanks,
Hagai.
Can I use Hebrew characters in my dictionary and phoneset files?
In the dictionary, yes; in the phoneset it is not recommended.
If not, what can I use to describe non-English consonants?
You can encode them with alphanumeric sequences, e.g. use tav for תֵ.
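For illustration, a phoneset file and matching dictionary entries could look like the following; the phone names are a made-up transliteration, not a standard Hebrew phone set:

    phoneset file (one phone per line, plus SIL for silence):
        SIL
        A
        D
        L
        M
        O
        SH
        T

    dictionary file (word, then its phone sequence):
        שלום SH A L O M
        תודה T O D A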
Can I use right-to-left sentences in my transcription and dictionary files?
Yes. Right-to-left versus left-to-right is only how the file is displayed, not how the information is actually stored; the words are stored in logical order.
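For example, a transcription line in the usual SphinxTrain format (the utterance id here is hypothetical) stores the words in spoken order regardless of how an editor draws them:

    <s> פתח את הדלת </s> (speaker1_utt001)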
As I understand it, I should also supply a Hebrew language model. I searched a little and didn't find any models. Does anybody know if and where I can find one?
For command and control you do not need a language model; you can write a simple JSGF grammar.
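A minimal sketch of such a grammar (the rule names and words here are placeholders; each word needs a matching dictionary entry):

    #JSGF V1.0 UTF-8;
    grammar commands;
    public <command> = <action> <object>;
    <action> = פתח | סגור;
    <object> = דלת | חלון;

Here <action> is open/close (פתח/סגור) and <object> is door/window (דלת/חלון).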
Hi,
I don't think I can use a fixed JSGF grammar, since I don't know in advance what the grammar will be. I am writing an application generator where every application may configure a different grammar.
I successfully extracted sentences from Wikipedia, but the Hebrew dictionary I downloaded doesn't seem good: it isn't really phonetic, it just transliterates the Hebrew letters to Latin letters.
Anyway, as you said, I am not sure I need this, since my apps will all use JSGF grammars with a relatively small number of words.
What I want to avoid is the need to re-train the system when a new group of words is added. Will this be possible if I use many recording hours with a lot of possible words?
What I want to avoid is the need to re-train the system when a new group of words is added. Will this be possible if I use many recording hours with a lot of possible words?
Yes. The acoustic model is trained on phones, not words, so as long as new words can be transcribed with the existing phoneset they only need dictionary entries and a grammar update, not retraining.
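For instance (a hypothetical later addition, assuming the phones A TZ O R are already in the phoneset), supporting a new command word is just a dictionary line plus a grammar change, with the acoustic model left untouched:

    עצור A TZ O R

and a reference to עצור in the JSGF grammar.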
OK, thanks.
I have some data which I previously used to adapt the default 8 kHz model, and I am trying to use it to build a model just to get my feet wet. When I used it for adaptation I got no errors or warnings.
Now I get some "Failed to align audio to transcript: final state of the search is not reached" errors. Is this because I don't have enough data? Shouldn't it be the same as in adaptation?
This is my training folder:
https://docs.google.com/open?id=0B91Vmp4A3YOuMDUyaGp6ZUJISVE
I re-ran using the trunk version and things seem a lot better. It seems my previous run was configured for 16 kHz even though I followed the 8 kHz settings in the tutorial. Now I get fewer alignment errors.
Great
Another question: is there a way to configure JSGF instead of a language model in the sphinx_train.cfg file?
No
So where can I configure it?
You cannot configure it. To use JSGF instead of an LM you need to edit the psdecode.pl Perl script and specify the required decoder option there.
I managed to configure it. I added a DEC_CFG_GRAMMAR variable to
sphinx_train.cfg, copied psdecode.pl to psdecodejsgf.pl, changed
DEC_CFG_SCRIPT to psdecodejsgf.pl, and changed the pocketsphinx_batch command
in psdecodejsgf.pl to pass -jsgf => $ST::DEC_CFG_GRAMMAR instead of the -lm option.
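In concrete terms, the change amounts to roughly this sketch (the grammar path is from my setup, and $DEC_CFG_LANGUAGEMODEL is the stock variable name as I understand it, so adjust to taste):

In sphinx_train.cfg:

    # new variable pointing at the JSGF grammar (path is an example)
    $DEC_CFG_GRAMMAR = "$CFG_BASE_DIR/etc/commands.gram";
    # run the copied script instead of the stock decode script
    $DEC_CFG_SCRIPT = 'psdecodejsgf.pl';

In psdecodejsgf.pl, in the pocketsphinx_batch argument list:

    # was (stock psdecode.pl):
    #   -lm => $ST::DEC_CFG_LANGUAGEMODEL,
    # now:
    -jsgf => $ST::DEC_CFG_GRAMMAR,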
With the default parameters I am getting 3.8% WER and 23.8% sentence error
rate on the training data (about 55 minutes of audio). Any pointers on which
parameters I can change to get better results? (I am also in the process of
collecting more audio).
The size of the data. More audio will help more than parameter tweaks.