Hi all. We are happy to announce the first open-source German acoustic model based on a free GPL speech corpus. You can download it from VoxForge as usual:
http://voxforge.org/home/downloads
The model was trained on 5 hours of speech, unfortunately from a small number of speakers. So please help to improve it: submit your speech to VoxForge.
I tried the new model, mostly in the S4 HelloNGram demo.
For digits it works great, but free speech is still pretty bad, though I could improve accuracy a lot by creating a .lm from the full transcription file (3000 sentences).
One of the main problems I encounter is still the missing word problem I posted above, with the German umlauts.
I guess it's a problem in the getWord() method of the FastDictionary component, though it's probably fixed by now in SVN? (I'm still using the original sphinx4-1.0beta package).
I hope some other Germans and I will find some time to contribute to the VoxForge project, as the model is currently very much overtrained on Ralf's files.
Is there a possibility of getting all the transcripts (all the prompts.txt files) in the repository without downloading the audio files or going through every submission in the "listen" section?
We could then at least create a probably pretty decent language model for free speech ...
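A minimal sketch of how the collected prompts could be turned into a sentence corpus for language-model training. It assumes each prompts line has the form "uttid text..." (an assumption about the file layout, not something confirmed in this thread), and the function name is only illustrative. The <s> ... </s> markers follow the convention of the CMU language-model tools; drop them if your tool adds them itself.

```python
def prompts_to_sentences(prompt_lines):
    """Turn 'uttid word word ...' lines into '<s> word word ... </s>' lines.

    Assumption: the first whitespace-separated field is the utterance ID
    and everything after it is the spoken text.
    """
    sentences = []
    for line in prompt_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and IDs without any text
        text = " ".join(fields[1:]).lower()  # dictionary entries are lowercase
        sentences.append("<s> " + text + " </s>")
    return sentences


# Made-up prompt lines, for illustration only.
example_prompts = [
    "de91-74 Eins zwei drei",
    "",
    "de91-75 Fünf Bücher",
]
corpus = prompts_to_sentences(example_prompts)
for sentence in corpus:
    print(sentence)
```

Concatenating all prompts.txt files and feeding them through something like this would give one training sentence per utterance.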
regarding the missing word error:
All the words containing those umlauts are correct in the dictionary, transcription and language model.
There seems to be a problem loading them from the dictionary.
If you're not encountering this problem, I should probably upgrade my sphinx4 to a newer SVN version.
The mentioned problem is severe for recognition because in Ralf's samples there are a lot of words with umlauts.
When decoding with S3 there is apparently no such problem, as the digit "fünf", which contains an umlaut, is recognized correctly.
Hi Nickolay,
cool! Would it be possible to provide a script along with the files which creates a s4-AcousticModel-jar?
-Holger
Sure, I'll upload the jar too. Right now I just need help from someone who knows the language in order to fix a rather significant number of BW errors during training.
Well, I could probably help you a little.
I'm a German student and I've been waiting for a German acoustic model for some time now.
I'll try out the one from Voxforge and if I can be of some (not too much time consuming) help, feel free to ask me anything.
Thanks! Your help is really appreciated. We have to share the audio first (it's available for download from VoxForge right now, but in a rather unpleasant way: you have to use wget). I would be really happy if someone could review the dictionary and prompts. Training gives a lot of errors on bad alignment, so someone with German experience should check the transcription.
About JSGF and the missing phones: it should work; the model has a test subfolder with a test script for sphinx3. I think it should be rather stable to reproduce. I'll try to upload the jar too so we can check each other's results.
Could you post some links to the files I have to download? I can't really find it on their website? Or should I download everything in the speech corpus from the "listen" section?
The phone set seems pretty good, as does the pronunciation in the dictionary, but individually checking all 3200+ entries could take some time. Could someone explain to me why the phone "qq" is used in front of every word beginning with a vowel?
I could check the transcript for errors, but where can I find the prompts to download?
If you have a link to a .jar version of the acoustic model for S4, I could test it with a few demos. Mine seems to be created wrong, at least flatLinguist complains as stated above, maybe I should create one with the cd-8gau-files?
> Could you post some links to the files I have to download? I can't really find it on their website? Or should I download everything in the speech corpus from the "listen" section?
Yeah, currently they are only available in Listen. You can get them with wget, for example. See the discussion at the bottom of:
http://voxforge.org/home/forums/other-languages/german/localizing-the-speechsubmission-app-to-german?pn=2
> The phone set seems pretty good, as does the pronunciation in the dictionary, but individually checking all 3200+ entries could take some time. Could someone explain to me why the phone "qq" is used in front of every word beginning with a vowel?
> I could check the transcript for errors, but where can I find the prompts to download?
I understand. The problem is that, due to the BOMP restrictions, the dictionary is created by espeak rules and a little Perl script for phone mapping. Probably qq is really not needed at the beginning. I suppose the dictionary is not correct, simply because there are around 300 rejected prompts, as you can see in the log.
> If you have a link to a .jar version of the acoustic model for S4, I could test it with a few demos. Mine seems to be created wrong, at least flatLinguist complains as stated above, maybe I should create one with the cd-8gau-files?
I'll try to prepare it tomorrow.
Ok, here is an example for you to test sphinx4:
http://www.mediafire.com/?j1l9d0ujmgg
Thanks.
I did some testing with it.
Decoding digits with the WavFile or the HelloDigits demo and a JSGF grammar works very well.
When using a language model created from the transcript file and decoding with the HelloNGram demo, digits work pretty well. But recognition of random speech is really bad (almost completely random results).
Was the acoustic model created from all the files in the German "Listen" section on voxforge.org ?
Because even decoding audio files by Ralf Herzog (which should have been trained I guess) returns really bad results.
So I don't know what's wrong, but a model created from all those sentences in the German corpus should decode better ...
Also, I don't know if it's a problem on my side, but when starting the HelloNGram demo, it complains about all the words with "umlaute", like äöü, though they are represented correctly in the dictionary and the language model.
Here's a little output:
02:04.181 WARNING dictionary Missing word: bᅢᄐcher
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.181 WARNING dictionary Missing word: verzᅢᄊgert
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.195 WARNING dictionary Missing word: hartnᅢᄂckig
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: abstᅢᄂnden
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: fᅢᄐrsorglichkeit
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: gefᅢᄂhrlich
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: verhᅢᄂltnis
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: dᅢᄐrfen
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: kᅢᄐchen
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: zusᅢᄂtzliche
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: wᅢᄂhrungspolitik
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
02:04.196 WARNING dictionary Missing word: gᅢᄂngigsten
in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
Is FastDictionary having problems with those (notice the wrong display of the letters) or is it something I have set wrong ?
The WavFile and HelloDigits demos didn't complain about the digit "fünf" for example.
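One possible explanation for warnings like these is a character-set mismatch: the dictionary is stored as UTF-8 but decoded with a single-byte charset somewhere along the way. That cause is an assumption, nothing in this thread confirms it, but the effect is easy to reproduce. The sketch below uses plain Python rather than Sphinx, and the pronunciations are made up.

```python
def load_dictionary(raw_bytes, encoding):
    """Parse 'word phone phone ...' lines into a {word: [phones]} map."""
    entries = {}
    for line in raw_bytes.decode(encoding).splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries


# Two illustrative entries (hypothetical pronunciations), stored as UTF-8.
raw = "bücher b y x @ r\nzwei ts v ai\n".encode("utf-8")

good = load_dictionary(raw, "utf-8")    # decoded with the right charset
bad = load_dictionary(raw, "latin-1")   # decoded with the wrong charset

print("bücher" in good)  # the word is found
print("bücher" in bad)   # the word is "missing"
print(sorted(bad))       # the key now contains two junk characters instead of ü
```

Words without umlauts survive the wrong decoding unchanged, which would fit a situation where only the umlaut words are reported as missing.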
BTW,
what problems in the training run do you mean?
Looking at the HTML file, your last training run completed successfully.
Do you mean the "final state not reached" errors in the cd-training logs?
I don't know exactly what those mean. Is it that there's something wrong with the audio file to the corresponding utterance in the transcript?
If every one of those utterances is ignored, it's of course a big problem ... would it be hard to force-align them, so they're used in BW training anyway?
I listened to some of the problematic sound files, and they were perfectly ok, no cut-offs or anything. I don't know why the error occurs for them as opposed to the other files.
Also, why were only a few submissions by Ralf Herzog used in training (de91 through de120)?
Would the model be overtrained otherwise ?
About the errors: yes, I meant the "final state not reached" errors in the CD logs. This mostly means that the transcription is out of sync with the audio. The problem is probably not in this particular place, but it signals a problem in the transcription and/or the dictionary. That's why I'm asking for a review here. I think we need to start with a smaller set of prompts and find the problem in our dictionary. Then we can double the data and keep an eye on dropped prompts. The final state should always be reached. Forced alignment is a good idea indeed, I'll try it too.
Overtraining is a problem; I told Ralf several times that we don't need so many recordings from him. But since there is no more data, I think we'll train on the existing audio. There are some other speakers, so I hope it will be OK for a beginning. Also, we'll probably use Ralf's voice for a TTS database, so his work at least makes sense. I trained on only some of his submissions simply because I hadn't downloaded everything else. We are still in the process of transferring the data to the repository.
About the problems with generic recognition, could you please submit a sample as I did? I'll try to look and check what's wrong there.
I checked the transcript and dictionary for spelling. Everything's all right there.
I also checked some of the problematic sound files (those mentioned in the training logs) with audacity, they are completely normal, like any of the other files.
I really don't know what's wrong here ...
Also, I still don't know why I'm getting those errors with the special German characters (like posted above).
I tried creating an LM from the transcript and it worked, but still the decoding is pretty bad, probably because of all the "missing" words from the dictionary and the skipped utterances in training.
If you can give me a hint about the right direction to go in, please do so.
Ah, I've found the problem:
Uttid mismatch: ctlfile = "de100-75"; transcript = "de91-75"
The .fileids file is simply not in sync with the transcript file. I'll retrain the model and upload the audio this weekend.
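A check like the following could catch this kind of mismatch before a training run. It is only a sketch: it assumes the control file lists one path per line whose last component is the utterance ID, and that each transcript line ends with the ID in parentheses. The function names and the example paths are made up.

```python
import re
from itertools import zip_longest

# Utterance ID in parentheses at the end of a transcript line.
UTTID_RE = re.compile(r"\(([^()\s]+)\)\s*$")


def ctl_uttid(line):
    """Utterance ID of a control-file entry: the last path component."""
    return line.strip().split("/")[-1]


def transcript_uttid(line):
    """Utterance ID of a transcript line, or None if there is none."""
    match = UTTID_RE.search(line)
    return match.group(1) if match else None


def find_mismatches(ctl_lines, transcript_lines):
    """Return (line_number, ctl_id, transcript_id) for every out-of-sync pair."""
    mismatches = []
    pairs = zip_longest(ctl_lines, transcript_lines, fillvalue="")
    for number, (ctl, trans) in enumerate(pairs, start=1):
        cid, tid = ctl_uttid(ctl), transcript_uttid(trans)
        if cid != tid:
            mismatches.append((number, cid, tid))
    return mismatches


# Made-up data that reproduces the mismatch from the log line above.
ctl = ["de91/wav/de91-74", "de100/wav/de100-75"]
transcript = ["<s> eins zwei </s> (de91-74)", "<s> drei vier </s> (de91-75)"]
print(find_mismatches(ctl, transcript))
```

Because the two files are compared line by line, a single missing or extra entry shows up as a mismatch on every following line, so the first reported line number is the one to look at.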
Well, I uploaded new models; they should be available here soon:
http://www.repository.voxforge1.org/downloads/de/Trunk/AcousticModels/
The audio is also available:
http://www.repository.voxforge1.org/downloads/de/Trunk/Audio/Main/16kHz_16bit/
It's more than 21 hours.
The model is clearly overtrained, mostly because of Ralf's submissions. But please try it and report your results.
I just encountered a problem, I don't know if it's my mistake:
I created a .jar S4 Model to use the German Voxforge Model for testing the HelloDigits demo.
I also created a .lm with the LMTool.
I tried running the demo with both an lmGrammar and a jsgfGrammar, but flatLinguist always complains about a missing HMM for certain phones, though the phones are in the .phone file and in the dictionary, and the words in the jsgfGrammar are all in the dictionary and transcript ...
I used the ci_cont models; should I use the CD ones instead?
flatLinguist can't find the phone "qq" when using an lmGrammar for the demo, and the phone "v" when using a jsgfGrammar.
For the JSGF grammar I only used German digits, so the phone "v" occurs only in the word "zwei".
Have I done something wrong anywhere? It seems like my acoustic model is broken.
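When a linguist reports a missing HMM, it can help to list which dictionary phones have no counterpart in the model's phone list, and which words use them. The sketch below does that in plain Python. How the phone list is obtained from the model files is left out, and the phone set and pronunciations shown are made up for illustration.

```python
def phones_without_hmm(dict_lines, model_phones):
    """Map each dictionary phone missing from the model to the words using it."""
    missing = {}
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue
        word, phones = parts[0], parts[1:]
        for phone in phones:
            if phone not in model_phones:
                missing.setdefault(phone, []).append(word)
    return missing


# Hypothetical phone list and dictionary entries.
model_phones = {"SIL", "ts", "ai", "n", "s"}
dictionary = ["zwei ts v ai", "eins qq ai n s"]
report = phones_without_hmm(dictionary, model_phones)
for phone, words in sorted(report.items()):
    print(phone, "used by", ", ".join(words))
```

An empty report would suggest the problem lies in how the model was packaged rather than in the dictionary itself.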
Hi people,
I came across this thread while searching for information about German speech recognition with Sphinx. I have downloaded the German acoustic model, dictionary and a simple demo from:
http://www.mediafire.com/?j1l9d0ujmgg
The digits example works well, but the dictionary has only a small number of words and was missing a majority of the words in a portion of sample text. I am about to extend the dictionary, and intend to try both a translation from IPA to Arpabet (using Ralf's German dictionary, http://spirit.blau.in/simon/2009/10/24/ralfs-german-dictionary-version-016/ ) and also a direct phonetic interpretation of the spelling. This is probably more achievable in German than it would be in English, as German pronunciation is fairly predictable.
The current German dictionary from the above link is in a slightly odd format, however. Here is a sample:
ab qq a p
abbauen qq a b au @ n
aber qq aa: b ei
abfälle qq a p f ee l @
abgaben qq a p g aa: b @ n
abgebaut qq a p g @ b au t
abgeben qq a p g e: b @ n
abgebrochen qq a p g @ b r oo x @ n
abgedeckt qq a p g @ d ee k t
abgefallen qq a p g @ f a l @ n
abgefunden qq a p g @ f uu n d @ n
abgehalten qq a p g @ h a l t @ n
I noted that someone up above asked about the double-q phoneme that appears before initial vowels. There was no clear answer... Does anyone know why it is there and what it means? Would it matter if this was left out of the dictionary?
Also, the unstressed e (the schwa) is represented by an @ symbol, which is not part of the standard Arpabet ( http://www.speech.cs.cmu.edu/cgi-bin/cmudict ). Is it necessary to keep the existing phoneme labels to use the acoustic model?
If someone has already created a large German pronunciation dictionary and/or acoustic model, I would like to share it. Also, I can probably help train the model using some of the German audiobooks in my collection.
Cheers,
Craig.
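Before extending the dictionary, it may be worth measuring the gap first. This is a rough sketch that lists the words of a sample text that have no dictionary entry, assuming the "word phone phone ..." layout from the sample above. The helper names are illustrative.

```python
import re


def load_words(dict_lines):
    """Collect the headwords of 'word phone phone ...' dictionary lines."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def missing_words(text, known_words):
    """Return words of the text that are not in the dictionary, in order."""
    seen = set()
    missing = []
    # [^\W\d_]+ matches runs of letters, including umlauts and ß.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token not in known_words and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing


dictionary = [
    "ab qq a p",
    "aber qq aa: b ei",
    "abfälle qq a p f ee l @",
]
known = load_words(dictionary)
print(missing_words("Aber die Abfälle bleiben ab.", known))
```

The resulting word list is what a G2P tool or a manual pass would then have to cover.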
Hello
The existing dictionary was built with the espeak TTS and the script from the acoustic model package root, so you can easily extend it to any vocabulary you like. There is Ralf's work. There are commercial alternatives (BOMP, for example) which could be better in some situations. Anyway, you have many choices here and can pick the best one. For VoxForge any consistent dictionary will be good, and it would be very nice to retrain the German model with it.
The existing dictionary can be extended manually or with G2P software. I would also recommend plugging into a TTS engine like OpenMARY, which can generate pronunciations without a fixed vocabulary specified by the dictionary. This will also solve the tokenization issue when you need to convert numbers and abbreviations into textual form.
qq means a glottal stop, which is usually present before a vowel, though some German phoneticians disagree on that.
See the related discussion:
http://www.voxforge.org/home/forums/message-boards/general-discussion/dictionary-format
They are not Arpabet in any sense. Phones don't have to be part of Arpabet; it's an unrelated thing. They are just phones specific to each model. It's preferred to have ASCII-only, case-insensitive phones. You can choose whatever names you like.
Ralf did, why don't you use his work?
Thanks for the answers...
I'd be happy to use Ralf's work but didn't find it in a format ready to be used in Sphinx. Instead I found a huge XML file that needs reformatting to remove all the XML tags. I'll search through the links above and the related discussion and see if I can find a better link than the one I originally posted.
Thanks for the info about the glottal stop. I had since found it on the wiki: http://en.wikipedia.org/wiki/Glottal_stop
It seems to me that the glottal stop would not stand alone as a phoneme very well, since it is short and voiceless. Isn't it more a means of transitioning between vowel sounds (as in the hyphen in "uh-oh"), or perhaps starting a vowel abruptly? In other words it might modify the neighbouring phonemes more than exist as a context-insensitive phoneme in its own right. My guess is that dropping it would make little difference.
My question about the nonstandard phone labels (@ and so on) related to the acoustic model downloaded from the link I posted. I know you can use whatever labels you like in making a model, but I already have the model that came with the German digits demo ( http://www.mediafire.com/?j1l9d0ujmgg ), along with a tiny dictionary that uses the same labels. If I use that acoustic model, am I stuck with the labels its creators chose, or is there some way of changing them? If Ralf already has a better acoustic model and matching dictionary then the question is somewhat academic, but I am still curious.
Cheers,
Craig.
Yes, you need a simple script to convert it.
You should use the same labels and the same way of building the dictionary. That's espeak2phones.pl from the archive:
http://www.repository.voxforge1.org/downloads/de/Archive/voxforge-de.tar.gz
But the advantage of VoxForge is that you can easily retrain the model with your own dictionary. There is no problem doing that.
"Ralf did, why don't you use his work."
"Yes, you need a simple script to convert"
So I need to write a program to convert a massive XML file into the right
format, run a Perl script to get access to the phoneme labels, etc... Sure,
it's all possible, but it's hardly set up for easy use. I'm part-way through
converting the XML file with my own Java program but it is so large my IDE can
barely handle it. Is there really no-one who has a decent, ready-to-use Sphinx
set-up for German?
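For a file too large to open in an IDE, a streaming parser avoids loading the whole document at once. The sketch below uses Python's xml.etree.ElementTree.iterparse. The element names (lexeme, grapheme, phoneme) are an assumption in the style of a pronunciation lexicon and not the confirmed layout of Ralf's file, so they would need adjusting to whatever the XML really contains. The phoneme strings still need mapping to the model's phone labels afterwards, which is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_lexemes(source):
    """Yield (word, pronunciation) pairs one entry at a time.

    Assumed (hypothetical) layout:
    <lexeme><grapheme>..</grapheme><phoneme>..</phoneme></lexeme>
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "lexeme":
            continue
        words = [c.text.strip() for c in elem
                 if local_name(c.tag) == "grapheme" and c.text]
        prons = [c.text.strip() for c in elem
                 if local_name(c.tag) == "phoneme" and c.text]
        for word in words:
            for pron in prons:
                yield word, pron
        elem.clear()  # drop the finished entry's children and text


# Tiny made-up document standing in for the real file.
sample = (b"<lexicon>"
          b"<lexeme><grapheme>ab</grapheme><phoneme>ap</phoneme></lexeme>"
          b"<lexeme><grapheme>aber</grapheme><phoneme>a:b6</phoneme></lexeme>"
          b"</lexicon>")
pairs = list(iter_lexemes(io.BytesIO(sample)))
for word, pron in pairs:
    print(word + "\t" + pron)
```

For the real file, an open file object (or just the path) can be passed instead of the BytesIO wrapper, and each pair written straight to the output dictionary.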