I am adapting the voxforge German model, and when testing I get the following warnings:
*
INFO: cmn.c(175): CMN: 19.48 -1.38 -0.35 -0.34 -0.27 -0.27 -0.22 -0.23 -0.22 -0.18 -0.21 -0.19 -0.18
WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'MÜDE' in the lexicon
WARNING: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance 'ZU MÜDE SICH ZU ÄUßERN'
WARNING: "main.c", line 841: Skipped utterance 'ZU MÜDE SICH ZU ÄUßERN'
utt> 334 bild4_tagblende__txt_150513_Audio_Extracted_16_16Hz 885 0 16 utt 0.000x 0.200e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
INFO: cmn.c(175): CMN: 19.53 -1.35 -0.34 -0.35 -0.27 -0.27 -0.23 -0.23 -0.21 -0.19 -0.21 -0.20 -0.18
WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'WÄRST' in the lexicon
WARNING: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance 'DU WÄRST IHR NAH'
WARNING: "main.c", line 841: Skipped utterance 'DU WÄRST IHR NAH'
utt> 335 bild4_tagblende__txt_150513_Audio_Extracted_17_16Hz 1549 0 16 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
*
I am missing 160 words in the current dictionary. How can I add them for the adaptation?
The German voxforge model archive includes a script, espeak2phones.pl, that creates a dictionary for unknown words.
Run this script on your list of unknown words, then add the resulting entries to the original dictionary before the adaptation.
You need to have espeak installed for this script to work.
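If you do not have the list of unknown words yet, it can be collected by comparing the transcription file against the dictionary. Below is a minimal sketch of that idea, not part of the voxforge archive. It assumes a dictionary with one `WORD PHONE PHONE ...` entry per line and transcription lines of the form `<s> WORDS </s> (utterance_id)`; the function names and the sample entries are made up for illustration.

```python
def load_dictionary_words(lines):
    """Collect the word column of a dictionary (WORD PHONE PHONE ...)."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0])
    return words


def missing_words(transcript_lines, dictionary_words):
    """Return the transcript words that are absent from the dictionary."""
    missing = set()
    for line in transcript_lines:
        for token in line.split():
            # Skip sentence markers and the trailing "(utterance_id)" field.
            if token in ("<s>", "</s>") or token.startswith("("):
                continue
            if token not in dictionary_words:
                missing.add(token)
    return sorted(missing)


# Made-up sample data; the phones here are placeholders, not real entries.
dictionary = ["ZU ts uu", "SICH z i c", "DU d uu"]
transcript = [
    "<s> ZU MÜDE SICH ZU ÄUßERN </s> (utt_334)",
    "<s> DU WÄRST IHR NAH </s> (utt_335)",
]

for word in missing_words(transcript, load_dictionary_words(dictionary)):
    print(word)
```

The printed list, one word per line, is the kind of input the word list further down in this thread represents.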
I am getting errors when I run espeak2phones.pl. It runs fine when I input
just a few words, but with the full list (attached) it throws errors such as:
something did not match: R'In
something did not match: R@n
Here is the list:
~~~~~~~~~~~~~~~
ABWESENHEIT
ABZUHOLEN
ALLEM
ALTERSLOSEM
ANBEHALTEN
ANRUFT
ANTWORT
ANTWORTEST
ANZAHL
ARM
AUFBOHREN
AUGEN
AUSGESPUCKT
AUßER
BADEN
BADEST
BAHNHOF
BAR
BEGEGNEN
BEGEGNUNG
BEGEHREN
BEKANNTE
BESITZEN
BESTIMMEN
BETT
BEWEGT
BEWEGUNGEN
BLAU
BLIEBE
BOHRMASCHINE
BRAUCHEN
BRÄUCHTEST
BÄUME
CAFÉ
DARIN
DASSELBE
DEIN
DEINE
DEINEM
DEINEN
DUNKLE
EINSTRÖMENDE
EINZULASSEN
ERFAHRUNG
ERINNERUNGSWELLE
ERKENNST
ERSEHNT
ERWACHT
ERWARTUNGEN
FLIEßENDE
FLUCHT
FOLGENDEN
FRAGST
FREMDE
FÄLLT
FÜHLT
FÜNF
FÜR
GEBOREN
GEGENWÄRTIGKEIT
GEHÖRT
GENAUES
GESICHT
GESPROCHEN
GIBST
GINGE
GLEICHGÜLTIGKEIT
GROßSTADT
GÄBE
HALTESTELLE
HAUT
HEBT
HINTERLÄSST
HOFFEN
HÄLTST
HÄTTE
HÄTTEST
HÖRST
IDEEN
IHREM
IHRER
ILLUSIONEN
INNERER
INNERLICH
JEMANDES
JUNG
KEINEM
KEINER
KLÄNGE
KONKRETES
KURZLEBIG
KÄME
KÖNNTE
KÖNNTEST
KÜHLER
LANDSCHAFT
LANGSAM
LEBENDIGKEIT
LEISE
LESENDER
LUFTHAUCH
LÄNGER
MACHT
MANCHMAL
MARMOR
MENSCH
MONDLICHT
MÜDE
NACHAHMENS
NACHFORSCHENS
NACHTS
NAH
NIE
NÄHE
NÜTZTE
OFFENBART
PARK
PERFEKT
RAUSCHEN
REDEST
RUNDEN
SAGST
SAGT
SCHATTEN
SCHAUT
SCHLEIER
SCHLÜSSEL
SCHREIEN
SCHULTERLANGES
SCHWARZ
SCHWEIGEN
SOLLTEST
SONST
STELLST
STILLE
STIMME
STIMMEN
STRECKST
STRUKTUR
STUNDEN
STÜNDE
SUCHT
TAG
TANZ
TON
TRENNT
TROTZ
TRÄGT
UNAUFLÖSBAR
UNERGRÜNDLICHER
UNERMESSLICHE
VERANTWORTUNG
VERGANGENHEIT
VERGANGENHEITEN
VERGESSEN
VERLIERT
VERLÄSST
VERSCHLUCKT
VERSCHWENDUNG
VERZERRTE
VORAUSGEHT
VORSTELLUNG
WACHSENDE
WASSER
WEIßT
WELT
WIEDERSEHEN
WIRKLICHKEIT
WOCHEN
WORT
WORTE
WORTLOS
WÄHREND
WÄRE
WÄREN
WÄRST
WÜRDE
WÜRDEST
ZEICHNUNG
ZERBRECHLICHER
ZUFÄLLIG
ZUG
ÄLTER
ÜBERALL
ÜBERFÄLLT
~~~~~~~~~~~~~~~
Last edit: Nickolay V. Shmyrev 2014-02-12
I have singled out the words which were throwing errors. Now I have a file with phonemes only; how do I extend the Voxforge dictionary with it?
Hello
You need to extend espeak2phones a bit to work with the new espeak by adding new maps from espeak phones like r to voxforge phones. An updated script is attached; it should work fine for your whole list.
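To illustrate what "adding new maps" means, here is a rough Python sketch of a phone-mapping step. It is not the code of espeak2phones.pl. The table entries, the greedy longest-match strategy, and the names `PHONE_MAP` and `convert` are all assumptions made for the example; only the error text is taken from the messages quoted above.

```python
# Hypothetical espeak -> model phone table. The real table lives in
# espeak2phones.pl and its entries will differ from these placeholders.
PHONE_MAP = {
    "R": "r",
    "r": "r",
    "I": "i",
    "@": "@",
    "n": "n",
    "'": None,  # stress mark: consumed, but produces no phone
}


def convert(espeak_phones, phone_map=PHONE_MAP):
    """Convert an espeak phone string by greedy longest-match lookup."""
    longest = max(len(key) for key in phone_map)
    result = []
    pos = 0
    while pos < len(espeak_phones):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(espeak_phones) - pos), 0, -1):
            chunk = espeak_phones[pos:pos + size]
            if chunk in phone_map:
                target = phone_map[chunk]
                if target is not None:
                    result.append(target)
                pos += size
                break
        else:
            # No table entry covers this position: report the remainder.
            raise ValueError("something did not match: " + espeak_phones[pos:])
    return result


print(convert("R'In"))
print(convert("R@n"))
```

With a table like this, a symbol that has no entry stops the conversion at that position, so each new symbol emitted by a newer espeak needs its own line in the table.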
You can join two text files with the paste command. So the real command to run espeak2phones must be like this:

Almost there, now only one word, 'IDEEN', triggers the error:
'something did not match: -'
And I suppose I just paste the additional words into the Voxforge dictionary? Do they need to be in alphabetical order, or meet any other requirements?
Last edit: ventoline 2014-02-13
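For the merge itself, a small sketch like the one below could be used instead of pasting by hand. It assumes both files hold one `WORD PHONE PHONE ...` entry per line; `merge_dictionaries` and the sample phones are hypothetical. The thread does not say whether alphabetical order is required, so the sorting here is only for readability and for spotting duplicates.

```python
def merge_dictionaries(base_lines, extra_lines):
    """Merge dictionary lines, keeping the base entry when a word is in both."""
    entries = {}
    for line in list(base_lines) + list(extra_lines):
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and words without phones
        word, phones = parts[0], " ".join(parts[1:])
        # setdefault keeps the first pronunciation seen for each word.
        entries.setdefault(word, phones)
    return [f"{word} {phones}" for word, phones in sorted(entries.items())]


# Made-up entries; the phones are placeholders.
base = ["ZU ts uu", "DU d uu"]
extra = ["MÜDE m yy d @", "ZU ts u"]

for entry in merge_dictionaries(base, extra):
    print(entry)
```

Keeping the original entry on a conflict is a design choice for this sketch: the original dictionary is already known to work with the model, while the generated entries are not.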
I think I am hitting another kind of error related to this. The words concerned seem to contain umlaut letters.
Here is the error; how should I fix this? I am a bit cautious about modifying the dictionary phonemes myself.
16:04:05.687 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit r with lc=t rc=ui
16:04:05.696 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit ui with lc=r rc=m
16:04:05.697 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit m with lc=ui rc=t
16:04:05.703 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit ui with lc=k rc=n
16:04:05.705 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit n with lc=ui rc=t
16:04:05.712 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit f with lc=ui rc=n
16:04:05.715 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit ui with lc=k rc=r
16:04:05.715 SEVERE lexTreeLinguist Bad HMM Unit: ui
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit r with lc=ui rc=p
16:04:05.721 SEVERE lexTreeLinguist Bad HMM Unit: e
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit t with lc=ee: rc=e
16:04:05.721 SEVERE lexTreeLinguist Bad HMM Unit: e
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit e with lc=t rc=s
16:04:05.722 SEVERE lexTreeLinguist Bad HMM Unit: e
Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
SEVERE: Missing HMM for unit s with lc=e rc=t
16:04:05.767 SEVERE lexTreeLinguist Bad HMM Unit: ui
The message tells you that the phoneset of the dictionary doesn't match the phoneset of the model. You cannot decode this way.
You should either update your dictionary so that it does not include the missing phones, or retrain the model.
Updating the dictionary is the easiest way. Make sure that the dictionary creation script replaces the missing phones with ones that are present in the original dictionary (and therefore in the model).
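Before decoding, the generated entries can be checked against the model's phoneset so that phones such as ui or e are caught early. The sketch below is one way to do that, assuming the model's phones are already available as a plain set of strings; how you obtain that set depends on your model files and is not shown. `unknown_phones` and the sample words are hypothetical.

```python
def unknown_phones(dictionary_lines, model_phones):
    """Map each phone missing from the model to the words that use it."""
    problems = {}
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and words without phones
        word, phones = parts[0], parts[1:]
        for phone in phones:
            if phone not in model_phones:
                words = problems.setdefault(phone, [])
                if word not in words:
                    words.append(word)
    return problems


# Made-up data: a tiny phoneset and two placeholder entries.
model_phones = {"t", "r", "m", "s"}
new_entries = ["WORD1 t r ui m", "WORD2 t e s t"]

for phone, words in sorted(unknown_phones(new_entries, model_phones).items()):
    print(phone, "used in", ", ".join(words))
```

Each phone reported this way needs a replacement in the dictionary creation script, or the affected entries need to be corrected by hand.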