Extending vocabulary when adapting an acoustic model

ventoline
2014-02-11
2014-02-25
  • ventoline

    ventoline - 2014-02-11

    I am adapting the VoxForge German model, and when testing I get the following warnings:

    *
    INFO: cmn.c(175): CMN: 19.48 -1.38 -0.35 -0.34 -0.27 -0.27 -0.22 -0.23 -0.22 -0.18 -0.21 -0.19 -0.18
    WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'MÜDE' in the lexicon
    WARNING: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance ' ZU MÜDE SICH ZU ÄUßERN '
    WARNING: "main.c", line 841: Skipped utterance ' ZU MÜDE SICH ZU ÄUßERN '
    utt> 334 bild4_tagblende__txt_150513_Audio_Extracted_16_16Hz 885 0 16 utt 0.000x 0.200e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

    INFO: cmn.c(175): CMN: 19.53 -1.35 -0.34 -0.35 -0.27 -0.27 -0.23 -0.23 -0.21 -0.19 -0.21 -0.20 -0.18
    WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'WÄRST' in the lexicon
    WARNING: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance ' DU WÄRST IHR NAH '
    WARNING: "main.c", line 841: Skipped utterance ' DU WÄRST IHR NAH '
    utt> 335 bild4_tagblende__txt_150513_Audio_Extracted_17_16Hz 1549 0 16 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
    *

    I am missing 160 words in the current dictionary. How can I add them for the adaptation?
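
A minimal sketch of how the list of missing words could be collected from a log like the one above. It assumes the warning text has exactly the form shown in this post; the function name `missing_words` and the sample log are illustrative only, not part of any Sphinx tool.

```python
import re

# Matches the "Unable to lookup word" warning in the form shown in the log above.
MISSING_WORD = re.compile(r"Unable to lookup word '([^']+)' in the lexicon")


def missing_words(log_text):
    """Return the sorted, de-duplicated words reported as missing from the lexicon."""
    return sorted(set(MISSING_WORD.findall(log_text)))


# Sample lines copied from the log above, with one warning repeated on purpose.
sample_log = """\
WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'MÜDE' in the lexicon
WARNING: "main.c", line 841: Skipped utterance ' ZU MÜDE SICH ZU ÄUßERN '
WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'WÄRST' in the lexicon
WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'MÜDE' in the lexicon
"""

words = missing_words(sample_log)
print("\n".join(words))
```

The output has one word per line, which is the shape of word list used later in this thread.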

     
  • Nickolay V. Shmyrev

    The German VoxForge model archive includes a script, espeak2phones.pl, that creates dictionary entries for unknown words.

    Run this script on your list of unknown words, then add the resulting entries to the original dictionary before the adaptation.

    espeak must be installed for the script to work.

     
    • ventoline

      ventoline - 2014-02-12

      I am getting errors when I run espeak2phones.pl as is. It runs fine when
      I input just a few words, but with the full list (attached) it throws errors such as:
      something did not match: R'In
      something did not match: R@n

      Here is the list:

      ~~~~~~~~~~~~~~~
      ABWESENHEIT
      ABZUHOLEN
      ALLEM
      ALTERSLOSEM
      ANBEHALTEN
      ANRUFT
      ANTWORT
      ANTWORTEST
      ANZAHL
      ARM
      AUFBOHREN
      AUGEN
      AUSGESPUCKT
      AUßER
      BADEN
      BADEST
      BAHNHOF
      BAR
      BEGEGNEN
      BEGEGNUNG
      BEGEHREN
      BEKANNTE
      BESITZEN
      BESTIMMEN
      BETT
      BEWEGT
      BEWEGUNGEN
      BLAU
      BLIEBE
      BOHRMASCHINE
      BRAUCHEN
      BRÄUCHTEST
      BÄUME
      CAFÉ
      DARIN
      DASSELBE
      DEIN
      DEINE
      DEINEM
      DEINEN
      DUNKLE
      EINSTRÖMENDE
      EINZULASSEN
      ERFAHRUNG
      ERINNERUNGSWELLE
      ERKENNST
      ERSEHNT
      ERWACHT
      ERWARTUNGEN
      FLIEßENDE
      FLUCHT
      FOLGENDEN
      FRAGST
      FREMDE
      FÄLLT
      FÜHLT
      FÜNF
      FÜR
      GEBOREN
      GEGENWÄRTIGKEIT
      GEHÖRT
      GENAUES
      GESICHT
      GESPROCHEN
      GIBST
      GINGE
      GLEICHGÜLTIGKEIT
      GROßSTADT
      GÄBE
      HALTESTELLE
      HAUT
      HEBT
      HINTERLÄSST
      HOFFEN
      HÄLTST
      HÄTTE
      HÄTTEST
      HÖRST
      IDEEN
      IHREM
      IHRER
      ILLUSIONEN
      INNERER
      INNERLICH
      JEMANDES
      JUNG
      KEINEM
      KEINER
      KLÄNGE
      KONKRETES
      KURZLEBIG
      KÄME
      KÖNNTE
      KÖNNTEST
      KÜHLER
      LANDSCHAFT
      LANGSAM
      LEBENDIGKEIT
      LEISE
      LESENDER
      LUFTHAUCH
      LÄNGER
      MACHT
      MANCHMAL
      MARMOR
      MENSCH
      MONDLICHT
      MÜDE
      NACHAHMENS
      NACHFORSCHENS
      NACHTS
      NAH
      NIE
      NÄHE
      NÜTZTE
      OFFENBART
      PARK
      PERFEKT
      RAUSCHEN
      REDEST
      RUNDEN
      SAGST
      SAGT
      SCHATTEN
      SCHAUT
      SCHLEIER
      SCHLÜSSEL
      SCHREIEN
      SCHULTERLANGES
      SCHWARZ
      SCHWEIGEN
      SOLLTEST
      SONST
      STELLST
      STILLE
      STIMME
      STIMMEN
      STRECKST
      STRUKTUR
      STUNDEN
      STÜNDE
      SUCHT
      TAG
      TANZ
      TON
      TRENNT
      TROTZ
      TRÄGT
      UNAUFLÖSBAR
      UNERGRÜNDLICHER
      UNERMESSLICHE
      VERANTWORTUNG
      VERGANGENHEIT
      VERGANGENHEITEN
      VERGESSEN
      VERLIERT
      VERLÄSST
      VERSCHLUCKT
      VERSCHWENDUNG
      VERZERRTE
      VORAUSGEHT
      VORSTELLUNG
      WACHSENDE
      WASSER
      WEIßT
      WELT
      WIEDERSEHEN
      WIRKLICHKEIT
      WOCHEN
      WORT
      WORTE
      WORTLOS
      WÄHREND
      WÄRE
      WÄREN
      WÄRST
      WÜRDE
      WÜRDEST
      ZEICHNUNG
      ZERBRECHLICHER
      ZUFÄLLIG
      ZUG
      ÄLTER
      ÜBERALL
      ÜBERFÄLLT
      ~~~~~~~~~~~~~~~

       

      Last edit: Nickolay V. Shmyrev 2014-02-12
  • ventoline

    ventoline - 2014-02-13

    I have singled out the words that were throwing errors. Now I have a file with phonemes only; how do I extend the VoxForge dictionary with it?

     
  • Nickolay V. Shmyrev

    Hello

    You need to extend espeak2phones a bit to work with the new espeak by adding new mappings from espeak phones, such as r, to VoxForge phones. An updated script is attached; it should work fine for your whole list.

    You can join two text files with the paste command, so the actual command to run espeak2phones should look like this:

     cat some.words | espeak -v de -x -q  | ./espeak2phones.pl > some.phones && paste some.words some.phones > some.dic
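
The `paste` step pairs line N of the word list with line N of the phone list, so the two files have to stay aligned. As a rough sketch, the same pairing can be done with an explicit length check, which helps if a failed word leaves the phone file short. The names `build_dictionary` and `read_lines` are illustrative, and the phone strings in the example are placeholders, not real VoxForge phones.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its non-empty lines, stripped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_dictionary(words, pronunciations):
    """Pair each word with its pronunciation as 'WORD<TAB>phones' (like paste)."""
    if len(words) != len(pronunciations):
        raise ValueError(
            f"{len(words)} words but {len(pronunciations)} pronunciations; "
            "the two lists are not aligned"
        )
    return [f"{word}\t{phones}" for word, phones in zip(words, pronunciations)]


# Placeholder data; with real files this would be
# build_dictionary(read_lines("some.words"), read_lines("some.phones")).
entries = build_dictionary(["TAG", "ZUG"], ["p1 p2 p3", "p4 p5 p6"])
print("\n".join(entries))
```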
    
     
  • ventoline

    ventoline - 2014-02-13

    Almost there. Now only one word, 'IDEEN', triggers the error:

    'something did not match: -'

    And I suppose I just paste the additional words into the VoxForge dictionary? Do they need to be in alphabetical order, or meet any other requirements?
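
The thread does not answer the ordering question. Sorting the merged file is harmless either way, so a cautious sketch of the merge might de-duplicate and sort. This assumes each dictionary line is a word followed by whitespace-separated phones; `merge_dictionaries` is an illustrative name and the phones are placeholders.

```python
def merge_dictionaries(base_lines, extra_lines):
    """Merge 'WORD phones...' lines; entries already in the base dictionary win."""
    merged = {}
    for line in list(base_lines) + list(extra_lines):
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, phones = parts
        # setdefault keeps the first pronunciation seen for each word.
        merged.setdefault(word, " ".join(phones.split()))
    return [f"{word} {phones}" for word, phones in sorted(merged.items())]


base = ["ZUG p1 p2", "ARM p3"]        # placeholder entries from the original dictionary
extra = ["ARM p9", "BAR\tp4 p5"]      # placeholder entries for the new words
result = merge_dictionaries(base, extra)
print("\n".join(result))
```

Sorting here is by Unicode code point, so words starting with Ä, Ö or Ü land after Z, as in the word list earlier in the thread.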

     

    Last edit: ventoline 2014-02-13
  • ventoline

    ventoline - 2014-02-25

    I think I am hitting another kind of error related to this. The affected words seem to contain umlauts.
    Here is the error. How should I fix this? I am a bit cautious about modifying the dictionary phonemes myself.
    16:04:05.687 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit r with lc=t rc=ui
    16:04:05.696 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit ui with lc=r rc=m
    16:04:05.697 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit m with lc=ui rc=t
    16:04:05.703 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit ui with lc=k rc=n
    16:04:05.705 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit n with lc=ui rc=t
    16:04:05.712 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit f with lc=ui rc=n
    16:04:05.715 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit ui with lc=k rc=r
    16:04:05.715 SEVERE lexTreeLinguist Bad HMM Unit: ui
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit r with lc=ui rc=p
    16:04:05.721 SEVERE lexTreeLinguist Bad HMM Unit: e
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit t with lc=ee: rc=e
    16:04:05.721 SEVERE lexTreeLinguist Bad HMM Unit: e
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit e with lc=t rc=s
    16:04:05.722 SEVERE lexTreeLinguist Bad HMM Unit: e
    Feb 25, 2014 4:04:05 PM edu.cmu.sphinx.linguist.lextree.HMMTree addPronunciation
    SEVERE: Missing HMM for unit s with lc=e rc=t
    16:04:05.767 SEVERE lexTreeLinguist Bad HMM Unit: ui

     
  • Nickolay V. Shmyrev

    16:04:05.722 SEVERE lexTreeLinguist Bad HMM Unit: e

    The message says that the phone set of the dictionary doesn't match the phone set of the model. You cannot decode this way.

    You should either update your dictionary so that it does not include the missing phones, or retrain the model.
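
To see which dictionary phones the model lacks before decoding, a check along these lines could be run over the dictionary. It is only a sketch: how the model's phone set is obtained is left open, so `model_phones` is passed in as a plain set, and the words and phone lists in the example are made up around the `ui` and `e` units from the log.

```python
def phones_missing_from_model(dictionary_lines, model_phones):
    """Map each phone absent from model_phones to the words that use it."""
    missing = {}
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        for phone in phones:
            if phone not in model_phones and word not in missing.get(phone, []):
                missing.setdefault(phone, []).append(word)
    return missing


# Hypothetical model phone set and dictionary entries, for illustration only.
model_phones = {"t", "r", "m", "s", "k", "n"}
dictionary = ["WORD1 t r ui m", "WORD2 t e s t", "WORD3 k ui n t"]
report = phones_missing_from_model(dictionary, model_phones)
for phone, words in sorted(report.items()):
    print(phone, "used by", ", ".join(words))
```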

    I am a bit cautious about modifying the dictionary phonemes myself.

    This is the easiest way. Make sure that the dictionary creation script replaces the missing phones with ones that are present in the original dictionary, and therefore in the model.
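
The replacement step could look roughly like the sketch below. The mapping is deliberately a placeholder: which existing phone best stands in for `ui` or `e` is a judgment call that this thread does not settle, so `X1`, `X2` and `X3` are hypothetical names, as is the function `replace_phones`.

```python
def replace_phones(dictionary_lines, replacements):
    """Rewrite dictionary lines, substituting phones found in replacements.

    A replacement value may contain several space-separated phones.
    """
    rewritten = []
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            rewritten.append(line)  # leave blank or malformed lines untouched
            continue
        word, phones = parts[0], parts[1:]
        mapped = []
        for phone in phones:
            mapped.extend(replacements.get(phone, phone).split())
        rewritten.append(" ".join([word] + mapped))
    return rewritten


# Placeholder mapping; the right-hand sides must be phones the model really has.
replacements = {"ui": "X1 X2", "e": "X3"}
fixed = replace_phones(["WORD1 t r ui m", "WORD2 t e s t"], replacements)
print("\n".join(fixed))
```

Applying the mapping inside the dictionary creation script, as suggested above, avoids having to repeat this pass every time new words are added.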

     
