
dropped letters

Help
Anonymous
2007-06-08
2012-09-22
  • Anonymous

    Anonymous - 2007-06-08

    I am in the process of collecting data to use for training. This is the first time I have tried this, and I have yet to pull everything together, so forgive me if this seems like a stupid question.

    How well does Sphinx cope with letters that are not always pronounced in normal speech?
    For example, the 't' in 'cat' should always be distinctly pronounced, at least according to pronunciation guides. But in normal speech the 't' might be dropped, resulting in something that sounds like 'ca'.

    While producing training data, is it best to record examples of each pronunciation and let Sphinx work out that the 't' may or may not be sounded? Or would it be better to put two entries in the dictionary, one where the 't' is sounded and one where it is not? Or is there another solution?

    Thank you in advance
    Matt

     
    • Anonymous

      Anonymous - 2007-06-11

      OK, thank you both. I will give both a try.

      Matt

       
    • David Huggins-Daines

      Best thing to do (presuming you know what the different pronunciations are) would be to put alternative pronunciations in your dictionary, like this:

      CAT K AE T
      CAT(2) K AE
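
      For anyone scripting around such a dictionary, here is a minimal Python sketch of how entries in this format could be grouped by base word. It assumes one entry per line, a headword followed by whitespace-separated phones, and a "(N)" suffix marking an alternative pronunciation. The function name parse_dictionary is made up for illustration and is not part of Sphinx.

```python
import re
from collections import defaultdict

# Matches a headword with an optional variant suffix such as "(2)".
_VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<index>\d+)\))?$")


def parse_dictionary(lines):
    """Group pronunciations by base word (hypothetical helper, not a Sphinx API).

    Each pronunciation is returned as a list of phone symbols.
    """
    entries = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        match = _VARIANT.match(fields[0])
        # "CAT" and "CAT(2)" both file under the base word "CAT".
        entries[match.group("word")].append(fields[1:])
    return dict(entries)


sample = ["CAT K AE T", "CAT(2) K AE"]
pronunciations = parse_dictionary(sample)
print(pronunciations)
```

      Both lines end up under the single key CAT, which is roughly how a decoder would treat them: two acceptable phone sequences for one word.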

       
    • Anonymous

      Anonymous - 2007-06-09

      I agree with David's answer, but suggest caution in applying it in cases such as the example cited by Matt.

      The phone T (in Sphinx notation) is an unvoiced stop consonant, which is characterized by (1) a brief silence while the vocal tract is closed by the tongue tip, (2) a "burst" as the pent-up air pressure is released, and (3) a brief duration of noise (aspiration) as air flows through the narrow but widening constriction at the tongue tip. In addition, the formants in the surrounding phones will move due to the changing position of the tongue as it moves into and out of the stop. The burst and aspiration may be more or less evident, depending on phonetic context and the way in which the T is pronounced.

      I wrote that lengthy explanation to suggest that if Matt doesn't hear a "noisy" T in CAT, it may be a mistake to conclude that the T has been omitted; more likely it is just pronounced less noisily than one might expect. I don't think I'd use two pronunciations for CAT. With enough data, acoustic model training will "learn" the acoustic characteristics of T in its various contexts.

      There are, to be sure, cases where a T is genuinely dropped, and you can find such by a stroll through the CMUdict. For example, consider:
      IDENTITY AY D EH N T AX T IY
      IDENTITY(2) AY D EH N AX T IY
      Note that the T is articulated at the same place as the preceding N, and in rapid speech, one can omit the stop altogether. IMHO this is a valid case for two pronunciations.
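
      If you want to audit a dictionary for variants of this kind, a rough Python sketch along the following lines may help. It checks whether one pronunciation is another with exactly one phone removed and reports which phone and where. The helper name dropped_phone is invented for this example, and the check says nothing about whether a variant is phonetically justified; that still needs a human judgment.

```python
def dropped_phone(full, reduced):
    """Return (index, phone) if `reduced` is `full` minus exactly one phone.

    Hypothetical helper for illustration; returns None when the two
    pronunciations differ in any other way.
    """
    if len(full) != len(reduced) + 1:
        return None
    for i in range(len(full)):
        # Remove the phone at position i and compare with the reduced form.
        if full[:i] + full[i + 1:] == reduced:
            return i, full[i]
    return None


full = "AY D EH N T AX T IY".split()
reduced = "AY D EH N AX T IY".split()

result = dropped_phone(full, reduced)
if result is not None:
    index, phone = result
    preceding = full[index - 1] if index > 0 else None
    print(f"dropped {phone} at position {index}, preceded by {preceding}")
```

      For the IDENTITY pair this reports the T that follows N, which is the context described above. The same check would also flag the CAT / CAT(2) pair, so the output is only a list of candidates to review.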

      To summarize, multiple dictionary pronunciations are needed to cope with different pronunciations of many words. I simply urge a little conservatism in deciding what is and isn't a different pronunciation at the broad phonetic level used in Sphinx.

      cheers,
      jerry

       
