Hi,
Are there any methods defined in Sphinx for recognizing OOV words?
The common approach to the OOV (out-of-vocabulary) problem is to build the
dictionary from smaller units, such as word fragments or phones, so that
combining them can form almost any word. The setup for decoding with word
fragments is no different from the usual setup: it consists of a language
model and a dictionary. For CMU Sphinx decoders it doesn't matter whether the
language model is subword-based or word-based.
Building a subword language model requires specialized software. CMU Sphinx
doesn't provide tools for that yet. One frequently used free tool is
Sequitur-G2P.
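
To make the idea of "smaller units that combine into words" concrete, here is a minimal sketch in Python. It is only an illustration: the fragment inventory, the trailing "+" joining marker, and the function names split_word and join_fragments are all assumptions made for this example, not a CMU Sphinx or Sequitur-G2P convention. A real inventory would be produced by a dedicated tool, and each fragment would also need its own pronunciation entry in the dictionary.

```python
# Hypothetical fragment inventory; a real one would be learned from data.
FRAGMENTS = {"un", "break", "able", "re", "play", "ing", "s"}
MARK = "+"  # assumed convention: a trailing "+" means "joins to the next unit"


def split_word(word, fragments=FRAGMENTS):
    """Greedy longest-match split; unknown stretches fall back to letters."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first; a single letter always matches.
        for j in range(len(word), i, -1):
            if word[i:j] in fragments or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    # Mark every piece except the last so the word can be rebuilt later.
    return [p + MARK for p in pieces[:-1]] + pieces[-1:]


def join_fragments(tokens):
    """Recombine a fragment-level hypothesis back into whole words."""
    words, current = [], ""
    for tok in tokens:
        if tok.endswith(MARK):
            current += tok[:-len(MARK)]
        else:
            words.append(current + tok)
            current = ""
    if current:  # dangling fragment at the end of the hypothesis
        words.append(current)
    return words


def to_fragment_text(line):
    """Rewrite one line of word-level text as fragment-level text."""
    return " ".join(tok for w in line.split() for tok in split_word(w))
```

Text rewritten with to_fragment_text could serve as training text for a subword language model, and join_fragments would be applied to the decoder output to get words back. Greedy longest-match is just the simplest splitting rule to show here; it is not necessarily what a dedicated tool does.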