Obtaining a string of phones from speech

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Speech Recognition Theory

Creator: petebleackley

Created: 2011-02-17

Updated: 2012-09-22

petebleackley - 2011-02-17

I've seen a lot of language recognition software on the internet, but it's all
text-based. I'd like to create a speech-based language recognizer. To do that,
I need a stripped-down speech recognition engine that converts an
utterance to a sequence of phones, which can then be fed into an AI component
that infers the most likely language. Is it possible to use Sphinx in this
way?


Nickolay V. Shmyrev - 2011-02-17

Yes, it's possible. See

http://cmusphinx.sourceforge.net/wiki/phonemerecognition


petebleackley - 2011-02-18

That article makes it look difficult to use for my application. What I would
really like is a component that, when fed a stream of audio, feeds the phones
it extracts into my language classifier software, as in the following
pseudocode:

# models represents the phonotactics of various languages
Classifier = LanguageClassifier(models)
# Phones is an iterable object
Phones = PhonemeClassifier(AudioStream)
Client = LanguageIdentifierClient()

for phone in Phones:
    Client.UseResults(Classifier.IdentifyLanguage(phone))

It looks like I'll have to hack something together out of the source code.
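
For the classifier half of the pseudocode above, here is a minimal Python sketch of how phone-by-phone language identification might work. It does not use Sphinx at all: the phone stream is stood in for by a plain list of phone symbols, and everything here is an assumption made for illustration (the `PhonotacticModel` class, the bigram model with add-one smoothing, the toy training sequences, and the language names `lang_a` and `lang_b`). The classifier keeps a running log-likelihood per language and returns the current best guess after each phone.

```python
import math
from collections import defaultdict


class PhonotacticModel:
    """Hypothetical phone-bigram model with add-one smoothing."""

    def __init__(self, training_sequences, inventory):
        self.inventory_size = len(inventory)
        # counts[prev][cur] = number of times `cur` followed `prev` in training
        self.counts = defaultdict(lambda: defaultdict(int))
        for seq in training_sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1

    def log_prob(self, prev, cur):
        """Smoothed log P(cur | prev)."""
        row = self.counts.get(prev, {})
        total = sum(row.values())
        return math.log((row.get(cur, 0) + 1) / (total + self.inventory_size))


class LanguageClassifier:
    """Accumulates per-language log-likelihoods one phone at a time."""

    def __init__(self, models):
        self.models = models
        self.scores = {lang: 0.0 for lang in models}
        self.prev = None

    def identify_language(self, phone):
        """Consume one phone and return the most likely language so far."""
        if self.prev is not None:
            for lang, model in self.models.items():
                self.scores[lang] += model.log_prob(self.prev, phone)
        self.prev = phone
        return max(self.scores, key=self.scores.get)


# Toy data, invented for this example only.
inventory = {"a", "k", "s", "t"}
models = {
    "lang_a": PhonotacticModel([["k", "a", "t", "a", "k", "a"]], inventory),
    "lang_b": PhonotacticModel([["s", "t", "s", "k", "s", "t"]], inventory),
}


def classify(phones):
    """Run a fresh classifier over an iterable of phones; return the final guess."""
    classifier = LanguageClassifier(models)
    guess = None
    for phone in phones:
        guess = classifier.identify_language(phone)
    return guess


if __name__ == "__main__":
    print(classify(["k", "a", "t", "a"]))
    print(classify(["s", "t", "s", "k"]))
```

Because `classify` accepts any iterable, the list could later be swapped for a generator that yields phones from a real recognizer without changing the classifier.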

