CMU Sphinx / Forums / Help: Basic concepts

Speech Recognition Toolkit

Basic concepts

Forum: Help

Creator: Focus Research

Created: 2016-12-23

Updated: 2016-12-23

Focus Research - 2016-12-23

I've attempted to read all the documentation. Unfortunately, the documentation feels like someone took a 9-speed automatic transmission apart, threw all the pieces on a table and then said "There!". Not overly helpful unless you already know the pile of bits is an automatic transmission. So I'm still confused about some basic concepts.

My questions are about the front end, starting with the microphone. From what I've been able to figure out, some process listens to the microphone and magically decides when to start and stop recording. It then writes the raw data to a file (in a known format) and passes the file on to the rest of the software for recognition. Is that right?

OR... does the front end simply continuously record data and pass data in X-second chunks to the rest of the code?

The reason I ask is that I want to put speech recognition in an embedded system, so I need to understand where the real-time processes are vs. processes that can be time-shared by the RTOS. It appears that all the recognition code can run at a lower priority and be swapped in/out by the RTOS. But the thread that is recording the microphone data needs to be continuously running lest there be discontinuities in the data stream.

Any explanation of this would be helpful.

- Nickolay V. Shmyrev - 2016-12-23
  
  My questions are about the front end, starting with the microphone. From what I've been able to figure out, some process listens to the microphone and magically decides when to start and stop recording. It then writes the raw data to a file (in a known format) and passes the file on to the rest of the software for recognition. Is that right?
  
  No.
  
  OR... does the front end simply continuously record data and pass data in X-second chunks to the rest of the code?
  
  Yes.
  
  The reason I ask is that I want to put speech recognition in an embedded system, so I need to understand where the real-time processes are vs. processes that can be time-shared by the RTOS. It appears that all the recognition code can run at a lower priority and be swapped in/out by the RTOS. But the thread that is recording the microphone data needs to be continuously running lest there be discontinuities in the data stream.
  
  There are multiple possible solutions here, from running the whole thing in real time to buffering the audio in real time and processing it offline. Which one fits depends on your preference and the exact workload of the system.
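The "buffer in real time, process later" option could look roughly like the sketch below. It is only an illustration of the structure, not Sphinx code: `capture`, `recognize`, `run_demo`, `CHUNK_SAMPLES` and the fake microphone are all hypothetical names, and `process_chunk` stands in for whatever call hands audio to the decoder. On an actual RTOS the capture side would be a high-priority task (or DMA/ISR) and the recognition side a lower-priority task sharing a fixed-size buffer.

```python
import queue
import threading

CHUNK_SAMPLES = 1600  # assumption for illustration: 0.1 s of audio at 16 kHz


def capture(read_chunk, buf, n_chunks):
    """Real-time side: never block on the buffer; count overruns instead."""
    dropped = 0
    for _ in range(n_chunks):
        chunk = read_chunk()
        try:
            buf.put_nowait(chunk)
        except queue.Full:
            # The consumer fell behind, so this chunk is lost
            # (a discontinuity in the audio stream).
            dropped += 1
    return dropped


def recognize(buf, process_chunk):
    """Lower-priority side: drain the buffer and feed chunks to the decoder."""
    count = 0
    while True:
        chunk = buf.get()
        if chunk is None:  # end-of-stream marker
            return count
        process_chunk(chunk)
        count += 1


def run_demo(n_chunks=50, buffer_chunks=64):
    buf = queue.Queue(maxsize=buffer_chunks)
    seen = []      # stands in for the decoder's input
    result = {}

    def fake_mic():
        return bytes(2 * CHUNK_SAMPLES)  # one chunk of 16-bit silence

    def worker_body():
        result["processed"] = recognize(buf, seen.append)

    worker = threading.Thread(target=worker_body)
    worker.start()
    result["dropped"] = capture(fake_mic, buf, n_chunks)
    buf.put(None)  # capture is over, so blocking here is acceptable
    worker.join()
    return result
```

The buffer size sets how long recognition may be preempted before audio is lost: with 0.1 s chunks, a 64-chunk buffer tolerates about 6.4 s of stalled processing. The `dropped` count makes overruns visible so the buffer or task priorities can be tuned.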
  
