
Some thoughts to tune Sphinx

Yueyu/Lin
2008-09-30
2012-09-22
  • Yueyu/Lin

    Yueyu/Lin - 2008-09-30

    I'm currently using a very small NGram to test Sphinx performance. I called Timer.dumpAll() to show the time spent in the different steps. The following are my results:

    --------------- Summary statistics ---------

    Total Time Audio: 1.62s Proc: 0.17s Speed: 0.10 X real time
    result is are your grandparents still living,duration is 6251

    ----------------------------- Timers----------------------------------------

    Name Count CurTime MinTime MaxTime AvgTime TotTime

    streamDataSourc 25 0.0000s 0.0000s 0.0010s 0.0003s 0.0070s
    premphasizer 25 0.0000s 0.0000s 0.0010s 0.0001s 0.0020s
    windower 25 0.0000s 0.0000s 0.0010s 0.0002s 0.0060s
    fft 163 0.0000s 0.0000s 0.0040s 0.0003s 0.0550s
    melFilterBank 163 0.0000s 0.0000s 0.0010s 0.0000s 0.0030s
    dct 163 0.0000s 0.0000s 0.0010s 0.0000s 0.0040s
    featureExtracti 159 0.0000s 0.0000s 0.0010s 0.0000s 0.0010s
    Score 162 0.0000s 0.0000s 0.0810s 0.0007s 0.1160s
    Prune 971 0.0000s 0.0000s 0.0010s 0.0000s 0.0010s
    Grow 972 0.0000s 0.0000s 0.0040s 0.0001s 0.1050s
    DictionaryLoad 1 0.5430s 0.5430s 0.5430s 0.5430s 0.5430s
    AM_Load 1 3.4910s 3.4910s 3.4910s 3.4910s 3.4910s
    compile 1 1.6120s 1.6120s 1.6120s 1.6120s 1.6120s
    buildHmmPool 1 1.6010s 1.6010s 1.6010s 1.6010s 1.6010s
    Create HMMTree 1 0.0060s 0.0060s 0.0060s 0.0060s 0.0060s

    From the above, we can see that AM_Load (acoustic model loading) costs a lot of time.
    I'm using edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    The call to loader.load() is what costs the time.
    In my Configure.xml, the loader is
    edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader

    The first step is to improve the performance of ModelLoader.

    There are two options:
    1. Re-use the Acoustic Model instance for multiple recognizers. From what I understand now, it's not possible. Am I right?
    2. Re-implement the Acoustic Model, especially the Acoustic Model Loader. I actually think this is a practical way. The model is a tree, and we can keep the tree data alive across the whole JVM. What I'm planning is to use an embedded DB (for example, BerkeleyDB) as the cache: for each model, store the acoustic model data structure in the database once, so multiple recognizers can share the same database instance. Since it's a read-only operation, access will be very fast (from my industry experience in other fields).
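
    A JVM-wide cache keyed by model location would capture the spirit of both options. The sketch below is illustrative only — Model and AcousticModelCache are hypothetical stand-ins, not Sphinx classes — but it shows how ConcurrentHashMap.computeIfAbsent can guarantee that the expensive load runs at most once per location while every recognizer shares the same instance:

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical stand-in for the data structure built by loader.load();
    // not a Sphinx class.
    final class Model {
        final String location;
        Model(String location) { this.location = location; }
    }

    // JVM-wide cache: one Model per location, shared by every recognizer.
    final class AcousticModelCache {
        private static final Map<String, Model> CACHE = new ConcurrentHashMap<>();
        private static int loads = 0;  // counts real loads, for illustration

        static Model get(String location) {
            // computeIfAbsent runs the expensive load at most once per key,
            // even when several recognizers ask for the model concurrently.
            return CACHE.computeIfAbsent(location, loc -> {
                loads++;  // a real version would call loader.load() here,
                          // or read a pre-serialized model from BerkeleyDB
                return new Model(loc);
            });
        }

        static int loadCount() { return loads; }
    }
    ```

    On this design, the embedded-DB variant only changes what happens inside the lambda; the sharing guarantee stays the same.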

    Do you have any more thoughts about this?

    If the acoustic model can be tuned like this, buildHmmPool and Create HMMTree can also be tuned in the same way. What do you think about it?

     
    • Yueyu/Lin

      Yueyu/Lin - 2008-10-01

      I have checked the code of LexTextLinguist.java. In theory, the acoustic model and the NGram grammar can be separated. Unfortunately, the allocate() function is responsible both for initializing the acoustic model and for compiling the grammar. I can't find any way to recompile the grammar without touching the acoustic model. Even worse, the allocate() function sets the acousticModel variable to null, so I can't even access it later.
      But I can still manage to handle that.
      1. I copied all the code out of LexTextLinguist.java and renamed it MyLexTextLinguist.java
      2. Keep the acousticModel variable so it can be reused later
      3. Allocate the MyLexTextLinguist instance
      4. Use the recognizer to do the recognition
      4.1 I have reimplemented an NGram grammar that can be reused multiple times with different files
      5. Set the languageModel's resource to a new file location
      6. Recompile the languageModel for the MyLexTextLinguist instance
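
      The steps above can be sketched as follows. ReusableLinguist, setGrammarLocation() and recompileGrammar() are illustrative names for the pattern, not the Sphinx-4 API:

      ```java
      // Illustrative sketch of the reuse pattern; class and method names are
      // hypothetical, not the Sphinx-4 API.
      final class AcousticModel {
          boolean allocated = false;
          void allocate() { allocated = true; }  // the expensive step, done once
      }

      final class ReusableLinguist {
          private final AcousticModel acousticModel;  // kept, never set to null
          private String grammarLocation;
          private boolean grammarCompiled = false;

          ReusableLinguist(AcousticModel am, String grammarLocation) {
              this.acousticModel = am;
              this.grammarLocation = grammarLocation;
          }

          // Steps 2-3: allocate once; the acoustic model survives for reuse.
          void allocate() {
              if (!acousticModel.allocated) acousticModel.allocate();
              grammarCompiled = true;  // compile the initial grammar
          }

          // Step 5: point the language model at a new grammar file.
          void setGrammarLocation(String location) {
              grammarLocation = location;
              grammarCompiled = false;
          }

          // Step 6: recompile only the grammar; the acoustic model is untouched.
          void recompileGrammar() { grammarCompiled = true; }

          AcousticModel acousticModel() { return acousticModel; }
          boolean ready() { return acousticModel.allocated && grammarCompiled; }
      }
      ```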

      That will work for my cases. Now I can continue tuning the application to cache the grammar in an embedded database, so I won't have to recompile it every time I use it. The text NGram grammar compilation is not cheap either. I expect I can make it reasonably fast.

      But I really hope the Sphinx team will abstract out the data access layer, to leave more room for building different kinds of applications.

      BTW: I have also used the Nuance recognizer. It lets all recognizers share the same copy of the acoustic and language models: for an acoustic model and language model referred to by the same URL, the Nuance SDK always points to the same copy instead of loading and compiling them over and over. I can actually make Sphinx behave like this, but it feels much more like a hack, because I can't find support for it in the current framework design.

      That's all I'm thinking about and doing now. Any comments are welcome. Thanks.

       
      • Nickolay V. Shmyrev

        > That will work for my cases.

        You are welcome to submit a patch

        > But I really hope the Sphinx team may abstract the data access layer out to enable more spaces to create different applications.

        Right now there is no immediate need for this.

         
    • Yueyu/Lin

      Yueyu/Lin - 2008-09-30

      After profiling with NetBeans, I found:
      1. edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.loadHMMPool(boolean, java.io.InputStream, String) costs 61% of the time
      2. edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model.lookupNearestHMM(edu.cmu.sphinx.linguist.acoustic.Unit, edu.cmu.sphinx.linguist.acoustic.HMMPosition, boolean) costs 22% of the time

      Since what I want to do is reuse the recognizer with different grammars (i.e., different LanguageModel instances), I think the first cost can be avoided, since all grammars can share the same acoustic model. As for the second part, it should be possible to improve it as much as we can.
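
      For the second hot spot, one low-risk idea is to memoize the lookup results, since the same (unit, position) pairs recur across frames. This is a generic sketch with stand-in types — the String-keyed cache below is not Sphinx code, just the memoization pattern applied to a lookup like lookupNearestHMM:

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Generic memoization sketch for a hot lookup such as lookupNearestHMM;
      // the String unit/position and result types are stand-ins, not Sphinx classes.
      final class HmmLookupCache {
          private final Map<String, String> cache = new HashMap<>();
          private int misses = 0;

          // Stand-in for the real (expensive) nearest-HMM search.
          private String slowLookup(String unit, String position) {
              misses++;
              return unit + "@" + position;
          }

          String lookup(String unit, String position) {
              // Cache hit: no search. Cache miss: search once, remember the result.
              return cache.computeIfAbsent(unit + "|" + position,
                      k -> slowLookup(unit, position));
          }

          int misses() { return misses; }
      }
      ```

      Whether this pays off depends on how often the same pairs repeat in practice; profiling the hit rate first would be prudent.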

      Do you have any opinions? Thanks in advance.

       
      • Nickolay V. Shmyrev

        There was similar discussion already.

        https://sourceforge.net/forum/message.php?msg_id=4625020

        I think that, first of all, as in any optimization, you need to know what to optimize and understand the application and its behaviour. So, please provide your testing application first; model loading should really be a minor thing compared to speech processing.

        As for speech processing speed, there are well-known tricks that should speed up processing significantly, for example multipass recognition with lattice rescoring.

         
        • Yueyu/Lin

          Yueyu/Lin - 2008-09-30

          My application is not a typical recognition application.
          I have a lot of small grammars that need to be used by different recognition processes.
          That means I can't use one grammar for all recognition; in fact, the grammar is different every time.
          That requires loading the acoustic model and compiling the grammars to be as fast as possible. Since the grammars are small, recognition performance is not the major issue for this application.
          For a typical recognition application, the grammar does not change frequently and there are not that many grammars (a different grammar for each recognition session), so grammar loading and acoustic model loading are not an issue: they can be loaded once and reused in later sessions.

          That's why I want to improve the performance of loading the acoustic model and compiling the grammars.

          Do you have any ideas about it? Thanks.

           
          • Nickolay V. Shmyrev

            I'm sure it's not required to reload the acoustic model to recompile a grammar; you should load the acoustic model once and reuse it everywhere. Probably something went wrong in your code.

            Grammar compilation must be relatively fast.

             
