Using other acoustic models in Simon

Help
2009-08-27
2013-01-13
  • PatrickAlves
    2009-08-27

    Hi Peter,

    I hope you're in good health after the accident!

    I'm working with simon for Brazilian Portuguese. It works great with the acoustic models created with my voice. We have robust acoustic models (trained on more than 15 hours of speech) built with HTK (context-dependent models with large Gaussian mixtures) that we use for dictation tasks. I'm trying to use them in simon.

    First I tried to copy the hmmdefs and tiedlist to .kde4/share/apps/simond/models/patrick/active and change some settings in julius.jconf. That didn't work, because I used MFCC_E_D_A_Z to train the model and Julius 3.5 doesn't have the -htkconf option to load the MFCC configuration. So I converted the hmmdefs to binary format using the following commands:

    mkbinhmm -htkconf edaz.conf hmmdefs binhmm ;
    mv binhmm hmmdefs

    And it works!!! Recognition is going well with any speaker.

    Here are my questions:

    1) Does the Julius version that simon uses really support the _Z (MFCC_E_D_A_Z) option? I think it does, because Julius doesn't show any error message.

    2) Does this version of Julius only support 16 kHz acoustic models? I changed the sample rate to 22050 Hz in simon's configuration ("Speech model -> Samplerate") and in the sample rate of simon's recording configuration.
    I also set:
    -smpFreq 22050
    -smpPeriod 454
    in julius.jconf.
    But when I run simon and look at julius.log:

    Speech Analysis Module(s)

    [MFCC01]  for [AM00 _default]

    Acoustic analysis condition:
            parameter = MFCC_E_D_A_Z (39 dim. from 12 cepstrum + energy with CMN)
            sample frequency = 16000 Hz
            sample period =  625  (1 = 100ns)
            window size =  550 samples (34.4 ms)
            frame shift =  220 samples (13.8 ms)
            pre-emphasis = 0.97
            # filterbank = 26
            cepst. lifter = 22
            raw energy = True
            energy normalize = True (scale = 0.1, silence floor = 50.0 dB)
            delta window = 2 frames (27.5 ms) around
            acc window = 2 frames (27.5 ms) around
            hi freq cut = OFF
            lo freq cut = OFF
            zero mean frame = ON
            use power = OFF
            CVN = OFF
            VTLN = OFF
            spectral subtraction = off
            cepstral normalization = real-time MAP-CMN
            base setup from = binhmm-embedded

    It seems that Julius is still working at 16 kHz!!
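    The figures in question 2 and in the log are tied together by simple arithmetic: the log says the sample period is counted in 100 ns units ("1 = 100ns"), so the period is 10^7 divided by the sample rate, and each millisecond value is a sample count divided by the sample rate. A minimal Python sketch that reproduces those numbers (the helper names are illustrative and not part of Julius or simon):

```python
def sample_period_100ns(sample_rate_hz: int) -> int:
    """Sample period in Julius' 100 ns units, rounded to the nearest integer."""
    return round(10_000_000 / sample_rate_hz)


def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds of a run of samples at the given sample rate."""
    return samples * 1000.0 / sample_rate_hz


# Sample periods for the two rates discussed in the post
print(sample_period_100ns(16000))            # 625
print(sample_period_100ns(22050))            # 454 (10^7 / 22050 = 453.5...)

# Window and frame shift from the julius.log excerpt, analysed at 16000 Hz
print(round(samples_to_ms(550, 16000), 1))   # 34.4
print(round(samples_to_ms(220, 16000), 2))   # 13.75
```

    At 16000 Hz this gives the 625 and the 34.4 ms / 13.8 ms shown in the log, while 22050 Hz gives the 454 used with -smpPeriod.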

    Bests,

    Patrick

     
    • Peter Grasch
      2009-08-27

      Hi!

      > I hope you're in good health after the accident!
      I am doing ok. Thanks!

      > We have robust acoustic models (trained on more than 15 hours of speech) built with HTK (context-dependent models with large Gaussian mixtures) that we use for dictation tasks. I'm trying to use them in simon.
      I am more than interested in your findings. Are they going to be made public somewhere?

      > 1) Does the Julius version that simon uses really support the _Z (MFCC_E_D_A_Z) option? I think it does, because Julius doesn't show any error message.
      Honestly, I have no idea. Sorry. You can find the julius/sent configuration in their respective default locations (julius/libjulius/include/julius/config_linux.h, julius/libsent/include/sent/config_linux.h) and even change them if you need to. But if it works I guess that it does :)

      > 2) Does this version of Julius only support 16 kHz acoustic models? I changed the sample rate to 22050 Hz in simon's configuration ("Speech model -> Samplerate") and in the sample rate of simon's recording configuration.
      Well, it should support that, but it is not really tested. Julius supports it, but the Julius parameter might not make it all the way to the recognition stage. (The julius.jconf is ignored, as it is "overwritten" by simond's options.)

      Is the new samplerate updated in ~/.kde/share/apps/simond/models/<user>/active/activerc? Does it work if you change it there manually?

      As a quick fix, you could hardcode your parameters in simond/src/juliuscontrol.cpp (line #116)...

      Oh, and if you change anything in your model input files, the compiled model will be overwritten! You can work around that, though.
      simon will recompile the model whenever the last modification of the model input files is newer than the time the active model was last compiled. That date/time is stored in the activerc files (~/.kde/share/apps/simon/model/activerc and ~/.kde/share/apps/simond/models/<user>/active/activerc; one for the server and one for the client). If you change the date to something like 1.1.2050, simon won't touch the model. (Although you might want to plan for a very weird problem in January 2050 :P)
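      The recompilation rule above boils down to a timestamp comparison. A minimal Python sketch of that rule, assuming simon compares the newest modification time of the input files with the compile date stored in activerc (the function is hypothetical and is not simon's actual code), shows why a 2050 date keeps the model untouched:

```python
from collections.abc import Sequence
from datetime import datetime


def needs_recompile(input_mtimes: Sequence[datetime], compiled_at: datetime) -> bool:
    """Hypothetical model of the rule: recompile when any input file was
    modified after the stored compile date."""
    if not input_mtimes:
        return False
    return max(input_mtimes) > compiled_at


edits = [datetime(2009, 8, 20), datetime(2009, 8, 27, 14, 30)]
print(needs_recompile(edits, datetime(2009, 8, 25)))               # True: an input changed after compiling
print(needs_recompile(edits, datetime(2050, 1, 1)))                # False: the 2050 date is never exceeded
print(needs_recompile([datetime(2050, 1, 2)], datetime(2050, 1, 1)))  # True: the "weird problem" of 2050
```

      The last case is the joke at the end of the post: once the clock passes the stored date, any edit triggers a rebuild again.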

      Greetings,
      Peter

       
    • PatrickAlves
      2009-08-28

      Hi Peter,

      > I am more than interested in your findings. Are they going to be made public somewhere?
      - Yes, we have a lot of resources (acoustic models, dictionaries, databases) for Brazilian Portuguese available at: http://www.laps.ufpa.br/falabrasil/
      The website is in Portuguese, but you can click "registre-se" ("register") to create an account and go to the download section, where the resources are available.

      > Is the new samplerate updated in ~/.kde/share/apps/simond/models/<user>/active/activerc? Does it work if you change it there manually?
      - Yes!!! I changed it manually and it works!!!! Julius is working at 22050 Hz now!! Thanks!!!

      > Oh, and if you change anything in your model input files, the compiled model will be overwritten! You can work around that, though.
      - I know that; I made a script to update the model. But the problem is that when I insert a new word into the grammar, simon needs to recompile the grammar and it rebuilds the acoustic model, so after that I run the script again and copy the models.
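      The script itself isn't shown in the thread. A minimal Python sketch of the kind of step described, assuming it only has to copy the externally trained hmmdefs (already converted with mkbinhmm) and tiedlist back into simond's active model directory after each recompilation (the function name and the source directory are hypothetical):

```python
import shutil
from pathlib import Path

# Files named in the first post; hmmdefs is the binary model produced by mkbinhmm.
MODEL_FILES = ("hmmdefs", "tiedlist")


def restore_external_model(source_dir: Path, active_dir: Path) -> list[Path]:
    """Copy the external acoustic model files over the ones simon just rebuilt.

    Raises FileNotFoundError if a model file is missing from source_dir.
    Returns the destination paths that were written.
    """
    written = []
    for name in MODEL_FILES:
        src = source_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing model file: {src}")
        dst = active_dir / name
        shutil.copy2(src, dst)  # copy2 also copies metadata such as the modification time
        written.append(dst)
    return written


# Example (path from the first post; adjust user name and KDE directory):
# restore_external_model(Path("falabrasil-model"),
#                        Path.home() / ".kde4/share/apps/simond/models/patrick/active")
```

      Checking every file before reporting success keeps a half-copied model from going unnoticed.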

      I have another question:

      First, let me introduce myself.
      I'm Patrick Silva from the city of Belém, in the state of Pará (Brazil). I work at the Federal University of Pará (UFPA) in the Signal Processing Lab (LaPS - www.laps.ufpa.br). I'm finishing my Master's in Telecommunications (more specifically, automatic speech recognition). We have a group called FalaBrasil that produces ASR resources for Brazilian Portuguese.

      We (FalaBrasil) are interested in producing a Brazilian Portuguese version of Simon. The idea is to use Simon for people with special needs (who can't use a keyboard or mouse), so we want to create a complete command-and-control system (similar to the one shown on YouTube). We would like to call it SimonBR; do you agree with the name? We can keep the original name if you don't.

      The idea is:
      - Use our acoustic model (so the user doesn't need to train it).
      - Compile a grammar with all the needed commands, shortcuts, lists, etc., while still allowing the user to insert new commands, text macros, etc.
      - Create a simple installer for Windows and Linux, or let the user download simon 0.2 and run a script to update it to the Brazilian version.

      To do this we will have to make some changes to the source code. For example, disable acoustic model training in Simon, so that the synchronization process only compiles the grammar.
      We are also interested in translating simon's interface to Portuguese.

      We would like to know your opinion about this, and whether you have any suggestions for a Brazilian Portuguese version of Simon.

      Greetings,
      Patrick

       
      • Peter Grasch
        2009-08-28

        Hi Patrick!

        > Yes, we have a lot of resources (acoustic models, dictionaries, databases) for Brazilian Portuguese available at: http://www.laps.ufpa.br/falabrasil/
        Great, but the licence is a bit problematic (as it is not GPL compatible). Have you considered adding it to the VoxForge project (http://voxforge.org)? This would get you:
        a.) A lot of public exposure. Voxforge is well known in the open source speech recognition community
        b.) A complete contribution framework with desktop application and java applet (web access) to contribute speech data to the model.
        c.) The (few) already existing recordings from users.

        Maybe have a look here: http://voxforge.org/pt_br

        > Yes!!! I changed it manually and it works!!!! Julius is working at 22050 Hz now!! Thanks!!!
        OK, then I know what to look for.

        > I know that; I made a script to update the model. But the problem is that when I insert a new word into the grammar, simon needs to recompile the grammar and it rebuilds the acoustic model, so after that I run the script again and copy the models.
        Yes, I did something similar when I tried out the voxforge model (English)...

        > We (FalaBrasil) are interested in producing a Brazilian Portuguese version of Simon. [...]
        That would be great, but I would very much prefer a different approach. We both have the same goal here (creating a complete system for the disabled). So why split the project (and thus the resources)?

        The points you listed:
        * Translated GUI
          Well, you can translate simon (using KDE's Lokalize) very easily. simon is already translated to German, Dutch, Czech, Spanish and French (although some translations are incomplete). The translation will automatically be picked up and used if simon is started on a system with the appropriate locale (German on a German system, etc.).
        * Disable acoustic model compilation
          This could just be a config option, but really the correct way to do it is to introduce what I'll just call "base models". We would like the user to be able to define a certain HMM model (yours, for example) as the "base model". The base model will never be touched by simon but can be adapted to the user's speech with training data, thus generating an adapted, user-specific version of the base model.
          Without training data, the base model itself is used as the "adapted" model so that the user doesn't _have_ to train at all but it would still give him the option to do so - if the recognition rate is low.
          Base models are something I would like to introduce to simon 0.3 but I am not sure if it will be ready in time.
          If you really want the user not to be able to change the model at all, simon could provide a third option: no base model (the current situation), base model, static model. The latter would be the same base model but without the speaker adaptation part.
        * Easily installable "grammar / vocabulary / commands"-package
          I am currently working on "packaging" a combination of vocabulary, commands, grammar and training texts into what we call "scenarios". You could then create (and share) e.g. a "Webbrowsing" scenario. With this technology in place (as I said, I am currently working on it, but it's at an early stage) you could just create a "Brazilian" scenario. The scenarios are of course user-editable (so the user can change them). The idea is to allow users to create, modify and distribute those scenarios easily so they can share their simon configuration. This will definitely come with simon 0.3.
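        The three options sketched under "Disable acoustic model compilation" (no base model, adapted base model, static base model) amount to a simple decision about which HMM set the recognizer ends up using. A hypothetical Python sketch of that decision (the enum and function are illustrative only; this describes a proposal for simon 0.3, not existing simon code, and it assumes that without a base model nothing can be compiled until the user trains):

```python
from enum import Enum


class ModelMode(Enum):
    NO_BASE_MODEL = "none"    # current situation: model built only from the user's training data
    BASE_MODEL = "base"       # base model, adapted to the user when training data exists
    STATIC_MODEL = "static"   # base model used as-is, speaker adaptation disabled


def active_model(mode: ModelMode, has_training_data: bool) -> str:
    """Describe which model the recognizer would use under each proposed mode."""
    if mode is ModelMode.NO_BASE_MODEL:
        return "user-trained model" if has_training_data else "no model (training required)"
    if mode is ModelMode.STATIC_MODEL:
        return "base model (unchanged)"
    # BASE_MODEL: the base model itself is never modified; adaptation produces a separate copy
    return "adapted base model" if has_training_data else "base model (unchanged)"


print(active_model(ModelMode.BASE_MODEL, has_training_data=False))  # base model (unchanged)
print(active_model(ModelMode.BASE_MODEL, has_training_data=True))   # adapted base model
```

        Keeping the base model read-only and writing adaptation results elsewhere is what lets the "static" option be the same code path with adaptation switched off.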

        So all in all we - again - share the same goals. This is why I would very much like to "join forces". Instead of a complete fork you could maybe integrate the missing functionality into simon itself. This way there is only one codebase to maintain.

        If you are interested, I will give you / your team write access to simon's SVN.

        Greetings,
        Peter

         
    • PatrickAlves
      2009-08-31

      Hi Peter !

      That's a great idea !!

      We (my advisor, Aldebaro Klautau, and I) would like to discuss this by email. Can you give me yours?

      Mine is patrickalves@ufpa.br.

      Bests,
      Patrick

       
    • Peter Grasch
      2009-09-01

      Hi!

      My e-Mail is grasch <ate> simon-listens°org.

      Looking forward to hearing from you!

      Greetings,
      Peter