
sphinx.subwiki.com: Please help build wiki

Forum: Help
Creator: pi
Created: 2007-10-10
Updated: 2012-09-22
  • pi

    pi - 2007-10-10

    I recently found
    sphinx.subwiki.com

    It has some Getting Started Tutorials which are very good for getting n00bs (like me) started.

    IMHO Sphinx is lacking a simple entrypoint for n00bs; I suggest building this wiki to be that entrypoint.

    I am stuck and need help building this page:
    http://sphinx.subwiki.com/sphinx/index.php/Explanation_of_Parameters_to_sphinx3_livepretend

    It is a shoot-off from the first Getting Started Tutorial (section: Create config file)
    http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide#Create_config_file

    It's the first parameter I can't figure out:
    -hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

    I found the folder
    /blablabla/Sphinx/sphinx3-0.7/build/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

    What is it? How is it created? What are the raw components you feed into the black box to get this? (My guess: raw audio of someone speaking utterances into the mic + a phonetic transcript of each -- something like section 4.9.5 of Hieroglyphs?)

    And what is the black box? How do you operate it?

    I used the AN4 training model, so (I think) it came with all this stuff premade.

    But I want to understand the mechanism -- how to build from scratch. I want to put in my own voicefiles and phonetic transcripts (because I want good accuracy).

    And I really want to contribute to making a simple yet comprehensive run-through, from start to finish, to demonstrate using Sphinx. I think this will attract front-end developers. As it is, it's a marathon job to even get this far.

    Many thanks,

    Sam

     
    • pi

      pi - 2007-10-13

      > There is even a third wiki about cmusphinx :)
      >
      > http://lima.lti.cs.cmu.edu/mediawiki/index.php/Main_Page
      >
      > And using even two is certainly not an option, especially if you don't
      > control the second one (you don't have admin rights there, do you?). The
      > last one is preferable then, since dfdu likes MediaWiki more than
      > SourceForge's TWiki.

      These are David's personal notes -- he's using the wiki format because it's easy to maintain & structure, not to make a multi-user wiki. He says so on the page. So this is not an option. We are back to two wikis.

      I advocate using the subwiki for now because:
      1) it's currently impossible to log into the official wiki
      2) the whole contents of the subwiki will need to be moved across at some point anyway
      3) MediaWiki is nicer to use
      4) the existing structure of the subwiki is a suitable (though not yet comprehensive) launch point for a beginner

      Not knowing the author/admin is a bit of a worry:
      If the worst comes to the worst and the subwiki disappears... I'll see if I can download a local copy with wget.
      If we can't trace the author, maybe one of the Sphinx team (maybe David?) would be up for hosting a MediaWiki wiki; then the subwiki contents can simply be copied over and the existing wiki merged in. I'd be happy to host, but I have a feeling the Sphinx team would prefer to keep the admin within the core group! (it would make sense..)

      On this point, I'll be happy to hand over the op-lev30 for irc.freenode.net#cmusphinx if any of the Sphinx team would like to take over, or to op anyone who wants to help out: just send me your nick.

      Sam

       
    • Nickolay V. Shmyrev

      I completely agree that more precise newbie documentation is required in the speech recognition domain, so this activity is welcome for sure.

      But to be honest I'm quite upset by this wiki. First of all, "sphinx" is a different project dedicated to database search; our project is called cmusphinx. Second, there is an official wiki, so why not just use it?

      Also it would be nice to integrate the wiki with the offline documentation. For example, on your question:

      -hmm
      Directory for specifying Sphinx 3's HMM; the following files are assumed to be present: mdef, mean, var, mixw, tmat. If -mdef, -mean, -var, -mixw or -tmat are specified, they will override this command.

      This is the directory with the acoustic model. mdef is a model definition file, mean contains the HMM means, and so on. There are also variances, mixture weights and transition matrices. But to understand any component here you must know what an HMM is and how it is created with SphinxTrain. So there is probably no easy answer that can fit in a single post.
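
      In other words, -hmm just points the decoder at a directory holding those five files, instead of naming each one on the command line. Roughly (the paths and file names below are only placeholders built from the flag names in the excerpt above, not a verified listing of the hub4 directory):

          MODEL=$SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

          # either hand the decoder the whole directory:
          #     -hmm $MODEL
          # or name the individual files yourself:
          #     -mdef $MODEL/mdef -mean $MODEL/mean -var $MODEL/var \
          #     -mixw $MODEL/mixw -tmat $MODEL/tmat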

      > But I want to understand the mechanism -- how to build from scratch. I want to put in my own voicefiles and phonetic transcripts (because I want good accuracy).

      What stops you then?

       
    • pi

      pi - 2007-10-12

      > But to be honest I'm quite upset by this wiki. First of all, "sphinx" is a
      > different project dedicated to database search; our project is called
      > cmusphinx. Second, there is an official wiki, so why not just use it?

      True, it is ambiguously named.
      Also true that the idea of having two wikis for the same thing is a contradiction.

      I have no idea who created it,
      but the content is perfectly structured for a beginner to learn from - I cannot find this content anywhere else.
      And looking at the main wiki, it is totally unsuitable for a beginner.

      There does not seem to be any content overlap between them.

      I suggest putting a hyperlink entitled "Beginner's wiki" on the main page of the main wiki, linking to sphinx.subwiki.com. I tried to do this myself but I cannot create a user account for the main wiki - it says '403 forbidden'. Can anyone give me a password for username SunFish7?

      I don't want to step on anyone's toes here -- my only aim is to integrate the documentation. A wiki is surely the way to go -- with so many developers, the help files are currently scattered all over the place. Wikis can also be merged at a later date.

      > > But I want to understand the mechanism -- how to build from scratch.
      > > I want to put in my own voicefiles and phonetic transcripts (because
      > > I want good accuracy).
      >
      > What stops you then?

      Knowledge! I can't find the answer anywhere in the help/documentation!

      I'm trying to get the most basic possible run-through, from start to finish, of the whole process of training (internals can come later). The documentation says you have to start with raw audio and phonetic transcripts, but I cannot find any documentation of how to take the first steps with this data.

      This tutorial goes some of the way:
      http://www.speech.cs.cmu.edu/sphinx/tutorial.html

      This seems to be the key bit:
      perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids

      ... looking in etc/an4_train.fileids, it references lots of .sph files in
      .\an4\wav\an4_clstk\
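
      (If I'm reading it right, each line of that file is just a relative path with no extension, something like:

          an4_clstk/<speaker-folder>/<utterance-name>
          an4_clstk/<speaker-folder>/<another-utterance>

      -- placeholder names, not the real AN4 entries.)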

      The script looks like it throws all these files through wave2feat.
      I'm assuming wave2feat constructs the 5 folders mdef, mean, var, mixw or tmat out of the .sph files.

      What's in these .sph files? Do they contain a raw audio sample and a phonetic transcript? How do I make them?

      Sam

       
      • Nickolay V. Shmyrev

        There is even a third wiki about cmusphinx :)

        http://lima.lti.cs.cmu.edu/mediawiki/index.php/Main_Page

        And using even two is certainly not an option, especially if you don't control the second one (you don't have admin rights there, do you?). The last one is preferable then, since dfdu likes MediaWiki more than SourceForge's TWiki.

        > I tried to do this myself but I cannot create a user account for the main wiki - it says '403 forbidden'. Can anyone give me a password for username SunFish7?

        Me too, so clearly it's an administration problem.

        > I'm assuming wave2feat constructs the 5 folders mdef, mean, var, mixw or tmat out of the .sph files.

        No, wave2feat just transforms the waves into feature files (.mfc extension) containing cepstra, deltas and accelerations.
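
        Schematically, for every id listed in the fileids/-ctl file it reads a wave and writes a feature file; a sketch of the effect (not the real script):

            while read id; do
                # wav/$id.sph  ->  feat/$id.mfc  (cepstra + deltas + accelerations)
                echo "wav/$id.sph -> feat/$id.mfc"
            done < etc/an4_train.fileids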

        > This tutorial goes some of the way:
        > http://www.speech.cs.cmu.edu/sphinx/tutorial.html

        It's bad; I suggest you read the SphinxTrain documentation first:

        http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html

        > What's in these .sph files? Do they contain a raw audio sample and a phonetic transcript? How do I make them?

        It's just a wave in a special NIST encoding, just like ADPCM.
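
        If your own recordings are plain wav, I think sox can convert them to sph (the file names below are made up); just keep the audio mono and at the sample rate your config expects, e.g. 16 kHz:

            sox my_recording.wav my_recording.sph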

        The whole process goes this way: you record waves, update the config in the etc subfolder, fill in the prompts in etc/$id.transcription and the file ids in etc/$id.fileids, then extract feats with make_feats. Then you start training with ./scripts_pl/RunAll.pl. Training will create a model for you.
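
        Roughly, assuming the standard SphinxTrain task layout (the task name "mytask" and the file contents shown in comments are made up, and the config file name may vary by version -- check the an4 example for the exact formats):

            mytask/
                etc/sphinx_train.cfg         # training configuration (paths, feature settings)
                etc/mytask.fileids           # one utterance per line, relative path, no extension
                etc/mytask.transcription     # prompt text per utterance, e.g. <s> HELLO WORLD </s> (utt01)
                wav/                         # the recorded waves matching the fileids

            perl scripts_pl/make_feats.pl -ctl etc/mytask.fileids    # waves -> feat/*.mfc
            ./scripts_pl/RunAll.pl                                   # runs the full training chain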

        Heh, and actually I think that while we should care about docs, a beginner shouldn't train his own model at all. Model training is a very complicated task and it often gives confusing results. Most effort should go in the "just works" direction :)

         

