
sphinx.subwiki.com: Please help build wiki

Forum: Help
Creator: pi
Created: 2007-10-10
Updated: 2012-09-22
  • pi

    pi - 2007-10-10

    I recently found
    sphinx.subwiki.com

    It has some Getting Started Tutorials which are very good for getting n00bs (like me) started.

    IMHO Sphinx is lacking a simple entrypoint for n00bs; I suggest building this wiki to be that entrypoint.

    I am stuck and need help building this page:
    http://sphinx.subwiki.com/sphinx/index.php/Explanation_of_Parameters_to_sphinx3_livepretend

    It is a shoot-off from the first Getting Started Tutorial (section: Create config file)
    http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide#Create_config_file

    It's the first parameter I can't figure out:
    -hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

    I found the folder
    /blablabla/Sphinx/sphinx3-0.7/build/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

    What is it? How is it created? What are the raw components you feed into the black box to get this? (My guess: raw audio of someone speaking utterances into the mic + a phonetic transcript of each -- something like section 4.9.5 of Hieroglyphs?)

    And what is the black box? How do you operate it?

    I used the AN4 training model, so (I think) it came with all this stuff premade.

    But I want to understand the mechanism -- how to build from scratch. I want to put in my own voicefiles and phonetic transcripts (because I want good accuracy).

    And I really want to contribute to making a simple yet comprehensive run-through, from start to finish, to demonstrate using Sphinx. I think this will attract front-end developers. As it is, it's a marathon job to even get this far.

    Many thanks,

    Sam

     
    • pi

      pi - 2007-10-13

      > There is even a third wiki about cmusphinx :)
      >
      > http://lima.lti.cs.cmu.edu/mediawiki/index.php/Main_Page
      >
      > And using even two is certainly not an option, especially if you don't
      > control the second one (you don't have admin rights there, do you?). The
      > last one is preferable then, since dfdu likes MediaWiki more than
      > SourceForge's TWiki.

      These are David's personal notes -- he's using the wiki format because it's easy to maintain & structure, not to make a multi-user wiki. He says so on the page. So this is not an option. We are back to two wikis.

      I advocate using the subwiki for now because:
      1) it's currently impossible to log into the official wiki
      2) the whole contents of the subwiki will need to be moved across at some point anyway
      3) MediaWiki is nicer to use
      4) the existing structure of the subwiki is a suitable (though not yet comprehensive) launch point for a beginner

      Not knowing the author/admin is a bit of a worry:
      If the worst comes to the worst and the subwiki disappears... I'll see if I can download a local copy with wget.
      If we can't trace the author, maybe one of the Sphinx team (maybe David?) would be up for hosting a MediaWiki wiki; then the subwiki contents can simply be copied over and the existing wiki merged in. I'd be happy to host, but I have a feeling the Sphinx team would prefer to keep the admin within the core group! (it would make sense..)

      On this point, I'll be happy to hand over the op-lev30 for irc.freenode.net#cmusphinx if any of the Sphinx team would like to take over, or to op anyone who wants to help out: just send me your nick.

      Sam

       
    • Nickolay V. Shmyrev

      I completely agree that more precise newbie documentation is required in the speech recognition domain, so this activity is welcome for sure.

      But to be honest I'm quite upset by this wiki. First of all, "sphinx" is a different project dedicated to database search; our project is called cmusphinx. Second, there is an official wiki, so why not just use it?

      Also it would be nice to integrate the wiki with the offline documentation. For example, on your question:

      -hmm
      Directory for specifying Sphinx 3's HMM; the following files are assumed to be present: mdef, mean, var, mixw, tmat. If -mdef, -mean, -var, -mixw or -tmat are specified, they will override this command.

      This is the directory with the acoustic model. mdef is a model definition file, mean contains the HMM means, and so on. There are also variances, mixture weights and transition matrices. But to understand any component here you must know what an HMM is and how it is created with SphinxTrain. So there is probably no easy answer that can fit in a single post.
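
      In other words, -hmm just points the decoder at a directory holding those five files, instead of naming each one on the command line. Roughly (the paths and file names below are only placeholders built from the flag names in the excerpt above, not a verified listing of the hub4 directory):

          MODEL=$SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

          # either hand the decoder the whole directory:
          #     -hmm $MODEL
          # or name the individual files yourself:
          #     -mdef $MODEL/mdef -mean $MODEL/mean -var $MODEL/var \
          #     -mixw $MODEL/mixw -tmat $MODEL/tmat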

      > But I want to understand the mechanism -- how to build from scratch. I want to put in my own voicefiles and phonetic transcripts (because I want good accuracy).

      What stops you then?

       
    • pi

      pi - 2007-10-12

      > But to be honest I'm quite upset by this wiki. First of all, "sphinx" is a
      > different project dedicated to database search; our project is called
      > cmusphinx. Second, there is an official wiki, so why not just use it?

      True, it is ambiguously named.
      Also true that the idea of having two wikis for the same thing is a contradiction.

      I have no idea who created it,
      but the content is perfectly structured for a beginner to learn from - I cannot find this content anywhere else.
      And looking at the main wiki, it is totally unsuitable for a beginner.

      There does not seem to be any content overlap between them.

      I suggest putting a hyperlink entitled "Beginner's wiki" on the main page of the main wiki, linking to sphinx.subwiki.com. I tried to do this myself but I cannot create a user account for the main wiki - it says '403 forbidden'. Can anyone give me a password for username SunFish7?

      I don't want to step on anyone's toes here -- my only aim is to integrate the documentation. A wiki is surely the way to go -- with so many developers, the help files are currently scattered all over the place. Wikis can also be merged at a later date.

      > > But I want to understand the mechanism -- how to build from scratch.
      > > I want to put in my own voicefiles and phonetic transcripts (because
      > > I want good accuracy).
      >
      > What stops you then?

      Knowledge! I can't find the answer anywhere in the help/documentation!

      I'm trying to get the most basic possible run-through, from start to finish, of the whole process of training (internals can come later). The documentation says you have to start with raw audio and phonetic transcripts, but I cannot find any documentation of how to take the first steps with this data.

      This tutorial goes some of the way:
      http://www.speech.cs.cmu.edu/sphinx/tutorial.html

      This seems to be the key bit:
      perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids

      ... looking in etc/an4_train.fileids, it references lots of .sph files in
      .\an4\wav\an4_clstk\
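
      (If I'm reading it right, each line of that file is just a relative path with no extension, something like:

          an4_clstk/<speaker-folder>/<utterance-name>
          an4_clstk/<speaker-folder>/<another-utterance>

      -- placeholder names, not the real AN4 entries.)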

      The script looks like it throws all these files through wave2feat.
      I'm assuming wave2feat constructs the 5 folders mdef, mean, var, mixw or tmat out of the .sph files.

      What's in these .sph files? Do they contain a raw audio sample and a phonetic transcript? How do I make them?

      Sam

       
      • Nickolay V. Shmyrev

        There is even a third wiki about cmusphinx :)

        http://lima.lti.cs.cmu.edu/mediawiki/index.php/Main_Page

        And using even two is certainly not an option, especially if you don't control the second one (you don't have admin rights there, do you?). The last one is preferable then, since dfdu likes MediaWiki more than SourceForge's TWiki.

        > I tried to do this myself but I cannot create a user account for the main wiki - it says '403 forbidden'. Can anyone give me a password for username SunFish7?

        Me too, so clearly it's an administration problem.

        > I'm assuming wave2feat constructs the 5 folders mdef, mean, var, mixw or tmat out of the .sph files.

        No, wave2feat just transforms the waves into feature files (.mfc extension) containing cepstra, deltas and accelerations.
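
        Schematically, for every id listed in the fileids/-ctl file it reads a wave and writes a feature file; a sketch of the effect (not the real script):

            while read id; do
                # wav/$id.sph  ->  feat/$id.mfc  (cepstra + deltas + accelerations)
                echo "wav/$id.sph -> feat/$id.mfc"
            done < etc/an4_train.fileids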

        > This tutorial goes some of the way:
        > http://www.speech.cs.cmu.edu/sphinx/tutorial.html

        It's bad; I suggest you read the SphinxTrain documentation first:

        http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html

        > What's in these .sph files? Do they contain a raw audio sample and a phonetic transcript? How do I make them?

        It's just a wave in a special NIST encoding, just like ADPCM.
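
        If your own recordings are plain wav, I think sox can convert them to sph (the file names below are made up); just keep the audio mono and at the sample rate your config expects, e.g. 16 kHz:

            sox my_recording.wav my_recording.sph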

        The whole process goes this way: you record waves, update the config in the etc subfolder, fill in the prompts in etc/$id.transcription and the file ids in etc/$id.fileids, then extract feats with make_feats. Then you start training with ./scripts_pl/RunAll.pl. Training will create a model for you.
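
        Roughly, assuming the standard SphinxTrain task layout (the task name "mytask" and the file contents shown in comments are made up, and the config file name may vary by version -- check the an4 example for the exact formats):

            mytask/
                etc/sphinx_train.cfg         # training configuration (paths, feature settings)
                etc/mytask.fileids           # one utterance per line, relative path, no extension
                etc/mytask.transcription     # prompt text per utterance, e.g. <s> HELLO WORLD </s> (utt01)
                wav/                         # the recorded waves matching the fileids

            perl scripts_pl/make_feats.pl -ctl etc/mytask.fileids    # waves -> feat/*.mfc
            ./scripts_pl/RunAll.pl                                   # runs the full training chain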

        Heh, and actually I think that while we should care about docs, a beginner shouldn't train his own model at all. Model training is a very complicated task and it often gives confusing results. Most effort should go in the "just works" direction :)

         

