CMU Sphinx / Forums / Speech Recognition Theory: sphinx train

Stefano Canepa - 2002-04-07

Dear all,
   I am trying to add a little Italian to Sphinx using SphinxTrain, following the manual. My problem is that the .wav files I recorded with the GNOME sound recorder cannot be converted into feature files.
   Could you help me? I suspect it's a hardware problem.

   Stefano
   sc@linux.it


- Anonymous - 2002-04-08
  
  What's the reported problem? Can you play the sound files back, and do they sound okay?
  
- Doumenc Julie - 2002-04-15
  
  Hi,
  
  I'm trying to create my own LM in English (I'm French, so I am trying to add the French accent to the model).
  I followed the instructions in SphinxTrain's doc/tinydoc.txt. My problem is that the command bin/make_feats etc/time.fileids doesn't produce any result, even though it uses 95% of the CPU. I waited for 48 hours... First I tried with 132 sentences (20 minutes of recording); as it was taking very long, I stopped it and tried with 10 seconds of recording, but the program never finished...
  
  Do you have any idea what the problem is?
  
  Thanks a lot
  
  julie
  
- Doumenc Julie - 2002-04-17
  
  OK, I found the solution: I changed the options of the command line: "bin/wave2feat -verbose -c $1 -nist -di wav -ei wav -do feat -eo feat" to "bin/wave2feat -verbose -c $1 -raw -di wav -ei wav -do feat -eo feat"
  And my .feat files were generated!
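
Since the fix above came down to telling wave2feat which input format to expect, one way to check what a recording actually is before converting it is to look at its first bytes. The following is a minimal sketch, not part of SphinxTrain: the function name and its return labels are made up for illustration, and it only relies on RIFF/WAVE files starting with "RIFF....WAVE" and NIST SPHERE files starting with "NIST_1A".

```python
def sniff_audio_container(path):
    """Guess the container of an audio file from its first 12 bytes.

    The labels returned here are hypothetical names for this sketch,
    not wave2feat options.
    """
    with open(path, "rb") as fh:          # binary mode: we compare raw bytes
        head = fh.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "riff-wave"                # Microsoft WAV, e.g. from a desktop recorder
    if head[:7] == b"NIST_1A":
        return "nist-sphere"              # NIST SPHERE header
    return "raw"                          # no recognized header; possibly headerless PCM
```

If a file reports "riff-wave" while the script passes -nist, the converter is presumably looking for a SPHERE header that is not there, which would be consistent with the behaviour described earlier in the thread.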
  
  Now I have another problem: the command "./script_pl/02.ci_schmm/slave_convg.pl" gives:
  WARN: "s3io.c", line 253: Unable to open /home/doumenc/SphinxTrain/time/model_parameters/time.ci_semi_flatinitial/mixture_weights for reading; No such file or directory.
  
  Indeed, this file doesn't exist, but I thought slave_convg.pl was supposed to create it and write to it.
  
  Any ideas?
  
  - prathna - 2023-09-13
    
    Hello, I am having the same error. Can you help me?
    Can not open transcript file (/home/prathna/speech_recognition/other/etc/kreol_train.transcription) at /home/prathna/speech_recognition/sphinxtrain/scripts/00.verify/verify_all.pl line 245.
    
- Anonymous - 2002-04-18
  
  Hello again,
  
  Go look at the log files carefully. Any failure at a stage is usually caused by a failure at a previous stage. Use a web browser to view the time.html file.
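
Because a failure usually traces back to an earlier stage, it can help to list the first problem line of every log, oldest log first. This is only a sketch under assumptions: the "*.log" pattern and ordering by modification time are guesses about how the logs are laid out, and the ERROR/WARN prefixes are simply the ones visible in the excerpts quoted in this thread.

```python
import re
from pathlib import Path

# Prefixes taken from the log excerpts quoted in this thread.
PROBLEM = re.compile(r"^\s*(ERROR|WARN)")

def first_problems(log_dir, pattern="*.log"):
    """Return (path, line_number, text) for the first problem line in each log.

    Logs are visited oldest first (by modification time), on the assumption
    that earlier stages wrote their logs earlier.
    """
    found = []
    logs = sorted(Path(log_dir).rglob(pattern), key=lambda p: p.stat().st_mtime)
    for log in logs:
        with open(log, errors="replace") as fh:
            for number, line in enumerate(fh, 1):
                if PROBLEM.match(line):
                    found.append((log, number, line.rstrip()))
                    break                  # only the first problem per file
    return found
```

The first entry returned is then the most likely place to start reading.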
  
- Stefano Canepa - 2002-04-27
  
  Doumenc Julie and Andrew Hewitt, I need to thank you publicly for your help. I am replying really late because I was monitoring the wrong forum and because I was abroad and away from my PC for a while.
  
  Thanks
  Stefano
  
- datta prasad - 2002-08-22
  
  Hi all,
  This is dattu. I am new to the Sphinx software. I was able to get feat files, but Phase 01 is giving me trouble.
  I am running the scripts described in tinydoc.txt:
  ./scripts_pl/01.vector_quantize/slave.VQ.pl
  
  When I run the above script, the program never returns. I left it running all night, but it was still in the same state.
  I looked at the log files and found these errors:
  1) no mdef files
  2) line 1271: unable to open dumpfile ....../time.dmp
  3) line 1549: unable to train
  Could anyone help me with this issue, please?
  
  - Jessica P. Hekman - 2002-08-22
    
    dattu,
    
    Hopefully someone who actually knows what they are talking about will try to help you, but in the meantime I am willing to take a shot at it if you can post your actual log file instead of just a summary.
    
    As for the error that time.dmp can't be written -- have you checked to make sure there are no permissions problems?
    
- datta prasad - 2002-08-23
  
  Thanks, Jessica.
  I have checked the permissions and they are OK.
  Anyhow, I am getting the following problem.
  
  Phase 01.vector_quantize has two steps: 1) AGG_SEG and 2) kmeans.
  
  The error occurs at AGG_SEG. The exact output is as follows:
  INFO: main.c (168): No lexical transcripts provided
  INFO: corpus.c(426): Will process all remaining utts starting at 0
  INFO: main.c (272): Will produce feat files
  INFO: main.c(426):Writing frames to one file
  Header size field: -411172864(e77e0000): filesize: 129953(1fba1)
  ERROR: corpus.c line (1513): MFCC read failed. Retrying after sleep...
  
  Could you help me, please?
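
The "Header size field ... filesize" line in the log compares a number stored at the start of the feature file with the file's size on disk. A rough way to reproduce that comparison is sketched below. It assumes the feature file layout is a 4-byte integer N followed by N 4-byte floats, which is not stated in this thread, and the function name is made up for illustration.

```python
import os
import struct

def check_feature_header(path):
    """Compare the 4-byte header of a feature file with its size on disk.

    Assumed layout: one 32-bit integer N, then N 32-bit floats.
    Returns (byte_order, n_floats, surplus_bytes) for whichever byte order
    makes the header agree most closely with the real file size.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        raw = fh.read(4)
    if len(raw) < 4:
        raise ValueError(f"{path}: shorter than a 4-byte header")
    best = None
    for name, fmt in (("little", "<i"), ("big", ">i")):
        (n,) = struct.unpack(fmt, raw)
        surplus = size - (4 + 4 * n)       # 0 means header and size agree
        if best is None or abs(surplus) < abs(best[2]):
            best = (name, n, surplus)
    return best
```

Applied by hand to the numbers in the log: the header as read is 0xe77e0000, and byte-swapped it is 0x00007ee7 = 32487 floats, which implies 4 + 4 * 32487 = 129952 bytes. The reported file size is 129953, one byte more. Under the assumed layout that would point to a file with one stray byte rather than a missing file, although the log alone cannot confirm the cause.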
  
- Anonymous - 2002-08-30
  
  g_dattu:
  
  I have had this problem myself. Usually it means that the system can't find the MFC file associated with (at least) one of the entries in the control file.
  - Make sure you have run wave2feat to convert your audio to mel-cepstral feature files.
  - Check the value of CFG_FEATFILE_EXTENSION and make sure it matches the extension on your MFC files (from wave2feat).
  - Make sure $CFG_FEATFILES_DIR actually points to the location where the MFC files are.
  
  It never ends because, when an MFC read fails, there is a sleep command in one of the libraries (libio.lib maybe). I actually changed that to an exit and added a couple of lines to print the file that caused the problem. Unfortunately, I can't give the actual code (it's my company's), but what you can do is write a Perl script or something that verifies an MFC file exists for every file listed in your control file.
  
  --Keith
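
The check suggested above might look roughly like this. It is a sketch in Python rather than Perl, and it is not the code referred to in the post. It assumes the control file lists one utterance name per line without an extension, and that each feature file lives at feat_dir/name.ext; the function name and the default extension "feat" (taken from the wave2feat command quoted earlier in the thread) are illustrative.

```python
from pathlib import Path

def missing_features(ctl_file, feat_dir, ext="feat"):
    """Return (entry, problem) pairs for control-file entries without a usable feature file."""
    problems = []
    # newline="" keeps any carriage returns visible instead of translating them
    with open(ctl_file, newline="") as fh:
        for raw in fh:
            entry = raw.rstrip("\n")
            if not entry.strip():
                continue                   # skip blank lines
            if entry != entry.strip():
                problems.append((entry.strip(), "stray whitespace or carriage return in control file"))
                entry = entry.strip()
            feat = Path(feat_dir) / f"{entry}.{ext}"
            if not feat.is_file():
                problems.append((entry, f"missing {feat}"))
            elif feat.stat().st_size == 0:
                problems.append((entry, f"empty {feat}"))
    return problems
```

For the setup discussed in this thread, a call such as missing_features("etc/time.fileids", "feat") would be the natural starting point; an empty list means every entry has a non-empty feature file.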
  
- Anonymous - 2002-09-17
  
  I am training a word model, but the script agg_seg.pl goes to sleep and logs the following output:
  
  INFO: main.c(168): No lexical transcripts provided
  INFO: corpus.c(1236): Will process all remaining utts starting at 0
  INFO: main.c(272): Will produce FEAT dump
  INFO: main.c(426): Writing frames to one file
  stat_retry(/sphinxtrain/bg_tel/feat/blago_0_1
  .feat) failed
  ERROR: "corpus.c", lin
  
  Does anyone know the reason for this?
  
  This feat file is 3072 bytes. Could that be the reason (too small)?
  Thanks
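
Whether 3072 bytes is "too small" can be estimated with a little arithmetic. The sketch below assumes a 4-byte header followed by 4-byte floats, 13 cepstral coefficients per frame, and 100 frames per second; none of these values is stated in this thread, so treat them as assumptions and adjust them to your configuration.

```python
def frames_in_feature_file(size_bytes, n_ceps=13, frame_rate=100):
    """Estimate frame count and duration from a feature file's size.

    Assumes a 4-byte header followed by 4-byte floats, n_ceps per frame.
    Returns (frames, seconds), or None if the size does not fit that layout.
    """
    payload = size_bytes - 4               # bytes left after the header
    if payload < 0 or payload % (4 * n_ceps):
        return None
    frames = payload // (4 * n_ceps)
    return frames, frames / frame_rate

print(frames_in_feature_file(3072))  # (59, 0.59)
```

Under those assumptions, 3072 bytes corresponds to exactly 59 frames, or a bit over half a second of audio, so the size by itself looks like a short but well-formed file. The stat_retry message instead suggests the file could not be found at the path as read. Note that the path in the log breaks before ".feat", which might indicate a stray line-ending character in the control file entry, though it may equally be an artifact of how the log was pasted.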
  
- Anonymous - 2002-09-24
  
  Hi plamen (first name? Russian or some other Slavic language?).
  
  Generally, this is a result of the MFC file not existing (see the previous post). If that's not it (and I am sure it isn't :) ), did you use wave2feat to produce an mfc/feat file? If you ported wave2feat to Windows, there is at least one case where it opens the original audio file in text mode when it should be binary. Do you have permission to write to the directory involved (is it marked read-only), and are you sure the file doesn't already exist?
  
- dasineni - 2008-04-19
  
  Hi Doumenc Julie,
  You replied as follows:
  
  OK, I found the solution: I changed the options of the command line: "bin/wave2feat -verbose -c $1 -nist -di wav -ei wav -do feat -eo feat" to "bin/wave2feat -verbose -c $1 -raw -di wav -ei wav -do feat -eo feat"
  And my .feat files were generated!
  
  Can you please explain how to run this command to generate the feat files after creating the dic file (following doc/tinydoc.txt of SphinxTrain)?
  
  thanks for your help
  
  dasineni..
  
