CMU Sphinx / Forums / Help: Sphinxtrain - ERRORS while Extracting features from segments and missing .mfc files

Giro Rarielli - 2018-08-19

I lost a lot of time resolving errors in the .dic, .phone, and .transcription files, but these ones I obviously can't solve on my own. If someone can link me to solutions, I would be very grateful.
First error:

Extracting features from segments starting at (part 1 of 1)
ERROR: This step had 3 ERROR messages and 0 WARNING messages. Please check the log file for details.
Feature extraction is done
MODULE: 00 verify training files

Second error:

Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
WARNING: Error in '/home/tutorial/mybase/etc/mybase.fileids', the feature file '/home/tutorial/mybase/feat/speaker_1/33.mfc' does not exist, or is empty

Phase 7 ends without errors, so I am not sure whether I even need to resolve these errors.
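
The Phase 3 warning can also be reproduced outside the trainer, which makes it easier to see how many utterances are affected. A minimal sketch, assuming the layout implied by the warning (one utterance ID per line in the fileids file, features stored as `<feat dir>/<id>.mfc`); the function name `missing_features` is made up for illustration and is not part of SphinxTrain:

```python
from pathlib import Path


def missing_features(fileids_path, feat_dir, ext=".mfc"):
    """Return utterance IDs whose feature file is absent or empty.

    Assumes one utterance ID per line (e.g. "speaker_1/33") and that the
    feature file for an ID lives at <feat_dir>/<id><ext>.
    """
    feat_root = Path(feat_dir)
    missing = []
    for line in Path(fileids_path).read_text().splitlines():
        utt_id = line.strip()
        if not utt_id:
            continue  # skip blank lines
        feat_file = feat_root / (utt_id + ext)
        # Same condition the warning describes: not there, or zero bytes.
        if not feat_file.is_file() or feat_file.stat().st_size == 0:
            missing.append(utt_id)
    return missing
```

Called as `missing_features("/home/tutorial/mybase/etc/mybase.fileids", "/home/tutorial/mybase/feat")`, it should list `speaker_1/33` along with any other utterance for which feature extraction failed.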

- Nickolay V. Shmyrev - 2018-08-20
  
  It tells you
  
  **Please check the log file for details.**
  
  So you'd better check the log first.
  
  - Giro Rarielli - 2018-08-20
    
    ERROR: "sphinx_fe.c", line 122: Failed to read RIFF header
    INFO: sphinx_fe.c(787): Converting ...
    And it repeats the same error roughly every 100 lines.
    
    - Nickolay V. Shmyrev - 2018-08-20
      
      So your files are not in proper WAV format; they are missing the RIFF header.
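
A quick way to confirm this is to look at the first 12 bytes of each recording: a canonical WAV file starts with `RIFF`, a 4-byte size field, and then `WAVE`. A minimal sketch, assuming the recordings sit under one directory and use a `.wav` extension; the function names are hypothetical, and this only checks the leading magic bytes, not everything sphinx_fe validates:

```python
from pathlib import Path


def has_riff_wave_header(path):
    """Return True if the file starts with a RIFF/WAVE header.

    Raw PCM data or other audio formats saved with a .wav
    extension fail this test.
    """
    with open(path, "rb") as f:
        head = f.read(12)
    return len(head) == 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE"


def files_without_header(wav_dir):
    """List .wav files under wav_dir that lack the RIFF/WAVE header."""
    return sorted(
        str(p) for p in Path(wav_dir).rglob("*.wav") if not has_riff_wave_header(p)
    )
```

Any path returned by `files_without_header` is a candidate for the "Failed to read RIFF header" message and would need to be re-exported as a real WAV file.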
      
      - Giro Rarielli - 2018-08-20
        
        I found out that my files weren't in the correct order, and I don't get this error message anymore. But when I try the -s decode run step, I get:
        
        MODULE: DECODE Decoding using models previously trained
        Decoding 223 segments starting at 0 (part 1 of 1)
        0%
        Aligning results to find error rate
        Can't open /home/mobrob/Desktop/tutorial/veprad/result/veprad-1-1.match
        word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.
        I tried the suggestion in https://sourceforge.net/p/cmusphinx/discussion/help/thread/f7fa87f5/#e855/5221/98b6/f92e but nothing changed.
        
        EDIT: The log in logdir/decode contains this:
        
        pocketsphinx_batch: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory
        
        Last edit: Giro Rarielli 2018-08-20
        
        - Nickolay V. Shmyrev - 2018-08-21
          
          This error means you didn't configure the dynamic linker, either through ld.so.conf or LD_LIBRARY_PATH.
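
To see which directory the linker would need to know about, it can help to search a few candidate directories for the library named in the error. A minimal sketch, assuming a Linux system; the helper name `find_shared_lib` and the directories passed to it are assumptions for illustration, and the real lookup also consults /etc/ld.so.conf and the ld.so cache, which this does not read:

```python
import os


def find_shared_lib(soname, extra_dirs=()):
    """Return (directory, on_ld_library_path) pairs for dirs holding soname.

    Only LD_LIBRARY_PATH and the directories passed in are inspected;
    /etc/ld.so.conf and the ld.so cache are NOT consulted.
    """
    env_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    candidates = list(env_dirs)
    for d in extra_dirs:
        if d not in candidates:
            candidates.append(d)
    found = []
    for d in candidates:
        if os.path.isfile(os.path.join(d, soname)):
            # Second field tells whether the loader would search this dir
            # via LD_LIBRARY_PATH.
            found.append((d, d in env_dirs))
    return found
```

For example, if `find_shared_lib("libpocketsphinx.so.3", ["/usr/local/lib", "/usr/lib"])` reports a directory with `False` as the second field, the library is installed but that directory is not on LD_LIBRARY_PATH. Adding it there, or listing it in ld.so.conf and running ldconfig, are the two routes named in the reply. The `/usr/local/lib` guess is only an assumption based on the sphinxtrain script path in the decode log.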
        
Giro Rarielli - 2018-08-21

Now I got this in mybase.1.1-1.bw.log:

ERROR: "main.c", line 421: # of codebooks in mean/var files, 111, inconsistent with ts2cb mapping 108

- Nickolay V. Shmyrev - 2018-08-21
  
  This error means your phoneset has duplicated phones or is out of sync in some other way.
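
Both conditions can be checked mechanically before retraining. A minimal sketch, assuming the usual layout of one phone per line in the .phone file and `WORD PH1 PH2 ...` lines in the .dic file; the function name `check_phoneset` is hypothetical, and the special handling of SIL is an assumption about a typical setup:

```python
from collections import Counter


def check_phoneset(phone_lines, dict_lines):
    """Compare a phone list against the phones used in a dictionary.

    Returns (duplicates, unused, undefined):
      duplicates - phones listed more than once in the phone file
      unused     - phones in the phone file never used by the dictionary
      undefined  - phones used by the dictionary but missing from the phone file
    """
    phones = [ln.strip() for ln in phone_lines if ln.strip()]
    counts = Counter(phones)
    duplicates = sorted(p for p, n in counts.items() if n > 1)

    used = set()
    for ln in dict_lines:
        parts = ln.split()
        used.update(parts[1:])  # first field is the word itself

    listed = set(phones)
    # Assumption: SIL is the silence phone and normally appears only in
    # the filler dictionary, so it is not reported as unused here.
    unused = sorted(listed - used - {"SIL"})
    undefined = sorted(used - listed)
    return duplicates, unused, undefined
```

With real files, the two arguments can be open file objects, for instance the .phone and .dic files from the database's etc directory. All three returned lists should be empty before training is restarted.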
  
Giro Rarielli - 2018-08-22

Thanks a lot for the previous answers.
Somehow I got rid of all the errors, but now I get this message:

SENTENCE ERROR: 100.0% (223/223) WORD ERROR RATE: 100.0%

I researched a bit and found out that some of my recordings are around 2 seconds or even less (which is too short if I recall correctly), so I decided to merge every .txt and .wav file to get larger duration. The problem is that after that I can't get to this same line anymore, and I have a new error during the decode phase:

INFO: batch.c(729): Decoding 'speaker_1/File_0001'
ERROR: "batch.c", line 389: Failed to open /mybase/feat/speaker_1/File_0001.mfc: No such file or directory

- Nickolay V. Shmyrev - 2018-08-22
  
  > which is too short if I recall correctly
  
  No, it is acceptable.
  
  > so I decided to merge every .txt and .wav file to get larger duration
  
  That probably was not a great idea.
  
  - Giro Rarielli - 2018-08-23
    
    I have a backup of the previous files (SENTENCE ERROR: 100.0% (223/223) WORD ERROR RATE: 100.0%), so what do you recommend?
    
Giro Rarielli - 2018-09-10

I got it working somehow, thanks for the help.
