
sphinx_fe duplicates first frame

2016-01-25
  • Sean Robertson

    Sean Robertson - 2016-01-25

    Hello Nikolay,

    Me again. I think I found a bug. I've been looking closely at the MFCCs generated by sphinx_fe (r13167). I noticed that some of the MFCC files have duplicates of the first frame at the beginning of the file (each subsequent frame has the same coefficients as the one before it). It's very unlikely that the audio really is identical for the first few frames, because:

    • 11 frames are equal; at 100 frames/sec this would be over a tenth of a second of identical audio
    • It is always 11 frames of audio.
    • The MFCC file is about 11 frames longer than expected. I set -remove_silence no so that no frames are discarded, at which point a file that is, for example, 3.84 seconds long generates 395-396 frames rather than the roughly 384 expected at 100 frames/sec.
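
The frame-count check in the last bullet can be sketched as follows. This is only a rough estimate: it assumes one frame per 10 ms (100 frames/sec) and ignores window-length edge effects, and the function names are made up for illustration.

```python
def expected_frames(duration_s, frame_rate=100):
    """Rough frame count: one frame per 1/frame_rate seconds of audio."""
    return round(duration_s * frame_rate)


def surplus_frames(duration_s, actual_frames, frame_rate=100):
    """How many more frames the MFCC file holds than the rough estimate."""
    return actual_frames - expected_frames(duration_s, frame_rate)


# The 3.84-second example from the post, with the two observed counts.
print(expected_frames(3.84))       # 384
print(surplus_frames(3.84, 395))   # 11
print(surplus_frames(3.84, 396))   # 12
```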

    All of these .wav files were recorded under similar conditions. They were unpacked from .caf files and are PCM, mono, sampled at 16000 Hz. On OSX, the command used to unpack them was afconvert -f WAVE -d I16 <file_name>.

    The rest of the corpus (not unpacked from .caf) also generated identical first frames, but far less often and with roughly the correct number of total frames. However, often 3-4 frames were identical, which seems very unlikely.

    This Dropbox link has a number of files related to this problem. It contains:

    • yo and yo2 contain rows of <expected_frame_count> <actual_frame_count> <difference> <number_first_duplicated> for each file with duplicate first frames in the corpus subset and entire corpus, respectively.
    • yos and yos2 are sorted yo and yo2, respectively.
    • duplicate_frames.sh, containing excerpts of bash scripts I used to generate both the features and the output for the yo files.
    • yo.wav, an example offending file.

    I'll continue looking into this, but I would appreciate some help.

    Many thanks,
    Sean

     
    • Nickolay V. Shmyrev

      Hi Sean

      This is easy: the file you are trying to convert has a very large WAV header (several kilobytes), while the sphinx tools expect only a short 44-byte WAV header. If you copy the file with sox, it will remove the long zero-filled header and the results will be OK.

      Otherwise sphinx_fe treats the header zeros as signal and creates identical frames.

      We have plans to support complex wav headers but they are not top priority.
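
For readers without sox, a similar copy can be sketched with Python's standard wave module, which skips chunks it does not recognise when reading and writes a plain RIFF/fmt/data layout (a 44-byte header for PCM) when writing. This is an illustration under those assumptions, not the sphinx project's recommended tool; the function name and the file names in the usage note are hypothetical.

```python
import wave


def rewrite_with_minimal_header(src_path, dst_path):
    """Copy the PCM audio of src_path into dst_path with a plain WAV header."""
    # Reading: the wave module skips non-fmt/data chunks before the audio.
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Writing: only the RIFF, fmt and data chunks are emitted.
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)
```

A call such as rewrite_with_minimal_header("yo.wav", "yo_fixed.wav") would then produce a file whose audio starts at byte 44, which is what the sphinx tools expect.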

       
  • Sean Robertson

    Sean Robertson - 2016-01-25

    That would do it :/ Yeah, it's got a block called "FLLR" in the header. Apparently Apple recordings like to do that :/

    Thanks and sorry for the waste of time.
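
To check whether a given file carries extra header blocks such as "FLLR", the chunk layout can be listed with a short script. This is a minimal sketch that assumes a standard little-endian RIFF/WAVE layout; the function names are made up for illustration.

```python
import struct


def list_riff_chunks(raw):
    """Return [(chunk_id, payload_offset, payload_size), ...] for WAV bytes."""
    if len(raw) < 12 or raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # first chunk header follows "RIFF", <size>, "WAVE"
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack_from("<4sI", raw, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), pos + 8, size))
        pos += 8 + size + (size & 1)  # chunk payloads are padded to even length
    return chunks


def data_offset(raw):
    """Byte offset at which the audio samples start."""
    for chunk_id, offset, _size in list_riff_chunks(raw):
        if chunk_id == "data":
            return offset
    raise ValueError("no data chunk found")
```

Reading a file with open(path, "rb") and passing the bytes to data_offset gives 44 for a plain PCM header; a larger value means the bytes between offset 44 and the real audio would be read as samples by a tool that assumes the short header.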

     
