Multiple data chunks

2015-07-21
  • Rahul Shivaji Pawar

    I have obtained single-channel files from stereo data with 8-bit sample encoding. However, the number of samples in the original stereo files was odd, and hence I get the "cannot read multiple data chunks" error during MFCC feature extraction. [From my reading about the .wav format, the data chunk must always have an even number of bytes, so I think I should just add a sample or delete one of the samples, but I could not find a sox command to do that.]

    The riff_chunk_size is 1 more byte than riff_chunk_read + data_chunk_size .

    ERROR (compute-mfcc-feats:Read():wave-reader.cc:224) Expected 1468838 bytes in RIFF chunk, but after first data block there will be 36 + 1468801 bytes (we do not support reading multiple data chunks).
    WARNING (compute-mfcc-feats:Read():feat/wave-reader.h:149) Exception caught in WaveHolder object (reading).
    WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:233) TableReader: failed to load object from /home/rpawar7/kaldi-trunk/SID/Audio/train/data/short2/tduva_A.wav

    Input File : 'tduva_A.wav'
    Channels : 1
    Sample Rate : 8000
    Precision : 8-bit
    Duration : 00:03:03.60 = 1468801 samples ~ 13770 CDDA sectors
    File Size : 1.47M
    Bit Rate : 64.0k
    Sample Encoding: 8-bit Unsigned Integer PCM

    Is there any way I can sort this out using sox?

     

    Last edit: Rahul Shivaji Pawar 2015-07-21
    • Jan "yenda" Trmal

      Hmmm... And just
      sox tduva_A.wav -c 1 -r 8000 -t wav output.wav
      won't help? I think I managed to "normalize" a wav once by doing this.
      y.

      Sent from sourceforge.net because you indicated interest in <
      https://sourceforge.net/p/kaldi/discussion/1355347/>

      To unsubscribe from further messages, please visit <
      https://sourceforge.net/auth/subscriptions/>

       
      • Rahul Shivaji Pawar

        That does not help. The number of data bytes still remains odd because the number of samples is odd.

        I tried extracting the mfcc features, the problem remained the same.

         

        Last edit: Rahul Shivaji Pawar 2015-07-21
        • Jan "yenda" Trmal

          Ok, but the odd number of samples should not matter to kaldi.
          If the wav has a reasonable size, can you send that wav to me?
          jtrmal@gmail.com
          y.

           
          • Rahul Shivaji Pawar

            An example of a wav file is attached.

             

            Last edit: Rahul Shivaji Pawar 2015-07-21
            • Jan "yenda" Trmal

              Rahul, I think it's just because your wav is slightly weird. Not in
              the sense of the number of samples, but in that the header specifies 8kHz
              8-bit linear encoding. If you recode it to 16-bit samples, it works:

              wav-to-duration scp:<(echo 'B sox tjvyo_B.wav -r 8000 -b 16 -t wav -|') ark,t:-

              Not sure if you can read that command, but this is how the wav.scp should
              look:
              ID_B /usr/bin/sox /absolute/path/to/wavs/tjvyo_B.wav -r 8000 -b 16 -t wav -|

              Also, you should make sure that the header is not actually damaged -- is it
              really 8-bit linear and not 8-bit A-law/mu-law?
              Listen to the sox output to verify that the audio is not
              scrambled.
              y.
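
What the header declares (linear PCM versus A-law or mu-law) can be read from the format tag in the `fmt ` chunk. The following is a minimal Python sketch, not part of sox or Kaldi. It assumes the canonical layout in which `fmt ` is the first chunk after `WAVE` (real files may place other chunks first), and `describe_fmt` and `FORMAT_NAMES` are hypothetical names chosen for illustration. The tag values 1, 6 and 7 are the standard WAVE codes for PCM, A-law and mu-law.

```python
import struct

# Standard WAVE format tags for the encodings discussed in this thread.
FORMAT_NAMES = {1: "PCM (linear)", 6: "A-law", 7: "mu-law"}


def describe_fmt(header: bytes) -> dict:
    """Decode the fmt fields of a canonical WAV header (first 36+ bytes).

    Assumption: 'fmt ' is the first chunk after 'WAVE'.
    """
    if (header[0:4] != b"RIFF" or header[8:12] != b"WAVE"
            or header[12:16] != b"fmt "):
        raise ValueError("not a canonical RIFF/WAVE header")
    # Offsets 20..35: format tag, channels, sample rate, byte rate,
    # block align, bits per sample (all little-endian).
    tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
        "<HHIIHH", header, 20)
    return {
        "encoding": FORMAT_NAMES.get(tag, f"unknown tag {tag}"),
        "channels": channels,
        "sample_rate": rate,
        "bits": bits,
    }
```

Calling `describe_fmt(open(path, "rb").read(44))` on a file reports only what the header claims. If the header itself is wrong, listening to the decoded audio is still the real check.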

               
              • Daniel Povey

                Daniel Povey - 2015-07-21

                Yenda, I think it might actually be a bug in our reading code, due to not
                having good documentation to work from (AFAIK there is no "official"
                documentation).

                It comes from a check that the header+payload size equals the entire chunk
                size.
                From
                https://www.daubnet.com/en/file-format-riff
                I got the following, which says there can be 1-byte padding. I'll change the
                code.
                Dan

                Basic File Format

                The price for the flexibility of holding different types
                of data is a file structure, that isn't easy to understand. A RIFF is -
                more or less - a hierarchical structure. The 'directory entries' are
                defined by chunks. Every chunk contains either data or a list of chunks.
                This document will occasionally refer to the analogy of a file system,
                every chunk is either a file or a subdirectory. All chunks have the same
                structure:

                Name    Size        Description
                ID      4 byte      four ASCII character identifier, padded with ASCII 32 (space) if less than 4 characters
                Size    4 byte      size of Data
                Data    Size bytes  the 'payload'
                unused  1 byte      present, if size is odd
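
The chunk layout quoted above can be sketched as a small reader that walks the top-level chunks and skips the unused pad byte after an odd-sized payload. This is an illustrative Python sketch under that description only, not Kaldi's reader; `list_chunks` is a hypothetical name, and the input is assumed to be a seekable binary stream holding a well-formed RIFF file.

```python
import io
import struct


def list_chunks(stream):
    """Yield (chunk_id, size, padded) for each top-level chunk of a RIFF file."""
    riff_id, riff_size, _form = struct.unpack("<4sI4s", stream.read(12))
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    # The RIFF size field counts everything after it, including the
    # 4-byte form type (e.g. 'WAVE') that was just read.
    remaining = riff_size - 4
    while remaining >= 8:
        chunk_id, size = struct.unpack("<4sI", stream.read(8))
        pad = size & 1  # one unused byte follows an odd-sized payload
        stream.seek(size + pad, io.SEEK_CUR)
        remaining -= 8 + size + pad
        yield chunk_id.decode("ascii"), size, bool(pad)
```

For a file whose `data` chunk holds an odd number of bytes, the last tuple has `padded` set to `True`, which is the extra byte that the size check in this thread tripped over.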

                 
                • Jan "yenda" Trmal

                  OK, it's possible I got misled by the 8-bit linear. It's a bit of a rarity.
                  y.

                   
            • Daniel Povey

              Daniel Povey - 2015-07-21

              It looks to me like the issue was that for some reason the riff_chunk_size
              specified in the file was one byte larger than it should have been.
              According to https://www.daubnet.com/en/file-format-riff, the number of
              bytes will always be even (so I guess they padded with one). So maybe we
              need to modify the reading-in code to not complain if there is a mismatch
              of just one byte.
              I wish there was a reasonable format for audio files. The WAV
              specification feels like it was created by people who never went to
              college, and I'm not aware of any proper documentation for it.

              Dan
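
The relaxed check suggested above (accept a one-byte mismatch when it is explained by padding) might look like the sketch below. It is written in Python for illustration and is not the actual Kaldi change. The function name `riff_size_consistent` is hypothetical, while the three argument names follow the variables mentioned in the original post.

```python
def riff_size_consistent(riff_chunk_size: int,
                         riff_chunk_read: int,
                         data_chunk_size: int) -> bool:
    """Return True if the RIFF size fits a single data chunk.

    An exact match is accepted, and so is a size that is one byte larger
    when the data payload is odd (the unused pad byte).
    """
    expected = riff_chunk_read + data_chunk_size
    if riff_chunk_size == expected:
        return True
    # An odd-sized payload may be followed by one unused pad byte.
    return data_chunk_size % 2 == 1 and riff_chunk_size == expected + 1


# Numbers from the error message in this thread: 36 + 1468801 = 1468837,
# one less than the 1468838 stored in the header.
print(riff_size_consistent(1468838, 36, 1468801))  # prints True
```

Restricting the tolerance to odd payloads keeps the check strict for even-sized data, where an extra byte would still point to a second chunk or a damaged header.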

               
  • Rahul Shivaji Pawar

    I am currently using sox --ignore-length, and this increases the number of samples by 1 (in the header).

    I think this should solve the problem.

     

    Last edit: Rahul Shivaji Pawar 2015-07-21
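
For the idea raised at the start of the thread (adding one sample so that the data chunk has an even number of bytes), here is a minimal sketch using Python's standard `wave` module instead of sox. `pad_to_even_frames` is a hypothetical helper. It assumes uncompressed PCM input and appends a single silent frame, which lengthens the audio by one sample period.

```python
import io
import wave


def pad_to_even_frames(src, dst):
    """Copy a PCM WAV file, appending one silent frame if the data size is odd.

    src and dst may be file names or binary file-like objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    if len(frames) % 2 == 1:
        # 8-bit PCM is unsigned, so silence is 128; wider samples are
        # signed, so silence is 0.
        silence = b"\x80" if params.sampwidth == 1 else b"\x00"
        frames += silence * (params.sampwidth * params.nchannels)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        # The writer patches the frame count in the header on close.
        writer.writeframes(frames)
```

With 8-bit mono data at 8000 Hz, as in the file discussed here, the added frame is a single byte of value 128.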