#486 MP3 decode truncates audio from beginning of WAV file

Compatibility
closed-invalid
nobody
None
5
2021-03-10
2017-10-09
raynebc
No

I work on an application (https://github.com/raynebc/editor-on-fire/) that involves users beat syncing notes to music for creating content for rhythm games. The application natively works with the OGG format, so when the user provides MP3 format audio, the command line LAME binary is used to decode it to WAV and then OggEnc is used to encode it to OGG format. A user brought it to my attention that when feeding a particular MP3 into this chain of tools, the result is missing about 50ms of audio from the beginning of the file. I confirmed this to be the case by comparing the results of using LAME 3.99.5 and Audacity 2.1.2 to each convert the affected MP3 file to WAV.

I have more reason to believe Audacity's results were correct because the user created a beat map in a third-party utility (http://goplayalong.com/) whose results line up with Audacity's decoding and are off by about 50ms when compared with LAME's decoding.

The MP3 file in question is copyrighted so I cannot post it here, but I can provide it privately if that is allowed for debugging purposes. Otherwise please let me know if there are any workarounds I can pursue, or if there is any analysis/information about the MP3 file I can provide to confirm if this is a defect/limitation with LAME.

Discussion

  • Elio Blanca

    Elio Blanca - 2017-10-11

    LAME has been updated since 3.99.5, and version 3.100 will soon be released with many fixes, some of them related to mpglib. Give it a try.
    Having said that, MP3 decoding is not the main purpose of LAME. There are already several tools for that, such as madplay or mpg123. How do they behave with your troublesome input file?

     
  • raynebc

    raynebc - 2017-10-11

    I tried mpg123 since it had a readily available Windows build. Its results are very similar to that of Lame as far as the WAV file created. The file sizes between the two are EXACTLY the same, so presumably they decoded the same number of samples. A binary file comparison between them fails to match, but comparing the two WAVs in Audacity shows they have approximately the same amount of audio missing from the beginning of the file as opposed to the fully intact results from Audacity's decode. I was thinking of trying Sox the other day, but it seems to simply leverage LAME for MP3 encoding/decoding so that wouldn't be a good test.

    If there's a recent LAME 3.100 Windows build I can test, I'd be happy to do so. All I could immediately find for that were some 4-5 year old builds, and presumably if LAME is getting a new release it has been worked on more recently than that.

     
    • Josep Maria Antolín Segura

      Hello.

      The MP3 file format is not sample-exact. Specifically, it has an encoding
      delay, a decoding delay, and an undetermined number of extra samples
      because it is encoded in blocks of fixed size.

      In order to avoid these limitations, encoders like LAME add metadata
      information into the file, that can be used with decoders that support it.
      There are a few standards, but the most widely used are the LAME tag and
      the VBRI/Fraunhofer tag. (I believe that iTunes also has its own method.)

      Given this, I believe that either the file doesn't have the tag and is
      encoded in a way that makes the delay shorter than usual, or the metadata
      is incorrect, or even that the first frame should be decoded but carries
      metadata that tells the decoder to ignore it.

      It could even happen that the file was encoded in a special way, or that it
      was cut from a longer file with a program that added the metadata without
      adding an extra frame.

      In other words, even if the audio is there, the bug might not be in the
      decoder.
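
To make the metadata mentioned above more concrete, here is a minimal Python sketch of how the delay and padding values could be read from the first frame. It assumes the commonly documented Xing/Info header layout followed by the LAME extension, where a 3-byte field located 21 bytes after the start of the 9-byte encoder string holds two 12-bit values (delay first, then padding). The function name `parse_lame_delay_padding` is hypothetical, and the byte search is deliberately naive; treat this as an illustration, not as how LAME or mpg123 do it.

```python
import struct


def parse_lame_delay_padding(frame: bytes):
    """Return (encoder_delay, encoder_padding) from a Xing/Info frame, or None.

    Assumed layout: "Xing"/"Info" id, 4-byte flags, optional fields,
    then the LAME extension with the delay field at offset 21.
    """
    pos = frame.find(b"Xing")
    if pos < 0:
        pos = frame.find(b"Info")
    if pos < 0:
        return None

    (flags,) = struct.unpack_from(">I", frame, pos + 4)
    offset = pos + 8
    if flags & 0x1:  # frame count present
        offset += 4
    if flags & 0x2:  # byte count present
        offset += 4
    if flags & 0x4:  # 100-byte TOC present
        offset += 100
    if flags & 0x8:  # quality indicator present
        offset += 4

    # offset now points at the encoder string (e.g. "LAME3.98r")
    field = frame[offset + 21 : offset + 24]
    if len(field) < 3:
        return None

    delay = (field[0] << 4) | (field[1] >> 4)           # upper 12 bits
    padding = ((field[1] & 0x0F) << 8) | field[2]       # lower 12 bits
    return delay, padding
```

To try it, pass in the first couple of kilobytes of the file that follow any ID3v2 tag. A decoder that ignores these two values will keep the delay and padding samples in its output.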

       
  • raynebc

    raynebc - 2017-10-11

    Is there a utility I can use to generate metadata about the MP3 to show if one of those abnormalities is the reason for this problem?

     
    • Josep Maria Antolín Segura

      I don't know of an easy tool for that. The foobar2000 player is able to
      edit the delay metadata, but that feature is hidden by default.

       
    • Elio Blanca

      Elio Blanca - 2017-10-12

      Before attempting any fix, some further diagnostics may be useful.
      Try decoding the file with madplay (an old Windows build is available) and check the resulting WAV file.
      Also, analyze the file with mp3guessenc and post its log.

       
  • raynebc

    raynebc - 2017-10-12

    When I converted it to WAV with madplay, it output the following to console:

    error: frame 0: lost synchronization
    12335 frames decoded (0:05:22.2), +0.1 dB peak amplitude, 12 clipped samples

    The resulting WAV file was also missing some of the beginning of the song compared to the Audacity decode, although only about 25ms instead of the roughly 50ms that the other affected tools were missing.

    Here is the mp3guessenc readout, starting after the ID3 tag v1.1 readout:

    ID3tag v2.3.0 found (offset 0x00000000).
    ID3tag v2 is 320437 bytes long, skipping...

    Unexpected 1 bytes before VBR tag.
    Xing tag detected into the first frame (1044 bytes long).
    Tag offset : 320438 (0x0004E3B6)
    File size : 12889860 bytes
    Number of frames : 12335
    Quality : 57
    TOC : 100 bytes (100 entries, 1 byte each)
    Lame tag : yes
    Lame tag details...
    Lame short string : LAME3.98r
    Tag revision : 0
    Bitrate strategy : CBR, 255 kbps or higher
    Lowpass value : 20500
    nspsytune : yes
    nssafejoint : yes
    nogap continued : no
    nogap continuation : no
    ATH type : 4
    Encoder delay (start) : 576 samples
    Encoder padding (end) : 2088 samples
    Encoding mode : joint stereo
    Unwise settings : not used
    Source frequency : 44.1 kHz
    Preset : 320 kbps
    Originally encoded : 12889860 bytes

    First frame found at 321482 (0x0004E7CA).

    Detected MPEG stream version 1 layer III, details follow.
    File size : 13210426 bytes
    Audio stream size : 12888816 bytes (including tag: 12889860)
    Length : 0:05:22.220 (322.220 seconds)
    Data rate : 320.0 kbps
    Number of frames : 12335
    Blocks per frame : 4 (granules per frame 2, channels per granule 2)
    Audio samples per frame : 1152
    Audio frequency : 44100 Hz
    Length of original audio : 14207256 samples
    Encoding mode : joint stereo
    Min global gain : l= 48 r= 66
    Max global gain : l=210 r=210
    Flags
    Error protection : no
    Copyrighted : no
    Original : yes
    Emphasis : none

    Mode extension: stereo mode frame count
    Simple stereo : 3693 (29.9%)
    Mid-side stereo : 8642 (70.1%)


    sum : 12335

    Block usage
    Long block granules : 47092 (95.4%)
    Switch block granules : 1352 ( 2.7%)
    Short block granules : 896 ( 1.8%)


    sum : 49340

    Ancillary data
    Total amount : 639016 bytes (5.0%)
    Bitrate : 15.9 kbps
    Min packet : 1 bytes
    Max packet : 1037 bytes
    Max reservoir : 29 bytes
    Scalefactor scaling used : yes
    Scalefactor selection information used : yes
    Padding used : yes

    Frame histogram
    320 kbps : 12335 (100.0%), size distr: [ 1259 x1044 B, 11076 x1045 B]

    0 header errors.

    Encoder string : LAME3.98.2

    Maybe this file is encoded by Lame

     
    • Elio Blanca

      Elio Blanca - 2017-10-13

      I think the relevant information here is

      Length of original audio : 14207256 samples
      

      Do your decoded files match this size?
      If the decoders used do not take the tag contents into account, the resulting file may be larger, because it still contains the delay and padding samples.
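
As a rough check of that figure, the following Python sketch recomputes the expected length from the numbers in the mp3guessenc log. It assumes the 12335 frames reported do not include the tag frame; the helper name `expected_samples` is made up for illustration.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III, as reported by mp3guessenc


def expected_samples(audio_frames, encoder_delay, encoder_padding):
    """Samples left after trimming delay and padding from the decoded frames."""
    return audio_frames * SAMPLES_PER_FRAME - encoder_delay - encoder_padding


# Values taken from the log above
untrimmed = 12335 * SAMPLES_PER_FRAME          # decoder that ignores the tag
trimmed = expected_samples(12335, 576, 2088)   # decoder that honors the tag

print(untrimmed, trimmed)  # prints: 14209920 14207256
```

A decoded WAV whose length equals the second number has had delay and padding removed. A longer one still contains them.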

      Also, based on the madplay output, I suspect that some (not serious) corruption occurred in the sample area of the very first frame.

       
  • raynebc

    raynebc - 2017-10-20

    Sorry about the delay, I forgot to check on this. When I load the WAV file decoded by Audacity back into Audacity and seek to the end, it reports the position as 14,211,072 samples. When I load either the WAV decoded by LAME or by mpg123 into Audacity, it reports the end position as being the cited 14,207,256 samples. I'm not sure what makes up the 3816 sample discrepancy in its entirety, but the missing ~50ms of audio from the beginning would be in the neighborhood of 2200 samples.

     
    • Elio Blanca

      Elio Blanca - 2017-10-20

      It seems both LAME and mpg123 decode your file taking the LAME tag into account, so they both cut away the leading encoder delay and the encoder padding (at the end of the file). I would say they are the most precise decoders in this whole series.
      The 3816-sample discrepancy is easily explained: 3816 = 576 + 2088 + 1152.
      576 is the encoder delay; these samples should be discarded.
      2088 is the encoder padding; these samples should be discarded as well.
      1152 is the number of samples in one additional MPEG frame. It is likely that your first decoder (Audacity) also decoded the first MPEG frame, which holds the LAME tag. LAME stores useful information there (shown by mp3guessenc) but no audio data at all, so these samples carry no useful content. As you can see, the delay at the very beginning becomes 1152 + 576 = 1728 samples, which is about 39 ms.

      Edit: I think the WAV file created by madplay has a 576 + 2088 sample discrepancy (as it doesn't discard any samples), doesn't it?
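
The arithmetic in this comment can be verified with a short Python sketch. The sample counts are the ones reported earlier in the thread; the constant and function names are only illustrative.

```python
SAMPLE_RATE = 44100        # Hz, from the mp3guessenc log
ENCODER_DELAY = 576        # samples, from the LAME tag
ENCODER_PADDING = 2088     # samples, from the LAME tag
SAMPLES_PER_FRAME = 1152   # one MPEG-1 Layer III frame


def samples_to_ms(samples, rate=SAMPLE_RATE):
    """Convert a sample count to milliseconds at the given sample rate."""
    return 1000.0 * samples / rate


# Audacity's end position minus the LAME/mpg123 end position
discrepancy = 14211072 - 14207256

# Extra samples at the start when the tag frame is decoded and nothing is trimmed
leading_offset = SAMPLES_PER_FRAME + ENCODER_DELAY

assert discrepancy == ENCODER_DELAY + ENCODER_PADDING + SAMPLES_PER_FRAME
print(discrepancy, leading_offset, round(samples_to_ms(leading_offset), 1))
# prints: 3816 1728 39.2
```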

       

      Last edit: Elio Blanca 2017-10-20
  • raynebc

    raynebc - 2017-10-20

    The madplay decode has 14209920 samples, so yes it has a discrepancy of 2664 (576+2088) samples.

    Thank you for your information, I'll let the users know about this.

     
  • Alexander Leidinger

    • status: open --> closed-invalid
     
