I work on an application (https://github.com/raynebc/editor-on-fire/) that involves users beat syncing notes to music for creating content for rhythm games. The application natively works with the OGG format, so when the user provides MP3 format audio, the command line LAME binary is used to decode it to WAV and then OggEnc is used to encode it to OGG format. A user brought it to my attention that when feeding a particular MP3 into this chain of tools, the result is missing about 50ms of audio from the beginning of the file. I confirmed this to be the case by comparing the results of using LAME 3.99.5 and Audacity 2.1.2 to each convert the affected MP3 file to WAV.
I have more reason to believe Audacity's results are correct because the user created a beat map in a third-party utility (http://goplayalong.com/) whose results line up with Audacity's decoding and are off by about 50ms when compared with LAME's decoding.
The MP3 file in question is copyrighted so I cannot post it here, but I can provide it privately if that is allowed for debugging purposes. Otherwise please let me know if there are any workarounds I can pursue, or if there is any analysis/information about the MP3 file I can provide to confirm if this is a defect/limitation with LAME.
LAME has been updated since 3.99.5, and version 3.100 will be released soon with many fixes, some of them related to mpglib. Give it a try.
Having said that, MP3 decoding is not the main purpose of LAME. There are already several tools for that, such as madplay or mpg123. How do they behave with your troublesome input file?
I tried mpg123 since it had a readily available Windows build. Its results are very similar to that of Lame as far as the WAV file created. The file sizes between the two are EXACTLY the same, so presumably they decoded the same number of samples. A binary file comparison between them fails to match, but comparing the two WAVs in Audacity shows they have approximately the same amount of audio missing from the beginning of the file as opposed to the fully intact results from Audacity's decode. I was thinking of trying Sox the other day, but it seems to simply leverage LAME for MP3 encoding/decoding so that wouldn't be a good test.
If there's a recent LAME 3.100 Windows build I can test, I'd be happy to do so. All I could immediately find for that were some 4-5 year old builds, and presumably if LAME is getting a new release it has been worked on more recently than that.
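Rather than inferring the sample count from the file size, the length of each decoded WAV can be read directly from its header. The following is a minimal sketch using Python's standard `wave` module; the function names are made up for illustration, and it assumes plain PCM WAV files, which is what these command line decoders normally write.

```python
import wave


def wav_frame_count(path):
    """Return (frames_per_channel, sample_rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes(), wav.getframerate()


def compare_lengths(path_a, path_b):
    """Return the length difference (a minus b) in frames and in milliseconds."""
    frames_a, rate_a = wav_frame_count(path_a)
    frames_b, rate_b = wav_frame_count(path_b)
    if rate_a != rate_b:
        raise ValueError("sample rates differ, lengths are not comparable")
    diff = frames_a - frames_b
    return diff, 1000.0 * diff / rate_a
```

Calling `compare_lengths("audacity.wav", "lame.wav")` (hypothetical file names) would give the difference in total length only. It does not show whether the extra samples sit at the beginning or at the end of the file.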
Hello.
The MP3 file format is not sample-exact. Concretely, it has an encoding
delay, a decoding delay, and an undetermined number of extra samples due to
the fact that it is encoded in blocks of fixed size.
To work around these limitations, encoders like LAME add metadata
to the file that decoders supporting it can use.
There are a few standards, but the most used are the LAME tag and the
VBRI (Fraunhofer) tag. (I believe that iTunes also has its own method.)
Given this, I believe that either the file doesn't have the tag and is
encoded in a way that makes the delay shorter than usual, or the metadata
is incorrect, or the first frame should be decoded but has
metadata that tells the decoder to ignore it.
It could even be that the file was encoded in a special way, or that it
was cut from a longer file with a program that added the metadata without
adding an extra frame.
In other words, even if the audio is there, the bug might not be in the
decoder.
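As an illustration of the kind of metadata meant here: to my understanding, the LAME tag stores the encoder delay and the end padding as two 12-bit values packed into three bytes. The sketch below only shows that unpacking step. The function name is hypothetical, locating the three bytes inside the tag frame is left out, and the layout should be checked against the LAME tag specification before relying on it. The example values 576 and 2088 are the ones that mp3guessenc reports later in this thread.

```python
def unpack_delay_padding(field):
    """Split a 3-byte field into two 12-bit values (delay, padding).

    Assumed layout: the upper 12 bits hold the encoder delay and the
    lower 12 bits hold the end padding, both counted in samples.
    """
    if len(field) != 3:
        raise ValueError("expected exactly 3 bytes")
    delay = (field[0] << 4) | (field[1] >> 4)
    padding = ((field[1] & 0x0F) << 8) | field[2]
    return delay, padding


# 576 = 0x240 and 2088 = 0x828, so the packed bytes are 0x24 0x08 0x28.
print(unpack_delay_padding(bytes([0x24, 0x08, 0x28])))  # (576, 2088)
```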
Is there a utility I can use to generate metadata about the MP3 to show if one of those abnormalities is the reason for this problem?
I don't know of an easy tool for that. The foobar2000 player is able to
edit the delay metadata, but that feature is hidden by default.
On 11/10/2017 at 22:53, "raynebc" raynebc@users.sf.net wrote:
Before trying any fix, some further diagnostics may be useful.
Try decoding the file with madplay (an old Windows build is available) and check the resulting WAV file.
Also, analyze the file with mp3guessenc and post its log.
When I converted it to WAV with madplay, it printed the following to the console:
error: frame 0: lost synchronization
12335 frames decoded (0:05:22.2), +0.1 dB peak amplitude, 12 clipped samples
The resulting WAV file was also missing some of the beginning of the song compared to the Audacity decode, although only about 25ms instead of the roughly 50ms that the other affected tools were missing.
Here is the mp3guessenc readout, starting after the ID3 tag v1.1 readout:
ID3tag v2.3.0 found (offset 0x00000000).
ID3tag v2 is 320437 bytes long, skipping...
Unexpected 1 bytes before VBR tag.
Xing tag detected into the first frame (1044 bytes long).
Tag offset : 320438 (0x0004E3B6)
File size : 12889860 bytes
Number of frames : 12335
Quality : 57
TOC : 100 bytes (100 entries, 1 byte each)
Lame tag : yes
Lame tag details...
Lame short string : LAME3.98r
Tag revision : 0
Bitrate strategy : CBR, 255 kbps or higher
Lowpass value : 20500
nspsytune : yes
nssafejoint : yes
nogap continued : no
nogap continuation : no
ATH type : 4
Encoder delay (start) : 576 samples
Encoder padding (end) : 2088 samples
Encoding mode : joint stereo
Unwise settings : not used
Source frequency : 44.1 kHz
Preset : 320 kbps
Originally encoded : 12889860 bytes
First frame found at 321482 (0x0004E7CA).
Detected MPEG stream version 1 layer III, details follow.
File size : 13210426 bytes
Audio stream size : 12888816 bytes (including tag: 12889860)
Length : 0:05:22.220 (322.220 seconds)
Data rate : 320.0 kbps
Number of frames : 12335
Blocks per frame : 4 (granules per frame 2, channels per granule 2)
Audio samples per frame : 1152
Audio frequency : 44100 Hz
Length of original audio : 14207256 samples
Encoding mode : joint stereo
Min global gain : l= 48 r= 66
Max global gain : l=210 r=210
Flags
Error protection : no
Copyrighted : no
Original : yes
Emphasis : none
Mode extension: stereo mode frame count
Simple stereo : 3693 (29.9%)
Mid-side stereo : 8642 (70.1%)
sum : 12335
Block usage
Long block granules : 47092 (95.4%)
Switch block granules : 1352 ( 2.7%)
Short block granules : 896 ( 1.8%)
sum : 49340
Ancillary data
Total amount : 639016 bytes (5.0%)
Bitrate : 15.9 kbps
Min packet : 1 bytes
Max packet : 1037 bytes
Max reservoir : 29 bytes
Scalefactor scaling used : yes
Scalefactor selection information used : yes
Padding used : yes
Frame histogram
320 kbps : 12335 (100.0%), size distr: [ 1259 x1044 B, 11076 x1045 B]
0 header errors.
Encoder string : LAME3.98.2
Maybe this file is encoded by Lame
I think the relevant information here is the "Length of original audio : 14207256 samples" line.
Do your decoded files match this size?
If the decoders used do not take the tag contents into account, the resulting file may be larger (it will still contain the delay and padding samples).
Also, based on the madplay output, I suspect that some (not serious) corruption occurred in the sample data of the very first frame.
Sorry about the delay, I forgot to check on this. When I load Audacity's decoded WAV file into it and seek to the end, it reports the position as 14,211,072 samples. When I load either the WAV decoded by LAME or by mpg123 into Audacity, it reports the end position as being the cited 14,207,256 samples. I'm not sure what makes up the 3816 sample discrepancy in its entirety, but the missing ~50ms of audio from the beginning would be in the neighborhood of 2200 samples.
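The "neighborhood of 2200 samples" estimate follows directly from the 44.1 kHz sample rate reported by mp3guessenc. A small sketch of the conversion in both directions (the helper names are made up for illustration):

```python
SAMPLE_RATE = 44100  # Hz, as reported for this file


def ms_to_samples(ms, rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate / 1000)


def samples_to_ms(samples, rate=SAMPLE_RATE):
    """Convert a sample count to a duration in milliseconds."""
    return 1000.0 * samples / rate


print(ms_to_samples(50))    # 2205
print(samples_to_ms(3816))  # roughly 86.5 ms
```

So the full 3816 sample discrepancy corresponds to roughly 86.5 ms, more than the ~50ms that appears to be missing from the start.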
It seems both LAME and mpg123 decode your file taking the LAME tag into account, so they both cut away the leading encoder delay and the trailing encoder padding (at the end of the file). I would say they are the most precise decoders in this whole series.
The 3816 sample discrepancy is easily explained, as 3816=576+2088+1152.
576 is the encoder delay; these samples should be discarded.
2088 is the encoder padding; these samples should be discarded as well.
1152 is the number of samples in one additional MPEG frame. It is likely that the first decoder you used (Audacity) also decoded the first MPEG frame, which holds the LAME tag. There LAME stores useful information (shown by mp3guessenc) but no audio data at all, so these samples carry no useful content. As you can see, the delay at the very beginning then becomes 1152+576=1728 samples, which is about 39 ms.
Edit: I think the WAV created by madplay has a 576+2088 sample discrepancy (as it doesn't discard any samples), doesn't it?
Last edit: Elio Blanca 2017-10-20
The madplay decode has 14209920 samples, so yes it has a discrepancy of 2664 (576+2088) samples.
Thank you for your information, I'll let the users know about this.
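For anyone who wants to re-check the accounting in this thread, the three decoded lengths can be reproduced from the mp3guessenc figures alone. This is only a sketch of the arithmetic. The variable names are mine, and which decoder matches which total is taken from the posts above rather than from any decoder's documentation.

```python
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III
AUDIO_FRAMES = 12335       # frame count reported by mp3guessenc and madplay
ENCODER_DELAY = 576        # from the LAME tag
ENCODER_PADDING = 2088     # from the LAME tag
SAMPLE_RATE = 44100        # Hz

# Every audio frame decoded, nothing trimmed (matches the madplay output).
untrimmed = AUDIO_FRAMES * SAMPLES_PER_FRAME

# Delay and padding removed as the LAME tag asks (matches LAME and mpg123).
trimmed = untrimmed - ENCODER_DELAY - ENCODER_PADDING

# Tag frame decoded as if it were audio as well (matches Audacity).
with_tag_frame = untrimmed + SAMPLES_PER_FRAME

# Extra samples at the start of the Audacity decode relative to LAME.
leading_extra = SAMPLES_PER_FRAME + ENCODER_DELAY

print(untrimmed)       # 14209920
print(trimmed)         # 14207256
print(with_tag_frame)  # 14211072
print(leading_extra, round(1000 * leading_extra / SAMPLE_RATE, 1))  # 1728 39.2
```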