#323 ID3v1 set as ISO_8859_1 but encoded as UTF-8

Bug
open
nobody
None
5
2005-05-11
2005-05-11
No

I configured grip as suggested by the FAQ: id3v1 tags
encoded as ISO_8859_1 and id3v2 encoded as UTF-8.
But it looks like the id3v1 tags are being written as UTF-8.

Here is the id3v1 at the beginning of the file for
the track "Botão de Rosa":
BotÃ£o de Rosa
and the id3v2 (at the end of the file):
BotÃ£o de Rosa
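
For anyone trying to reproduce the symptom, here is a minimal Python sketch of what seems to be happening: the title is written as UTF-8 bytes, and a reader that assumes ISO-8859-1 shows two characters in place of the "ã". The variable names are made up for illustration and this is not grip's code.

```python
# The track title as a Unicode string.
title = "Botão de Rosa"

# Bytes that end up in the file if the tag text is written as UTF-8.
utf8_bytes = title.encode("utf-8")

# What a reader shows if it assumes those bytes are ISO-8859-1.
misread = utf8_bytes.decode("iso-8859-1")

# Bytes a genuine ISO-8859-1 tag would contain instead.
latin1_bytes = title.encode("iso-8859-1")

print(utf8_bytes)    # b'Bot\xc3\xa3o de Rosa'
print(misread)       # BotÃ£o de Rosa
print(latin1_bytes)  # b'Bot\xe3o de Rosa'
```

The two-byte sequence C3 A3 in the first line is the giveaway that the tag holds UTF-8 rather than ISO-8859-1.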

Oops, I've just tried to encode it with "ISO-8859-1" and
everything worked fine: the track appeared as "Botão de
Rosa".

So this bug is for two things:

1) Correct the documentation! Change the "ID3v1
Character set encoding — The character encoding to be
used for ID3v1 tags. ISO_8859_1 is recommended." to
"ID3v1 Character set encoding — The character encoding
to be used for ID3v1 tags. ISO-8859-1 is recommended."

2) Print a warning if the id3 character encoding isn't
supported (an unknown encoding, such as ISO_8859_1).
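
As a rough illustration of point 2, the check could look something like the Python sketch below. It is only a sketch of the idea: grip is not written in Python, the function name `encoding_warning` is hypothetical, and which spellings count as "known" depends on the conversion library actually in use, so the invalid name in the example is deliberately made up.

```python
import codecs
from typing import Optional


def encoding_warning(name: str) -> Optional[str]:
    """Return a warning message if `name` is not a known codec, else None."""
    try:
        codecs.lookup(name)  # raises LookupError for unknown encoding names
    except LookupError:
        return (f"warning: unknown character encoding {name!r}; "
                "tags may be written incorrectly")
    return None


for candidate in ("ISO-8859-1", "NOT-A-CHARSET"):
    message = encoding_warning(candidate)
    if message is not None:
        print(message)
```

The point is simply to fail loudly at configuration time instead of silently writing the tag in some fallback encoding.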

Discussion

  • Logged In: YES
    user_id=129944

    I've been investigating more, and it looks like grip is
    using the encoding configured for id3v1 in id3v2 and vice versa.

    I believe the id3v2 is at the end of the file, and the
    id3v1 at the beginning, right? I'm opening the generated mp3
    and the encodings are inverted.

    In Amarok, my encodings only work when I set "decode id3v1
    tags as UTF-8". That now makes sense.

    It looks like I just discovered that I have hundreds of
    files incorrectly encoded:-(

     
  • Logged In: YES
    user_id=129944

    Ignore my first comment. Id3v2 tags are above the audio
    data, and id3v1 are below. The submitted bug is still valid.

     
  • Logged In: YES
    user_id=129944

    I discovered even more info.

    It looks like the id3lib used by grip currently only supports
    id3v2.3 and can't write UTF-8 id3 tags.

    If you configure the recommended value of UTF-8 for id3v2,
    you'll have BROKEN id3 tags. The text of your tag will be
    encoded as UTF-8, but the encoding byte in the id3 frame
    isn't set correctly. Readers can't discover your
    encoding, so they assume the default value of ISO-8859-1
    (aka latin1).
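
To make the "encoding byte" point concrete, here is a small Python sketch of an id3v2.3 text frame, assuming the layout from the id3v2.3 document: a 10-byte frame header (4-character ID, 4-byte big-endian size, 2 flag bytes), followed by one encoding byte (0x00 for ISO-8859-1, 0x01 for 16-bit Unicode with a byte order mark) and the text. The helper names are hypothetical and this is not how id3lib is implemented.

```python
import struct

ENC_LATIN1 = 0x00  # id3v2.3 encoding byte: ISO-8859-1
ENC_UTF16 = 0x01   # id3v2.3 encoding byte: 16-bit Unicode with BOM


def build_text_frame(frame_id: str, encoding_byte: int, payload: bytes) -> bytes:
    """Build a text frame: 10-byte header, encoding byte, then the text bytes."""
    body = bytes([encoding_byte]) + payload
    header = struct.pack(">4sIH", frame_id.encode("ascii"), len(body), 0)
    return header + body


def read_text_frame(frame: bytes) -> str:
    """Decode the text the way a reader that trusts the encoding byte would."""
    _frame_id, size, _flags = struct.unpack(">4sIH", frame[:10])
    body = frame[10:10 + size]
    codec = "iso-8859-1" if body[0] == ENC_LATIN1 else "utf-16"
    return body[1:].decode(codec)


title = "Botão de Rosa"

# The broken case: UTF-8 bytes stored under the ISO-8859-1 encoding byte.
broken = build_text_frame("TIT2", ENC_LATIN1, title.encode("utf-8"))
# Two frames whose encoding byte matches the bytes that follow it.
valid_latin1 = build_text_frame("TIT2", ENC_LATIN1, title.encode("iso-8859-1"))
valid_utf16 = build_text_frame("TIT2", ENC_UTF16, title.encode("utf-16"))

print(read_text_frame(broken))        # BotÃ£o de Rosa
print(read_text_frame(valid_latin1))  # Botão de Rosa
print(read_text_frame(valid_utf16))   # Botão de Rosa
```

There is no encoding byte value for UTF-8 in this layout, so a reader has no way to tell that the first frame holds UTF-8.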

    MY RECOMMENDATION:
    set both of your encoding settings to ISO-8859-1 (don't
    use underscores; on my computer that makes grip save the
    strings as UTF-8 while registering an incorrect encoding).

    Sure, if you don't use a Western European language, you are fckd.
    It looks like id3lib supports UTF-16, but I'm not really
    sure. Maybe someone here could give it a try. Latin-1 is
    enough for me.

    Real soon I'll post a script here to fix your already tagged
    files.

     
  • Normand Robert
    2006-07-14

    Logged In: YES
    user_id=1555345

    Please note that UTF-8 does not appear to be part of the
    id3v2.3 specification. Here is my understanding of the
    problem. It would be nice if someone could confirm the logic.

    Looking at the standard:
    http://www.id3.org/id3v2.3.0.txt
    It says "All Unicode strings [UNICODE] use 16-bit unicode 2.0"

    So it is incorrect to write UTF-8 text into such a tag,
    because there is no mechanism in the tag for saying
    it is UTF-8, only that it is ISO-8859-1 or UNICODE.

    I have never been able to create an mp3 with UTF-16 tags in
    grip. By default I end up with UTF-8, which btw is correct
    for ogg/vorbis. Maybe I am really dumb? I have looked at
    the source, and I believe it would have to convert the UTF-8
    string used to represent text internally into UTF-16, and I
    see no such code anywhere.

    As a result, rhythmbox and easytag will show strange
    characters because they interpret tags created by grip
    as UTF-16, which is correct, when in fact grip writes them out
    as UTF-8.

    The latest version of easytag (1.99.xx) allows one to
    convert multiple mp3s with UTF-8 tags to UTF-16. Make sure
    you back up your files before trying this. I have had no
    problems but you never know.

    Start easytag
    Go to Preference -> ID3 Tag Settings.
    In the box that says "For ISO-8859-1 fields override with
    the following character encoding (for expert use only)"
    Set "Use non-standard character set for reading ID3 tags" to
    "Unicode (UTF-8)". That language is another hint that UTF-8
    should not be used!

    Now rescan the directory(ies) where the files to be
    corrected are. Now your latin characters (for example) will
    finally look correct because they are interpreted using the
    non-standard UTF-8 char set.

    You need to select all of the files you want to fix and do a
    "forced save" which will rewrite the tags as UTF-16.
    See "Force Saving File(s)" in the file menu.

    Be careful to undo these settings in the preferences afterwards,
    because otherwise your UTF-16 tags will be interpreted as UTF-8
    the next time, which will give strange results.
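
For those who would rather script it, the string-level part of the repair described above can be sketched in Python as follows. This only shows the text conversion, under the assumption that the tag text was UTF-8 that got read as ISO-8859-1. The function name `repair_mojibake` is hypothetical, and reading or writing the tags in the mp3 files themselves is not shown. Back up your files before trying anything like this.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; leave other strings untouched."""
    try:
        # Recover the original bytes, then decode them as what they really are.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as Latin-1 or not valid UTF-8: probably already fine.
        return text


damaged = "BotÃ£o de Rosa"   # how a reader shows the broken tag
repaired = repair_mojibake(damaged)

# Bytes suitable for a frame marked as 16-bit Unicode (includes the BOM).
utf16_payload = repaired.encode("utf-16")

print(repaired)                            # Botão de Rosa
print(repair_mojibake("Botão de Rosa"))    # Botão de Rosa (already correct)
print(repair_mojibake("Plain ASCII"))      # Plain ASCII
```

The try/except is there so that running the repair twice, or over files that were tagged correctly, does not damage good tags.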