ID3 tags encoding

Help
Anonymous
2012-07-24
2012-12-11
  • Anonymous - 2012-07-24

    Hello,

    I recently noticed that recent versions of the DLL have a problem returning non-English tags such as Title and Album, although previous versions returned correct output. After investigating the code, I found that the problem was introduced in 0.7.42 with this change:
    x #3173391, Id3v2: was using local encoding instead of ISO-8859-1
    In the code, the Get_Local calls were replaced with Get_ISO_8859_1, which produces the wrong output. Why was this changed?

    I know it's not possible to determine for sure which ANSI charset was used to encode a tag. But why assume it is a Western encoding? Using the local code page (or a user-defined code page) looks like a reasonable compromise for ANSI text in most cases.

    Is there any way now to force the library to decode ANSI strings with the local code page instead of ISO_8859_1? Or maybe a way to check whether the output was converted from ANSI or was actual Unicode? In that case I could at least deal with the output myself.

     
  • Jerome Martinez - 2012-07-25

    I recently noticed that recent versions of the DLL have a problem returning non-English tags

    Actually, previous versions had the problem.

    I know it's not possible to determine for sure which ANSI charset was used to encode a tag. But why assume it is a Western encoding?

    I assume nothing; I conform to the specification (previously, I did not).
    See http://id3.org/id3v2.4.0-structure
    "Frames that allow different types of text encoding contains a text
       encoding description byte. Possible encodings:

         $00   ISO-8859-1. Terminated with $00.
         $01   UTF-16 encoded Unicode with BOM. All
               strings in the same frame SHALL have the same byteorder.
               Terminated with $00 00.
         $02   UTF-16BE encoded Unicode without BOM.
               Terminated with $00 00.
         $03   UTF-8 encoded Unicode. Terminated with $00."

    You can see that a local encoding (depending on your OS) is not accepted, and that ISO-8859-1 is explicitly specified.
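
    As a rough illustration of the table quoted above, a reader could dispatch on the text encoding description byte as sketched below in Python. The names ID3V2_CODECS and decode_text_frame are made up for this example and are not part of MediaInfo; the sketch also assumes the payload holds a single string, which is not true of every frame type.

```python
from typing import Tuple

# Codec for each ID3v2.4 text encoding description byte (per the quoted table).
ID3V2_CODECS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # byte order is taken from the BOM
    0x02: "utf-16-be",  # big-endian, no BOM
    0x03: "utf-8",
}


def decode_text_frame(payload: bytes) -> Tuple[str, str]:
    """Return (codec, text) for a payload of 1 encoding byte + string.

    Hypothetical helper for illustration only.
    """
    if not payload:
        raise ValueError("empty frame payload")
    codec = ID3V2_CODECS.get(payload[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte ${payload[0]:02X}")
    text = payload[1:].decode(codec)
    # Drop the terminator (and anything after it), if present.
    return codec, text.split("\x00", 1)[0]
```

    With this mapping, a $00 payload such as b"\x00Caf\xe9\x00" is always decoded as ISO-8859-1, whatever the code page of the OS is.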

    Using the local code page (or a user-defined code page) looks like a reasonable compromise for ANSI text in most cases.

    But that does not conform to the specification.
    I am aware that a lot of software writes the local code page instead of ISO-8859-1, but that is a bug in that software.

    Is there any way now to force the library to decode ANSI strings with the local code page instead of ISO_8859_1?

    Currently: no.
    You can add a feature request https://sourceforge.net/tracker/?group_id=86862&atid=581184

    Or maybe a way to check whether the output was converted from ANSI or was actual Unicode? In that case I could at least deal with the output myself.

    It is not possible to automatically detect which code page was used (ANSI / ISO 8859 / your local code page), because the whole code space is used in every case.
    Unicode strings have a different text encoding description byte, so they are no problem.
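
    A small Python sketch of why detection fails: the same bytes decode without error under several single-byte code pages, and nothing in the bytes says which reading is the intended one. The sample word and the choice of Windows-1251 and KOI8-R are only illustrative assumptions.

```python
# Bytes a Windows-1251 tagger would write for the Russian word "Привет".
raw = "Привет".encode("cp1251")

# Every byte value maps to some ISO-8859-1 character, so decoding never
# fails; it just yields a different string.
as_latin1 = raw.decode("iso-8859-1")
as_cp1251 = raw.decode("cp1251")
as_koi8r = raw.decode("koi8-r")  # a third reading that also decodes cleanly

print(repr(as_latin1))
print(repr(as_cp1251))
print(repr(as_koi8r))

# The full 0x00-0xFF range round-trips through ISO-8859-1 unchanged,
# so a decode error can never be used as a detection signal.
all_bytes = bytes(range(256))
assert all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes
```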

     
  • Anonymous - 2012-07-25

    Thanks for the answer.

    But that does not conform to the specification.
    I am aware that a lot of software writes the local code page instead of ISO-8859-1, but that is a bug in that software.

    Got the point.

    It is not possible to automatically detect which code page was used (ANSI / ISO 8859 / your local code page), because the whole code space is used in every case.

    Exactly. But when reading a tag, your parser can distinguish at least three cases: UTF-16 (LE/BE), UTF-8, and any other (ANSI) encoding, right? It would be enough for me to know which kind of source it was. Unfortunately, the library output is always Unicode (or ANSI that has already been re-encoded, for the ANSI functions), and I, as an end user/developer, can no longer determine that.

    By the way, a simple idea: for some encoding-critical fields (Title, Performer, Album), would it be possible to have additional options like Title/Raw, Performer/Raw, etc., which return the raw unconverted byte string? It looks like only 5-6 ID3 tags would need that.

     
  • Jerome Martinez - 2012-07-25

    UTF-16(LE/BE), UTF-8 and any other (ANSI) encoding, right?

    Do you really have text encoding description byte $00 with UTF-16 or UTF-8??? The issue is usually a mix-up between ISO 8859 and the local code page, and in that case it is not possible to differentiate.
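
    For what it's worth, a $00 frame that actually holds UTF-8 is the one mislabelled case that can usually be spotted, because UTF-8 multi-byte sequences have a strict structure that text in a single-byte code page rarely satisfies by accident. A hedged Python sketch of such a heuristic follows; looks_like_utf8 is a hypothetical helper, not a MediaInfo function, and it says nothing about ISO 8859 versus a local code page.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: True if raw has non-ASCII bytes and is well-formed UTF-8."""
    if raw.isascii():
        # Pure ASCII reads the same in every candidate encoding.
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

    A True result is only a strong hint: a short ISO-8859-1 string can, in rare cases, also happen to be well-formed UTF-8.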

    By the way, a simple idea: for some encoding-critical fields (Title, Performer, Album), would it be possible to have additional options like Title/Raw, Performer/Raw, etc., which return the raw unconverted byte string? It looks like only 5-6 ID3 tags would need that.

    This is a very specific request, and only for buggy files. The goal of MediaInfo is precisely to remove such complexity for the library user, who should not have to care about the code page used. It would only be done under paid support. Lots of things are possible, up to very specific implementations and options, but not for free.

     
  • Anonymous - 2012-07-26

    This is a very specific request, and only for buggy files. The goal of MediaInfo is precisely to remove such complexity for the library user, who should not have to care about the code page used. It would only be done under paid support. Lots of things are possible, up to very specific implementations and options, but not for free.

    I'm not writing commercial software, so I'm unlikely to go this way…

    In the end, it's not difficult to write my own ID3 parser, but it looks like overkill for just a few tags, especially because I get all the other info successfully with your library. It would also mean processing each file twice. So I tried to find a way to get all the info correctly through the library. OK, I will try to find a workaround.
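
    One possible workaround on the application side, sketched in Python: ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same number, so a string decoded that way can be turned back into the original bytes and decoded again with a chosen code page. This assumes the library did a plain ISO-8859-1 decode of the $00 frame, as the 0.7.42 change suggests; reinterpret_latin1 and the cp1251 default are hypothetical names and choices for illustration.

```python
def reinterpret_latin1(text: str, codepage: str = "cp1251") -> str:
    """Re-read a string that was decoded as ISO-8859-1 using another code page.

    Hypothetical helper; assumes `text` came from a plain ISO-8859-1 decode.
    """
    try:
        # Lossless for code points U+0000-U+00FF: recovers the original bytes.
        raw = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Characters above U+00FF cannot come from an ISO-8859-1 decode,
        # so the source was real Unicode: leave it alone.
        return text
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        # Bytes undefined in the chosen code page: keep the library's output.
        return text
```

    This cannot tell genuine ISO-8859-1 text (for example "Café") from mislabelled local text, so the caller still has to decide when to apply it, for instance through a user setting.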

    Thanks anyway for the clarification.

     
  • Jerome Martinez - 2012-07-26

    It is open source software: you can modify the source code for former behavior or add the feature ;-).

     
  • Anonymous - 2012-07-26

    It is open source software: you can modify the source code for former behavior or add the feature ;-).

    Of course. Sure, I can compile my own modified build (though my C++ is not perfect). But the main reason not to do that is that it's difficult to maintain my own library builds. I don't want users to depend on them.

     
  • sam bul - 2012-08-18

    Why does Windows 7 Explorer show ID3 tag info (File Properties, in the Audio Properties and ID3 tabs) in the correct language regardless of the locale set and the encoding used in the ID3 tags? How does Windows determine the encoding used, whether or not it is mentioned in the above standard? Whom would the user blame if Windows Explorer didn't show the tags correctly? Isn't that called "compatibility"?

    The impression is that MediaInfo suddenly stopped showing ID3 tags in the correct encoding in order to generate more revenue. It's not a minor issue; many people have large music libraries collected over the years. When updating the package, the author should ensure compatibility with previous standards and realities. Otherwise, why create a program that adheres to artificial restrictions that most other programs don't comply with, or did not comply with? Whom is MediaInfo for: normal users, or a sterile environment no one has?

    That feature was purposely removed without regard for users with large music collections, so maybe it's time to move on…

     
