
No keyboard shortcuts, encoding issue

Martin
2007-12-27
2013-03-22
  • Martin

    Martin - 2007-12-27

    java -version gives:
    java version "1.5.0_13"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05)
    Java HotSpot(TM) Client VM (build 1.5.0_13-b05, mixed mode, sharing)

    We are using it in our Linux environment.

    I've installed FixTag version 0.80. It's a great tool. However, we discovered
    some minor problems:

    - The program is missing keyboard shortcuts, e.g.
        Alt-F for the File menu,
        Alt-B for the Batch menu,
        ...

    - Changing tags in the "current file" dialog using diacritic characters such as
      umlauts (äöü...) works fine on ID3v1 tags. On ID3v2 tags the results are strange:

    a) FixTag saves and reads them without any problem.

    b) ID3v2 tags containing diacritic characters written by FixTag cannot be read by id3v2 or mediatomb: "Umlaute äöüÄÖÜß" is displayed as "Umlaute dv|DV\_".

    The other direction works fine: tags written with id3v2 can be read by both mediatomb and FixTag.

    Both id3v2 and mediatomb 0.10.0 use the shared library id3lib 3.8.3.

    Regards

    Martin
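The garbled string in the report above is consistent with a reader that clears the top bit of every byte: in ISO-8859-1, ä is 0xE4, and 0xE4 & 0x7F = 0x64, which is "d"; the other six characters map the same way, giving exactly "dv|DV\_". The Python sketch below only checks that this arithmetic matches the observed output; it is not a confirmed diagnosis of where in id3lib this happens, and `strip_high_bit` is a hypothetical helper.

```python
def strip_high_bit(text: str) -> str:
    """Hypothetical helper: clear bit 7 of each ISO-8859-1 byte, as a 7-bit reader would."""
    raw = text.encode("latin-1")                    # one byte per character, e.g. "ä" -> 0xE4
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("Umlaute äöüÄÖÜß"))            # Umlaute dv|DV\_
```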

     
    • Doug Laurence

      Doug Laurence - 2007-12-28

      Sorry about the lack of keyboard shortcuts. I admit I am rather lax about making sure the program is fully operable via the keyboard. I'll see if I can give the code a once-over and make sure that it can be operated without the mouse.

      As far as the encoding is concerned, it looks like a byte order problem, since you put 7 characters in and get 7 characters out. I've tried to follow the letter of the specification when it comes to writing out 16-bit Unicode fields (the only encoding besides ISO-8859-1 that the spec supports). Unfortunately, I have forced UTF-16 LE (little endian) for some fields because of bugs in Windows Media Player, which is VERY picky about handling Unicode. (I test the Windows apps, even though I develop and use FixTag in a Linux environment.)

      I should be able to read any tag that is written according to the spec, but I have found some programs that fail to write the required byte order mark (BOM) before Unicode text fields and simply assume the Unicode is little endian or big endian. I've made sure I am interoperable with Windows Media Player and iTunes, but I haven't tested with mediatomb. I'll check it out and see if I can track down the problem - I'm sure it's my fault. Can you tell me which fields in particular behave strangely (e.g. artist, title, etc.), or is it all of them?

      Doug
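A short Python sketch of why the BOM matters, assuming the ID3v2.3 rule quoted later in this thread (Unicode strings start with $FF FE or $FE FF). `decode_ucs2_with_bom` is a hypothetical helper, not FixTag's code. Decoding with the wrong byte order can still return the same number of characters, which is why a byte-order bug looked plausible for a "7 in, 7 out" symptom.

```python
def decode_ucs2_with_bom(data: bytes) -> str:
    """Hypothetical helper: decode an ID3v2.3 Unicode string ($01) using its BOM."""
    bom, units = data[:2], data[2:]
    if bom == b"\xff\xfe":
        return units.decode("utf-16-le")            # little endian
    if bom == b"\xfe\xff":
        return units.decode("utf-16-be")            # big endian
    raise ValueError("no BOM: byte order is ambiguous")


sample = "Umlaute äöü"
tagged = b"\xff\xfe" + sample.encode("utf-16-le")  # BOM + little-endian code units
print(decode_ucs2_with_bom(tagged))                 # Umlaute äöü

# Without a BOM, guessing the wrong order gives the same character count, all wrong:
wrong = sample.encode("utf-16-le").decode("utf-16-be")
print(len(wrong) == len(sample), wrong == sample)   # True False
```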

       
    • Martin

      Martin - 2007-12-28

      Since id3v2 and mediatomb share id3lib, I use the command-line tool id3v2 and its output as a reference (that's simpler than using mediatomb for these tests).

      My test procedure was:

      - Take an MP3 file, open FixTag, select the appropriate folder, and delete all tags from the test MP3 file.

      - Add ID3v2 tags using the FixTag dialog "current file":

        * Set Artist = "Artist äöüÄÖÜß"
        * Set Album = "Album äöüÄÖÜß"
        * Set Title = "Title äöüÄÖÜß"
        * Set Comment = "Comment äöüÄÖÜß"

      - Save changes.

      - Run id3v2 -l <file>; the result is:

      TPE1 (Lead performer(s)/Soloist(s)): Artist dv|DV\_
      TALB (Album/Movie/Show title): Album dv|DV\_
      TRCK (Track number/Position in set):    
      TIT2 (Title/songname/content description): Title dv|DV\_
      TYER (Year):    
      COMM (Comments): ()[eng]: Comment dv|DV\_
      TCON (Content type): (255)  (255)

      So all relevant tags are affected.

      Martin

       
    • Doug Laurence

      Doug Laurence - 2007-12-29

      Well, I was easily able to duplicate your problem, but, unfortunately, I am not able to fix it...

      At first I thought I had messed up the Unicode character conversion, so I dug into the hex dumps of the tags I generate and confirmed that they are all written out according to the spec. A little further investigation into id3lib uncovered a whole slew of reported bugs concerning Unicode characters.

      Since the characters you are trying to write are actually present in the ISO-8859-1 character set (which is 8-bit), I realized that I was writing out Unicode when it was not necessary. I updated the character set logic to no longer write Unicode unless absolutely necessary, and wrote some tags using your characters again, this time in ISO-8859-1. The bad news is that id3lib can't even handle ISO-8859-1 - it looks like it internally converts to ASCII no matter what. I didn't use mediatomb, just the id3v2 program, so maybe it is just a screen output issue, but id3v2 writes out question marks for any characters above 0x80. The tags display okay in some other Linux applications (e.g. Rhythmbox and XMMS).

      There are, apparently, some patches out there to fix id3lib if you feel up to it. Just google for "id3lib unicode" or check these out:

      https://bugs.launchpad.net/ubuntu/edgy/+source/id3lib3.8.3/+bug/54136
      http://www.wentnet.com/misc/id3lib.html

      I released an update anyway, since I think the ISO-8859-1 support is improved enough to warrant it, but I don't know if it will help you much. Download it and try anyway.

       
      • Martin

        Martin - 2008-01-05

        Thanks for your investigation. I will try your fix and report the results later.

         
        • Martin

          Martin - 2008-01-05

          I've done some investigation with FixTag version 0.81, and the current situation is quite good. I ran some tests using FixTag 0.81 and id3v2 (0.11) with plain id3lib 3.8.3. My environment uses plain 8-bit ISO-8859-1.

          Remove all ID3 tags from a test MP3 file:

          * id3v2 -D sample.mp3

          * Use FixTag to add ID3v2 tags Album="Album äöüÄÖÜß", Title="Title äöüÄÖÜß" ...

          * Hexdump of sample.mp3 is:

          00000000  49 44 33 03 00 00 00 00  01 76 54 50 45 31 00 00  |ID3......vTPE1..|
          00000010  00 10 00 00 00 41 72 74  69 73 74 20 e4 f6 fc c4  |.....Artist äöüÄ|
          00000020  d6 dc df 00 54 41 4c 42  00 00 00 0f 00 00 00 41  |ÖÜß.TALB.......A|
          00000030  6c 62 75 6d 20 e4 f6 fc  c4 d6 dc df 00 54 52 43  |lbum äöüÄÖÜß.TRC|
          00000040  4b 00 00 00 06 00 00 00  20 20 20 20 00 54 49 54  |K.......    .TIT|
          00000050  32 00 00 00 0f 00 00 00  54 69 74 65 6c 20 e4 f6  |2.......Titel äö|
          00000060  fc c4 d6 dc df 00 54 59  45 52 00 00 00 06 00 00  |üÄÖÜß.TYER......|
          00000070  00 20 20 20 20 00 43 4f  4d 4d 00 00 00 15 00 00  |.    .COMM......|
          00000080  00 65 6e 67 00 43 6f 6d  6d 65 6e 74 20 e4 f6 fc  |.eng.Comment äöü|
          00000090  c4 d6 dc df 00 54 43 4f  4e 00 00 00 08 00 00 00  |ÄÖÜß.TCON.......|
          000000a0  28 32 35 35 29 20 00 00  00 00 00 00 00 00 00 00  |(255) ..........|
          000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

          What I'm wondering about is the additional 00 byte in front of the tag content.

          * id3v2 -l sample.mp3 gives the expected result. So far, so good.

          * Remove all ID3 tags again, then add the same content using id3v2:

          * id3v2 -2 --TPE1 "Artist äöüÄÖÜß" --TALB "Album äöüÄÖÜß" --TIT2 "Titel äöüÄÖÜß" ...

          * Hexdump of sample.mp3 is now:

          00000000  49 44 33 03 00 00 00 00  08 0c 54 50 45 31 00 00  |ID3.......TPE1..|
          00000010  00 0f 00 00 00 41 72 74  69 73 74 20 e4 f6 fc c4  |.....Artist äöüÄ|
          00000020  d6 dc df 54 41 4c 42 00  00 00 0e 00 00 00 41 6c  |ÖÜßTALB.......Al|
          00000030  62 75 6d 20 e4 f6 fc c4  d6 dc df 54 49 54 32 00  |bum äöüÄÖÜßTIT2.|
          00000040  00 00 0e 00 00 00 54 69  74 65 6c 20 e4 f6 fc c4  |......Titel äöüÄ|
          00000050  d6 dc df 43 4f 4d 4d 00  00 00 14 00 00 00 00 00  |ÖÜßCOMM.........|
          00000060  00 00 43 6f 6d 6d 65 6e  74 20 e4 f6 fc c4 d6 dc  |..Comment äöüÄÖÜ|
          00000070  df 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |ß...............|
          00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

          So from my point of view, your version 0.81 is a great improvement. Thanks a lot!
          In my opinion the delimiting zero byte is not necessary, but it does not seem to do
          any harm.
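To see how the first bytes of the FixTag 0.81 dump above break down, here is a minimal Python sketch that walks the frames of an ID3v2.3 tag. It assumes no extended header and no unsynchronisation (both flags are 0 in this dump). The tag-header size is synchsafe (7 significant bits per byte), while v2.3 frame sizes are plain 32-bit big-endian integers that exclude the 10-byte frame header. The function names are hypothetical.

```python
def synchsafe_to_int(raw: bytes) -> int:
    """Tag-header size: 4 bytes, only the low 7 bits of each are used."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def list_frames(data: bytes):
    """Hypothetical helper: yield (frame_id, size, body) from an ID3v2.3 tag at data[0]."""
    if data[:3] != b"ID3" or data[3] != 3:
        raise ValueError("not an ID3v2.3 tag")
    if data[5] & 0xC0:                              # unsynchronisation / extended header
        raise NotImplementedError("not handled in this sketch")
    end = min(10 + synchsafe_to_int(data[6:10]), len(data))
    pos = 10                                        # first frame follows the 10-byte tag header
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":         # reached padding
            break
        size = int.from_bytes(data[pos + 4:pos + 8], "big")  # v2.3: not synchsafe
        yield frame_id.decode("ascii"), size, data[pos + 10:pos + 10 + size]
        pos += 10 + size


# First 61 bytes of the FixTag 0.81 dump: tag header, TPE1 frame, TALB frame.
dump = bytes.fromhex(
    "49 44 33 03 00 00 00 00 01 76 "
    "54 50 45 31 00 00 00 10 00 00 "
    "00 41 72 74 69 73 74 20 e4 f6 fc c4 d6 dc df 00 "
    "54 41 4c 42 00 00 00 0f 00 00 "
    "00 41 6c 62 75 6d 20 e4 f6 fc c4 d6 dc df 00"
)
for frame_id, size, body in list_frames(dump):
    print(frame_id, size, body[0], body[-1])        # size, encoding byte, last byte
```

Both frames begin with a 00 byte (the text encoding byte) and, in FixTag's output, also end with one (a terminator).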

           
    • Doug Laurence

      Doug Laurence - 2007-12-30

      Some more info on this issue: apparently id3lib is writing UTF-8 encoded text into these fields. UTF-8 is not permitted until ID3v2 v2.4, yet they are writing the tags marked as ID3v2 v2.3 and setting the text encoding byte to indicate ISO-8859-1 encoding. I'm not quite sure what to do about it. On one hand, I'd like to be compliant with the specification, and on the other I'd like to be compatible with the common conventions (the de facto standards are frequently better). When I first wrote FixTag a couple of years ago, I chose to write ID3v2 v2.3 tags for maximum compatibility. It may be time to write out ID3v2 v2.4 (or at least provide a user preference to select which version to emit), but I'll have to try writing out some tags and see how well the major MP3 players consume them. I am really not inclined to write non-standard tags by default, but another option might be to provide a user setting to write 'non-standard UTF-8' tags in the manner of id3lib, since I am sure lots of players will decode UTF-8 just fine, even in v2.3 tags.

      Oh, and I am adding some keyboard shortcuts :)
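A minimal Python sketch of the mismatch described above: UTF-8 bytes stored in a v2.3 field whose encoding byte ($00) says ISO-8859-1. A strict reader turns each two-byte UTF-8 sequence into two Latin-1 characters; a lenient reader tries UTF-8 first and falls back to Latin-1. That fallback is a common heuristic, not something the spec defines, and it can misfire on Latin-1 text that happens to be valid UTF-8. `decode_iso_field` is hypothetical. In ID3v2.4 the same bytes could be labelled honestly with encoding byte $03 (UTF-8).

```python
def decode_iso_field(payload: bytes, lenient: bool = False) -> str:
    """Hypothetical helper: decode a text field whose encoding byte is $00."""
    if lenient:
        try:
            return payload.decode("utf-8")          # heuristic: try UTF-8 first
        except UnicodeDecodeError:
            pass                                    # not valid UTF-8, fall back to the spec
    return payload.decode("latin-1")                # what $00 means in ID3v2.3


utf8_payload = "äöü".encode("utf-8")                # 6 bytes: c3 a4 c3 b6 c3 bc
print(decode_iso_field(utf8_payload))               # Ã¤Ã¶Ã¼  (spec-following reader)
print(decode_iso_field(utf8_payload, lenient=True)) # äöü
print(decode_iso_field("äöü".encode("latin-1"), lenient=True))  # äöü (e4 f6 fc is not UTF-8)
```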

       
    • Doug Laurence

      Doug Laurence - 2008-01-05

      Thanks very much for digging into this so thoroughly; it really helps make sure I've got this stuff right.

      The extra zero byte is called a "text encoding description byte" and is required by the spec for "text information frames" (frames whose IDs start with T). It is 0x0 for ISO-8859-1 and should be 0x1 for 16-bit Unicode. (ID3v2 2.4 adds two more values, 0x2 for big-endian UTF-16 without a BOM and 0x3 for UTF-8, but I am still writing tags in v2.3.) FixTag v0.80 wrote the text tags in Unicode format if any of the characters had a value higher than 0x7F. This was okay as far as the spec is concerned, but to help address your issue, I modified FixTag v0.81 to stick to ISO-8859-1 for characters up to 0xFF before falling back to Unicode. To see the Unicode output, try running charmap, pasting some Cyrillic or Chinese or other characters into your tags, then hexdump again and you'll see what I mean.
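The v0.81 rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not FixTag's actual code: the function name and the `max_single_byte` parameter are hypothetical, the Unicode branch writes a little-endian BOM plus UTF-16LE code units (one of the two byte orders the v2.3 spec allows), and the trailing terminator is left out. Passing `max_single_byte=0x7F` mimics the v0.80 behaviour of switching to Unicode for anything above 0x7F.

```python
def encode_text_body(text: str, max_single_byte: int = 0xFF) -> bytes:
    """Hypothetical helper: ID3v2.3 text-frame body, without a trailing terminator."""
    if all(ord(ch) <= max_single_byte for ch in text):
        return b"\x00" + text.encode("latin-1")    # $00 = ISO-8859-1
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return b"\x01" + b"\xff\xfe" + text.encode("utf-16-le")  # $01 + BOM + UCS-2 LE


print(encode_text_body("Artist äöüÄÖÜß").hex(" "))         # stays ISO-8859-1 (v0.81 rule)
print(encode_text_body("Artist äöüÄÖÜß", 0x7F).hex(" "))   # switches to Unicode (v0.80 rule)
print(encode_text_body("Привет")[:3].hex(" "))             # 01 ff fe
```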

      To back up my explanation of the 'zero byte', here's some relevant text pasted from the spec at http://id3.org/d3v2.3.0

         This is the standard layout of the bytes in an ID3v2 2.3 frame header.

           Frame ID   $xx xx xx xx  (four characters)
           Size       $xx xx xx xx
           Flags      $xx xx

         The text information frames are the most important frames, containing
         information like artist, album and more. <snip> All text frame identifiers
         begin with "T". Only text frame identifiers begin with "T", with the
         exception of the "TXXX" frame. All the text information frames have
         the following format:

           <Header for 'Text information frame', ID: "T000" - "TZZZ", excluding "TXXX">
           Text encoding                $xx
           Information                  <text string according to encoding>

         If nothing else is said a string is represented as ISO-8859-1
         [ISO-8859-1] characters in the range $20 - $FF. Such strings are
         represented as <text string>, or <full text string> if newlines are
         allowed, in the frame descriptions. All Unicode strings [UNICODE] use
         16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings
         must begin with the Unicode BOM ($FF FE or $FE FF) to identify the
         byte order.

         All numeric strings and URLs [URL] are always encoded as ISO-8859-1.
         Terminated strings are terminated with $00 if encoded with ISO-8859-1
         and $00 00 if encoded as unicode. If nothing else is said newline
         character is forbidden. In ISO-8859-1 a new line is represented, when
         allowed, with $0A only. Frames that allow different types of text
         encoding have a text encoding description byte directly after the
         frame size. If ISO-8859-1 is used this byte should be $00, if Unicode
         is used it should be $01. Strings dependent on encoding is
         represented as <text string according to encoding>, or <full text
         string according to encoding> if newlines are allowed.  Any empty
         Unicode strings which are NULL-terminated may have the Unicode BOM
         followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

      If you read this excerpt carefully, you will notice that there is some ambiguity about where the "text encoding description byte" should go. It says that 'frames that allow different types of text encoding have a text encoding description byte directly after the frame size', but I think it really means after the flags (basically, after the whole frame header). Since the flags are usually zero, and the text encoding byte is zero for ISO-8859-1, it doesn't matter there, but when it comes to Unicode, the byte needs to contain 0x1. I wasn't sure, so I tested the Unicode encoding byte with several Windows and Linux apps, and placing 0x1 after the flags, directly before the BOM, seems to work. The ID3v2 2.4 spec is even less specific. But writing specs is hard, so I forgive them.
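The frame layout described above can be sketched in Python with `struct`: the 10-byte header is the 4-character ID, a 32-bit big-endian size that excludes the header, and two flag bytes, so the body's first byte (the text encoding byte) lands at offset 10, right after the flags. `build_text_frame` is a hypothetical helper. With the body used here, the output should match the TPE1 frame in Martin's FixTag 0.81 hexdump earlier in this thread (`54 50 45 31 00 00 00 10 00 00 00 41 ...`).

```python
import struct


def build_text_frame(frame_id: str, body: bytes, flags: int = 0) -> bytes:
    """Hypothetical helper: ID3v2.3 frame = 10-byte header + body."""
    # ">4sIH": 4-byte ID, unsigned 32-bit big-endian size (body only), 16-bit flags
    header = struct.pack(">4sIH", frame_id.encode("ascii"), len(body), flags)
    return header + body                            # body[0], the encoding byte, is at offset 10


body = b"\x00" + "Artist äöüÄÖÜß".encode("latin-1") + b"\x00"   # $00, text, terminator
frame = build_text_frame("TPE1", body)
print(frame.hex(" "))
print(frame[10])                                    # 0: encoding byte directly after the flags
```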

      Thanks again for bringing this issue to my attention and also for your careful investigation.

       
    • Doug Laurence

      Doug Laurence - 2008-01-05

      Sorry, I misinterpreted which 'zero byte' you were referring to. Although my previous post is still relevant, you are actually referring to the null terminator after the text. I do not believe that the spec requires null termination for all text strings, but it does require it for some. I decided to be safe and add a null terminator to all text strings, because I don't think it does any harm. Since ID3 tag readers must be equipped to handle null termination for those strings that require it, I think it is more likely that I will run into a bug in another application if I DON'T terminate the strings.
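On the reading side, tolerating the terminator is cheap. A minimal Python sketch, with a hypothetical helper name: strip one optional terminator ($00 for ISO-8859-1, $00 00 for Unicode) before decoding. Applied to the TPE1 bodies from the two hexdumps earlier in this thread (FixTag 0.81 with a terminator, id3v2 without), both decode to the same text.

```python
def strip_terminator(encoding: int, payload: bytes) -> bytes:
    """Hypothetical helper: drop one optional terminator ($00, or $00 00 for Unicode)."""
    # For Unicode the check is aligned because a well-formed UCS-2 payload
    # (BOM + 2-byte code units) always has an even length.
    terminator = b"\x00" if encoding == 0x00 else b"\x00\x00"
    return payload[:-len(terminator)] if payload.endswith(terminator) else payload


# TPE1 bodies from the two hexdumps: FixTag 0.81 (terminated) and id3v2 (not terminated).
fixtag = bytes.fromhex("00 41 72 74 69 73 74 20 e4 f6 fc c4 d6 dc df 00")
id3v2 = bytes.fromhex("00 41 72 74 69 73 74 20 e4 f6 fc c4 d6 dc df")
for body in (fixtag, id3v2):
    print(strip_terminator(body[0], body[1:]).decode("latin-1"))   # Artist äöüÄÖÜß
```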

       
      • Martin

        Martin - 2008-01-06

        Thanks for your explanations. So far I agree with you. The Unicode issue with id3lib is still open; focusing on ISO-8859-1 was meant to make sure that my environment is fully operational. Again, thank you.

        Anyway, I will take some time to look into the Unicode issue. We will see...

        I greatly appreciate your effort to add keyboard shortcuts.

         
    • Martin

      Martin - 2008-03-09

      Version 0.82 has keyboard shortcuts. Thank you!

       
