#11 Unicode issue?

Group: v1.0 (example)
Status: closed-fixed
Owner: UNHchabo
Labels: Encoding
Priority: 5
Updated: 2014-11-19
Created: 2010-09-25
Creator: Anonymous
Private: No

I had an album with the character ï (U+00EF, Alt+0239) in it, and it got encoded as ‹ (I can't find it in Charmap).

This happened for both the Album Name and the title track, which had the same character in it.

LAME bug, or FlacSquisher?
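This substitution is consistent with a code-page mix-up. That is only a hypothesis, not a confirmed diagnosis: suppose the tag text is converted to an OEM (console) code page such as 850, and the resulting bytes are later read as Windows-1252. Then `ï` becomes `‹`. The sketch below reproduces this with Python's standard codecs. The function name `oem_to_ansi_mangle` is hypothetical. The same round trip also turns `µ` into the `æ` reported later in this thread, and code page 437 gives the same two results.

```python
def oem_to_ansi_mangle(text: str, oem: str = "cp850", ansi: str = "cp1252") -> str:
    """Hypothetical round trip: encode in an OEM code page, decode as ANSI.

    Models one tool writing console (OEM) bytes that another tool then reads
    as Windows-1252. Characters with no OEM mapping raise UnicodeEncodeError.
    """
    return text.encode(oem).decode(ansi)


if __name__ == "__main__":
    # U+00EF (i with diaeresis) is byte 0x8B in cp850,
    # and cp1252 decodes 0x8B as U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK.
    print(repr(oem_to_ansi_mangle("\u00ef")))  # '‹'
    # U+00B5 MICRO SIGN is byte 0xE6 in cp850, which cp1252 decodes as U+00E6.
    print(repr(oem_to_ansi_mangle("\u00b5")))  # 'æ'
```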

Discussion

  • Nobody/Anonymous

    Whoops, addendum: the toolchain seemed fine with the character in the path (as the folder name).

    Just not in the tags.

     
  • UNHchabo

    UNHchabo - 2010-10-01

    This one's a bit complicated. The Windows version of LAME currently doesn't support Unicode input, but there are ways around it, which are used by other programs like Foobar2000. I'll see if I can implement something.

    There are a couple of ways around this in the meantime: the LAME 3.99 beta has some "experimental" Unicode support if you'd like to try that. Also, OggEnc seems to have no problems with Unicode characters, so if your portable device can play Ogg Vorbis files, then that would be an excellent route to go.

     
  • UNHchabo

    UNHchabo - 2010-10-01
    • labels: --> Encoding
    • assigned_to: nobody --> unhchabo
    • status: open --> open-accepted
     
  • UNHchabo

    UNHchabo - 2011-02-12

    As of version 1.0.0, I think I've fixed all bugs regarding Unicode characters in the path or the filename. However, I'm not sure I can easily do anything about the tags; I think your best bet would be to use the 3.99 alpha.

    Maybe I'll see if I can fix the issue with the tags at some point, though.

     
  • HappyDog

    HappyDog - 2012-10-30

    I've found this problem with the latest version of FlacSquisher, and have done some digging.

    In my test case, the artist name contains the lower-case mu character (μ, U+03BC) which is ending up as a lower-case ae ligature (æ, U+00E6).

    The version of LAME shipped with FS 1.0.10 is 3.99.5, which supports Unicode, so I don't think LAME is the problem.

    My understanding is that FlacSquisher uses "metaflac" to get a dump of the various meta-data fields, which it then passes into its call to "lame" using the appropriate command-line switches (--tt, --ta, etc.).
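    The metaflac-to-lame step described above can be sketched roughly as follows. This is not FlacSquisher's actual code. It assumes the tags arrive as NAME=value lines, which is the form `metaflac --export-tags-to=-` prints. The mapping table and function name are hypothetical. LAME's `--tt`, `--ta`, `--tl` and `--tn` options set the title, artist, album and track number.

```python
# Hypothetical table: Vorbis comment field name -> LAME ID3 tag option.
TAG_TO_LAME_FLAG = {
    "TITLE": "--tt",
    "ARTIST": "--ta",
    "ALBUM": "--tl",
    "TRACKNUMBER": "--tn",
}


def lame_tag_args(tag_dump: str) -> list[str]:
    """Build LAME tag arguments from NAME=value lines (hypothetical helper).

    Field names are matched case-insensitively, the value is everything after
    the first "=", and lines with unknown names or no "=" are skipped.
    """
    args: list[str] = []
    for line in tag_dump.splitlines():
        name, sep, value = line.partition("=")
        flag = TAG_TO_LAME_FLAG.get(name.upper())
        if sep and flag:
            args += [flag, value]
    return args


if __name__ == "__main__":
    dump = "TITLE=Example\nartist=\u00b5 example\nALBUM=A=B\nREPLAYGAIN_TRACK_GAIN=-1.2 dB"
    print(lame_tag_args(dump))
    # ['--tt', 'Example', '--ta', 'µ example', '--tl', 'A=B']
```

    If you pass such a list to `subprocess.run`, each value goes through as a separate argument with no shell quoting. However, nothing at this stage can repair a value that was already decoded with the wrong code page.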

    Running "metaflac --list" on my file returns the wrong strings - it includes the ae ligature instead of the mu character - and I think this is where the breakage occurs.

    However, running "metaflac --list --no-utf8-convert" returns the mu character, albeit preceded by another marker character: Âµ

    If this switch were added to the metaflac call, would it fix the problem? Not directly, I think. If I manually use the switch "--ta Âµ" when calling lame, then that literal string is what is embedded in the mp3 file. This seems to be the case even if I use any of the --id3v2-XX options (where XX = utf16, ucs2 or latin1). However, setting "--ta µ" does give the right result.

    I don't know enough about Unicode to know whether it is safe to simply strip the Â from these strings, or whether some more complex manipulation is required, but I suspect that using that argument to metaflac, coupled with some kind of transformation so that it is represented correctly to lame, would fix the problem.
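    Stripping the `Â` is probably not a general fix. `Âµ` is what the two UTF-8 bytes 0xC2 0xB5 look like when read as Windows-1252. Those bytes encode the micro sign U+00B5, which looks identical to the Greek mu. Only characters from U+0080 to U+00BF get the `Â` lead character, and other characters get a different one. A more robust transformation is to re-encode the mis-decoded text as Windows-1252 and decode it as UTF-8. Below is a minimal sketch, assuming the raw metaflac output really is UTF-8. Both function names are hypothetical.

```python
def show_as_cp1252(text: str) -> str:
    """Hypothetical: how UTF-8 bytes look when a program reads them as cp1252.

    Raises UnicodeDecodeError for bytes cp1252 leaves undefined (e.g. 0x81).
    """
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_mojibake(garbled: str) -> str:
    """Hypothetical inverse: recover the original text from such output."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    print(show_as_cp1252("\u00b5"))  # Âµ  (micro sign: lead byte 0xC2 shows as Â)
    print(show_as_cp1252("\u00ef"))  # Ã¯  (lead byte 0xC3, so there is no Â to strip)
    print(show_as_cp1252("\u03bc"))  # Î¼  (Greek mu: lead byte 0xCE)
    print(repair_cp1252_mojibake("\u00c2\u00b5"))  # µ
```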

    Thoughts?

     
  • UNHchabo

    UNHchabo - 2013-11-05

    This bug seems to be fixed in FlacSquisher 1.0.12 -- this version takes advantage of the UTF-8 support added in Flac/Metaflac version 1.3.0.

     
  • UNHchabo

    UNHchabo - 2013-11-05
    • status: open-accepted --> closed-fixed
    • Group: --> v1.0 (example)
     
  • HappyDog

    HappyDog - 2014-11-19

    I can confirm that this now works for me in the latest version of FlacSquisher.

    Thank you!

     