The state of Unicode in AAF

John Emmas
2012-11-16
2013-04-29
  • John Emmas

    John Emmas - 2012-11-16

    It's quite a long time since I did any AAF programming (in fact, my AAF source code is around 8 years old!!) and everything I did with it in those early days was for Windows.

    I'm now considering whether I should update my AAF apps to make them cross-platform (i.e. remove any MFC GUI stuff and replace it with GTK, which is cross-platform and which I'm pretty familiar with). But I wondered about the current state of Unicode support in AAF. In my (now 8-year-old) AAF source code it looks like the preferred standard is Windows-style Unicode (sometimes called UTF-16, although I believe that's a misnomer). If I had a preference it would be for UTF-8, which seems to have gained widespread credibility in other cross-platform apps and is very well implemented in some other open source libraries, such as libglib.

    From those early code sources I can see that AAF had some rudimentary support for UTF-8, but primarily by converting UTF-8 strings to Windows-style strings and vice versa. The AAF toolkit itself seemed to prefer Windows-style strings (which seemed odd for a cross-platform library).

    Anyway… is that still the case within AAF or have there been any moves (or discussions) towards a more platform-neutral string standard?

    With multilingual support of course!

     
  • Oliver Morgan

    Oliver Morgan - 2012-11-16

    Whatever it is called - UCS-2, UTF-16, Windows Unicode, plus or minus - using 2-byte characters covers almost every use case with a fixed-size character. UTF-8 is good; the only place it falls down is its variable length. Some long time ago we added a small set of functions (utf8.cpp) which you can use to go back and forth.
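
To put rough numbers on the fixed-size versus variable-length point, here is a minimal C++ sketch that counts how many code units one Unicode code point needs in UTF-8 and in UTF-16. The function names `utf8_units` and `utf16_units` are made up for this illustration and are not part of the AAF toolkit or its utf8.cpp. Surrogate code points (U+D800 to U+DFFF) are not treated specially here.

```cpp
#include <cassert>
#include <cstdint>

// Number of 8-bit code units UTF-8 needs for one Unicode code point.
// (Hypothetical helper for illustration only.)
inline int utf8_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);          // highest valid Unicode code point
    if (cp < 0x80)    return 1;      // ASCII
    if (cp < 0x800)   return 2;      // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;      // rest of the Basic Multilingual Plane
    return 4;                        // supplementary planes
}

// Number of 16-bit code units UTF-16 needs for one Unicode code point.
// (Hypothetical helper for illustration only.)
inline int utf16_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;     // above the BMP: a surrogate pair
}
```

For example, `utf16_units(0x20AC)` (the euro sign) is 1 while `utf8_units(0x20AC)` is 3, and for U+1F600 the counts are 2 and 4. So a 16-bit encoding is fixed-size only for characters inside the Basic Multilingual Plane, which is the "almost every use case" mentioned above.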

     
  • John Emmas

    John Emmas - 2012-11-17

    Hi terabrit. I can't help agreeing that the fixed character size of Windows Unicode is a big big plus over UTF-8 so how does the AAF toolkit work on Linux or OS-X where (I assume) the OS passes strings (e.g. file paths) as UTF-8? Does the AAF toolkit convert them to Windows Unicode (for internal use) and then convert back to UTF-8 whenever the string needs to get passed back to the OS for some reason? I assume that must be how it works.

     
  • Oliver Morgan

    Oliver Morgan - 2012-11-17

    See aaffmtconv.cpp main() for a good example.

     
  • John Emmas

    John Emmas - 2012-11-23

    Sorry for the delay in replying. I took a look at that example and noticed that it uses mbstowcs(). I'm curious to know what that function does on a *nix platform or OS-X. I'm guessing that on modern, *nix-like implementations it converts UTF-8 (variable width) characters to fixed width (wide) characters but I can't seem to find out for certain.

    One thing I did discover, though, was that whereas Windows wide character strings are 16 bits wide, *nix wide character strings are apparently 32 bits wide. Would that cause compatibility problems? E.g. if an AAF app running on Linux stored a wide character string in the AAF file, would that string still be understood on a Windows machine which expects the smaller size strings?

    Or am I over-thinking this??
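
One way to check the two questions above on a given machine is a small C++ sketch like the following. It wraps the standard `mbstowcs()` call and reports the width of `wchar_t`. The helper names `to_wide` and `wchar_bits` are invented for this example, and the length query with a null destination relies on the behaviour POSIX specifies for `mbstowcs()`, which is assumed to be available.

```cpp
#include <climits>
#include <clocale>
#include <cstdlib>
#include <string>

// Convert a multibyte string (in the encoding of the current C locale)
// to a wide string. Returns false if the input is not valid in that locale.
// (Hypothetical helper for illustration only.)
inline bool to_wide(const char* mb, std::wstring& out) {
    // With a null destination, mbstowcs only counts the wide characters needed.
    std::size_t n = std::mbstowcs(nullptr, mb, 0);
    if (n == static_cast<std::size_t>(-1)) return false;
    out.assign(n, L'\0');
    if (n > 0) std::mbstowcs(&out[0], mb, n);   // fills exactly n wide characters
    return true;
}

// Width of wchar_t in bits on the platform this was compiled for.
inline int wchar_bits() {
    return static_cast<int>(sizeof(wchar_t)) * CHAR_BIT;
}
```

`mbstowcs()` interprets its input according to the `LC_CTYPE` category of the current locale, and a program starts in the "C" locale, so a caller would normally run `std::setlocale(LC_CTYPE, "")` first to pick up the user's environment. Whether that environment is UTF-8 depends on the system configuration. `wchar_bits()` returns the `wchar_t` width for the build platform, which is the quantity the 16-bit versus 32-bit question turns on.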

     
  • John Emmas

    John Emmas - 2012-11-23

    Addendum… Or does the AAF toolkit have its own implementation of mbstowcs which produces the same width strings on all platforms?
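
As an illustration of what "the same width on all platforms" could look like, the C++ sketch below re-encodes a `std::wstring` into 16-bit UTF-16 code units regardless of how wide `wchar_t` is. This is not taken from the AAF toolkit. The function name `to_utf16` is hypothetical, and the code assumes that a 16-bit `wchar_t` already holds UTF-16 and that a wider `wchar_t` holds one Unicode code point per element.

```cpp
#include <cstdint>
#include <string>

// Re-encode a wide string as UTF-16 code units, whatever the width of wchar_t.
// Assumption: 16-bit wchar_t holds UTF-16; wider wchar_t holds code points.
// (Hypothetical helper for illustration only.)
inline std::u16string to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (cp < 0x10000) {
            // Always taken when wchar_t is 16 bits wide: copy the unit as-is.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;                                                // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

With a conversion of this kind at the point where strings are written out, the stored form would not depend on whether the writing platform uses a 16-bit or a 32-bit `wchar_t`.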

     
  • Oliver Morgan

    Oliver Morgan - 2012-11-23

    See utf8.h.

     
