contribution of multi-byte UTF support

  • laerad

    laerad - 2009-01-14


    We've begun making quite extensive use of this library to replace Matlab executables used to generate large datasets. (Excellent work, by the way. This library is drastically reducing our extraction time and increasing our reliability.)

    We'd like to make some contributions to the library, as we expect to make more and more use of it as time goes on.

    Most recently we found a bug when writing characters that cannot be encoded as a single byte in UTF-8 (most notably the £ sign, which may not have come up in other countries!). So I've modified the code to support both reading and writing of UTF-8, UTF-16 and UTF-32.
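
    To see why £ is a problem, it helps to count how many bytes it takes in each encoding. The sketch below uses only standard Java APIs; the class and method names are illustrative, and the "naive cast" at the end is an assumed failure mode (truncating a char to one byte), not a confirmed description of the original bug.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Number of bytes the given text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign, U+00A3

        // UTF-32BE ships with OpenJDK but is not on the list of charsets
        // that every Java platform is required to provide.
        Charset utf32be = Charset.forName("UTF-32BE");

        System.out.println("UTF-8:    " + encodedLength(pound, StandardCharsets.UTF_8));    // 2 (C2 A3)
        System.out.println("UTF-16BE: " + encodedLength(pound, StandardCharsets.UTF_16BE)); // 2 (00 A3)
        System.out.println("UTF-32BE: " + encodedLength(pound, utf32be));                   // 4

        // Assumed failure mode: truncating the char to a single byte yields 0xA3,
        // which on its own is a UTF-8 continuation byte, so a decoder rejects it
        // and substitutes the replacement character U+FFFD.
        byte naive = (byte) pound.charAt(0);
        String decoded = new String(new byte[] {naive}, StandardCharsets.UTF_8);
        System.out.println("Naive byte decodes to U+FFFD: " + decoded.equals("\uFFFD")); // true
    }
}
```

    Any character above U+007F (£, °, accented letters) needs at least two bytes in UTF-8, so code that assumes one byte per char will corrupt it.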

    This required minor changes to several classes: MatFileHeader (which now retains the ByteOrder, so that the byte order of the character set can be determined later when reading UTF-16 or UTF-32), MatFileWriter, MatFileIncrementalWriter, MLChar and MatFileReader.
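
    Retaining the ByteOrder matters because UTF-16 and UTF-32 bytes decode to different characters depending on endianness. A minimal sketch, assuming the ByteOrder has already been read from the header; utfCharset is a hypothetical helper, not part of JMatIO's API:

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetForByteOrder {
    // Hypothetical helper: picks an explicitly-endian charset for a given
    // code-unit width, using the ByteOrder remembered from the file header.
    static Charset utfCharset(int bitsPerCodeUnit, ByteOrder order) {
        boolean bigEndian = order == ByteOrder.BIG_ENDIAN;
        switch (bitsPerCodeUnit) {
            case 8:
                return StandardCharsets.UTF_8; // UTF-8 has no byte order
            case 16:
                return bigEndian ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
            case 32:
                // UTF-32 charsets are available on OpenJDK, not guaranteed elsewhere.
                return Charset.forName(bigEndian ? "UTF-32BE" : "UTF-32LE");
            default:
                throw new IllegalArgumentException("Unsupported code-unit width: " + bitsPerCodeUnit);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, (byte) 0xA3}; // one UTF-16 code unit as stored on disk
        String asBig = new String(raw, utfCharset(16, ByteOrder.BIG_ENDIAN));
        String asLittle = new String(raw, utfCharset(16, ByteOrder.LITTLE_ENDIAN));
        System.out.println(asBig.equals("\u00A3"));    // true: the pound sign
        System.out.println(asLittle.equals("\uA300")); // true: an unrelated character
    }
}
```

    Using the explicit BE/LE charsets rather than plain "UTF-16" avoids relying on a byte-order mark, which the stored data may not contain.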

    MLChar now does the conversion to bytes for the writer, storing its data internally as a primitive char[] (which permits efficient conversion to a byte[] via String.getBytes()). In MatFileReader, ISMatTag's readToCharArray method no longer delegates to the MatFileInputStream when explicitly reading UTF-8, UTF-16 or UTF-32; it passes the bytes to a Charset decoder directly instead.
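
    The write/read split described above can be sketched as a round trip: char[] to byte[] via String.getBytes on the writer side, and byte[] back to char[] via a CharsetDecoder on the reader side. This is a sketch using only standard Java APIs; encode and decode are illustrative names, not the actual MLChar or ISMatTag methods.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharRoundTrip {
    // Writer side: char[] -> byte[] through String.getBytes(Charset).
    static byte[] encode(char[] data, Charset charset) {
        return new String(data).getBytes(charset);
    }

    // Reader side: byte[] -> char[] through a CharsetDecoder. REPORT makes
    // malformed input throw instead of silently becoming U+FFFD.
    static char[] decode(byte[] bytes, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
        char[] out = new char[chars.remaining()];
        chars.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        char[] original = "\u00A3100 at 20\u00B0C".toCharArray(); // "£100 at 20°C"
        byte[] utf8 = encode(original, StandardCharsets.UTF_8);
        char[] back = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(original.length + " chars -> " + utf8.length + " bytes (UTF-8)"); // 12 chars -> 14 bytes
        System.out.println("round trip ok: " + Arrays.equals(original, back));              // true
    }
}
```

    The byte count differs from the char count (the £ and ° each take two bytes), which is why a byte-per-char reader cannot be reused for multi-byte data.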

    On another matter, I have also modified MLChar to write the empty string as a Matlab empty string (a 0-by-0 char matrix) rather than a 1-by-0 char matrix, which behaves more like a null. For null strings I have kept the 1-by-0 matrix. However, I can make this behaviour optional if existing users prefer the current one.
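
    The proposed dimension rule, including the optional legacy behaviour, fits in a few lines. This is a sketch only: dimensionsFor and the legacyEmpty flag are hypothetical names, and it covers the single-string (row vector) case.

```java
import java.util.Arrays;

public class CharDims {
    // Hypothetical helper illustrating the proposed MLChar rule for a single string.
    // legacyEmpty = true reproduces the old behaviour (empty string -> 1-by-0).
    static int[] dimensionsFor(String s, boolean legacyEmpty) {
        if (s == null) {
            return new int[] {1, 0};        // null keeps the 1-by-0 representation
        }
        if (s.isEmpty()) {
            return legacyEmpty ? new int[] {1, 0}
                               : new int[] {0, 0}; // Matlab's own '' is 0-by-0
        }
        return new int[] {1, s.length()};  // ordinary 1-by-n row of chars
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(dimensionsFor(null, false)));  // [1, 0]
        System.out.println(Arrays.toString(dimensionsFor("", false)));    // [0, 0]
        System.out.println(Arrays.toString(dimensionsFor("", true)));     // [1, 0]
        System.out.println(Arrays.toString(dimensionsFor("abc", false))); // [1, 3]
    }
}
```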

    What's the process for contributing patches? We may well have further patches in future, including some small optimisations for writing large matrices, so it would be useful if we could become contributors.



    • Wojciech Gradkowski


      I am very happy that you are interested in JMatIO and that you use it in your own applications :) I have added you to the developers group. All code submissions are welcome (just please make sure they don't break the current test cases, and if possible try to provide your own tests for any new functionality).



    • kjohnston82

      kjohnston82 - 2013-07-12

      I've been having similar trouble writing other characters (specifically the '°' symbol). Does anyone know whether the updates/patches suggested in laerad's original post have been implemented, or where I can get the source code for them so I can apply them myself?

  • Anonymous

    Anonymous - 2010-04-16

    Have those modifications been committed to SVN? Support for int16 and int32 is particularly interesting to me.

