From: Mark D. <mar...@zn...> - 2007-05-13 22:09:14
Mattia Barbon wrote:
> use Encode;
> print Encode::encode_utf8( "à\n" );
> sleep 1;
> print Encode::encode_utf8( "à\n" );
> sleep 1;
> print Encode::encode_utf8( "à\n" );
> print Encode::encode_utf8( "à\n" );
>
> a good enough test case?

I am not sure. I must confess a high (but reducing :-) ) level of ignorance where multibyte char sets are concerned.

For my own particular case with Wx::Perl::ProcessStream, I decided that the thing to do was to make sure I got an exact byte-for-byte representation of the output stream returned into the Perl code. Then, whatever happens when the bytes are treated as a string, at least it can be controlled within your Perl. I found it difficult to predict the effects of different operating systems and locale settings, so I reverted to ensuring I output a known series of bytes to read in and compare.

The simplest way I found of ensuring no intervening encoding layers were applied was to put the test output in an encoded file, then read it back in and output it in binmode with 'use bytes;'. I used the attached utf8.dat and latin1.dat files sent to me by a user of Wx::Perl::ProcessStream.

If you have better things to spend time on than this, then if you put together a quick untested code change that more or less does the job, I will be happy to learn how to construct adequate test cases and test it. With the basic idea in place, I should be able to plod through any changes flagged up by testing.

Regards

Mark
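The byte-for-byte approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Wx::Perl::ProcessStream test; the file name and the expected byte sequence (UTF-8 encoded "à\n") are illustrative stand-ins for the attached utf8.dat, whose real contents I am not reproducing here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name for this sketch (not the real attachment).
my $file = 'bytes-test.dat';

# A known byte sequence: UTF-8 encoding of "à" followed by a newline.
my $expected = "\xC3\xA0\x0A";

# Write the raw bytes with no encoding layer, so the example is
# self-contained and reproducible.
open my $out, '>:raw', $file or die "open for write: $!";
print {$out} $expected;
close $out;

# Read the file back in binmode (the ':raw' layer), again with no
# intervening encoding layers, and compare byte for byte.
open my $in, '<:raw', $file or die "open for read: $!";
my $got = do { local $/; <$in> };
close $in;

print +( $got eq $expected ? "bytes match\n" : "bytes differ\n" );
unlink $file;
```

Whatever the platform's locale does to strings elsewhere, a comparison of raw bytes like this stays under the control of the Perl code itself, which is the point of the approach above.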