From: Mark D. <mar...@zn...> - 2007-05-12 15:38:07
|
Hi,

I have a problem with the module Wx::Perl::ProcessStream. It reads the STDOUT of an external process executed using Wx::Process and Wx::ExecuteCommand. To do this, it does a 'readline' on the wxInputStream available via Wx::Process.

My problem is that the implementation of READLINE in Wx::InputStream works as follows:

    read a char from the stream
    append char to a wxString
    return wxString if char == '\n'

This appears not to work if the output stream from the external process is, for example, UTF-8.

I *think* what perhaps should happen is:

    read a char from the stream
    add it to a charbuffer
    if char == '\n' {
        convert charbuffer to a wxString ( method determined by wxWidgets unicode/ansi build macros )
        return wxString
    }

Alas, my C is too poor to implement / test this (I've tried :-( ) - I suspect it would be very simple for anyone with adequate C skills.

I have a workaround for the Wx::Perl::ProcessStream module: I have stopped using readline and instead do 'read's on the wxInputStream in a Perl loop with localised byte mode. This seems to work.

I'm not sure if Wx::InputStream::READLINE needs changing or not. Any thoughts?

Mark
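A minimal sketch of why the per-byte approach described above can misbehave, assuming each byte read is turned into a character on its own. In UTF-8 the character "à" (U+00E0) occupies two bytes, 0xC3 0xA0, so a line containing it has more bytes than characters. The helper name utf8_codepoints is hypothetical and is not part of wxWidgets or wxPerl; it also assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the code points in a well-formed UTF-8
 * buffer. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that does NOT match that pattern starts a new code point.
 */
size_t utf8_codepoints(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        if ((buf[i] & 0xC0) != 0x80)   /* not a continuation byte */
            ++count;
    }
    return count;
}
```

For the three bytes 0xC3 0xA0 0x0A ("à" followed by a newline) this counts two code points, whereas a reader that treats every byte as a separate character would produce three.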
From: Mattia B. <mat...@li...> - 2007-05-13 18:34:18
|
On Sat, 12 May 2007 16:40:05 +0100 Mark Dootson <mar...@zn...> wrote:

Hi,

> I have a problem with module Wx::Perl::ProcessStream.
> This reads the STDOUT from an external process executed using Wx::Process and Wx::ExecuteCommand.

> This appears not to work if the output stream from the external process is, for example, UTF-8.

I see.

> I *think* what perhaps should happen is
>
> read a char from the stream
> add it to a charbuffer
> if char == '\n' {
> convert charbuffer to a wxString ( method determined by wxWidgets unicode/ansi build macros )
> return wxString
> }

Seems reasonable.

> I'm not sure if Wx::InputStream::READLINE needs changing or not. Any thoughts?

I believe (at least) an option to do so would be a good idea. Adding an optional 'encoding' parameter is likely the best option. For a test case, is

    #!/usr/bin/perl -w

    use Encode;
    print Encode::encode_utf8( "à\n" );
    sleep 1;
    print Encode::encode_utf8( "à\n" );
    sleep 1;
    print Encode::encode_utf8( "à\n" );
    print Encode::encode_utf8( "à\n" );

a good enough test case?

Regards.
Mattia
From: Mark D. <mar...@zn...> - 2007-05-13 22:09:14
Attachments:
latin1.dat
utf8.dat
|
Mattia Barbon wrote:
>
> use Encode;
> print Encode::encode_utf8( "à\n" );
> sleep 1;
> print Encode::encode_utf8( "à\n" );
> sleep 1;
> print Encode::encode_utf8( "à\n" );
> print Encode::encode_utf8( "à\n" );
>
> a good enough test case?
>

I am not sure. I must confess a high (but reducing :-) ) level of ignorance where multibyte char sets are concerned.

For my own particular case with Wx::Perl::ProcessStream, I decided that the thing to do was to make sure I got an exact byte for byte representation of the output stream returned into the perl code. Then, whatever happens when the bytes are treated as a string, at least it can be controlled within your perl.

I found it difficult to predict the effects of different operating systems and locale settings so reverted to ensuring I output a known series of bytes to read in and compare. The simplest way I found of ensuring no intervening encoding layers were applied was to use test output in an encoded file and then read it in and output it in binmode with 'use bytes;'.

I used the attached utf8.dat and latin1.dat files sent to me by a user of Wx::Perl::ProcessStream.

If you have better things to spend time on than this, if you put together a quick untested code change that more or less does the job I will be happy to learn how to construct adequate test cases and test it. With the basic idea in place, I should be able to plod through any changes flagged up by testing.

Regards

Mark
From: Mattia B. <mat...@li...> - 2007-06-19 20:16:00
|
On Sat, 12 May 2007 16:40:05 +0100 Mark Dootson <mar...@zn...> wrote:

Hi,

> I have a problem with module Wx::Perl::ProcessStream.
> This reads the STDOUT from an external process executed using Wx::Process and Wx::ExecuteCommand.
>
> To do this it, it does a 'readline' on the wxInputStream available via Wx::Process.
>
> My problem is that the implementation of READLINE in Wx::InputStream works as follows:
>
> read a char from the stream
> append char to a wxString
> return wxString if char == '\n'
>
>
> This appears not to work if the output stream from the external process is, for example, UTF-8.
>
> I *think* what perhaps should happen is
>
> read a char from the stream
> add it to a charbuffer
> if char == '\n' {
> convert charbuffer to a wxString ( method determined by wxWidgets unicode/ansi build macros )
> return wxString
> }

I do not agree with what you write above (note: I am not saying the current implementation is correct!)
what I think readline should do is work with bytes:

    read a char from the stream
    add it to a charbuffer
    if char == '\n' {
        return the buffer as a byte string, without performing any conversion
    }

I believe that automatically interpreting program output based upon wxWidgets ideas (which usually means using the current locale) will cause trouble. Returning bytes leaves the interpretation to the calling program which is always a safe choice.

Regards
Mattia
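The byte-oriented readline proposed above could be sketched in plain C roughly as follows. This is an illustration only, not the actual Wx::InputStream code: byte_source, next_byte and read_line_bytes are hypothetical names, and an in-memory buffer stands in for the wxInputStream.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for the input stream. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} byte_source;

/* Return the next byte (0..255), or -1 when the data is exhausted. */
static int next_byte(byte_source *src)
{
    return src->pos < src->len ? src->data[src->pos++] : -1;
}

/*
 * Read bytes up to and including the first '\n' (or to end of data).
 * The bytes are returned unconverted in a malloc'd buffer whose length
 * is stored in *out_len; the buffer is NOT NUL-terminated.
 * Returns NULL if no bytes are left or if allocation fails.
 */
unsigned char *read_line_bytes(byte_source *src, size_t *out_len)
{
    size_t cap = 64, len = 0;
    unsigned char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = next_byte(src)) != -1) {
        if (len == cap) {               /* grow the buffer when full */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (unsigned char)c;  /* store the byte first ... */
        if (c == '\n')                  /* ... then stop, so '\n' is kept */
            break;
    }

    if (len == 0) {                     /* nothing left to read */
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Because the length is returned explicitly and no conversion takes place, the caller decides how (and whether) to decode the bytes, which is the point of the proposal.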
From: Mark D. <mar...@zn...> - 2007-06-19 20:39:49
|
Mattia Barbon wrote:

> I do not agree with what you write above (note: I am not saying the current implementation is correct!)
> what I think readline should do is work with bytes:
>
> read a char from the stream
> add it to a charbuffer
> if char == '\n' {
> return the buffer as a byte string, without performing any conversion
> }
>
> I believe that automatically interpreting program output based upon
> wxWidgets ideas (which usually means using the current locale) will cause
> trouble. Returning bytes leaves the interpretation to the calling program
> which is always a safe choice.

I think you are right. Bytes makes much more sense.

Regards

Mark
From: Mattia B. <mat...@li...> - 2007-06-20 20:55:28
|
On Tue, 19 Jun 2007 21:39:29 +0100 Mark Dootson <mar...@zn...> wrote:

> Mattia Barbon wrote:
> > I do not agree with what you write above (note: I am not saying the current implementation is correct!)
> > what I think readline should do is work with bytes:
> >
> > read a char from the stream
> > add it to a charbuffer
> > if char == '\n' {
> > return the buffer as a byte string, without performing any conversion
> > }
> >
> > I believe that automatically interpreting program output based upon
> > wxWidgets ideas (which usually means using the current locale) will cause
> > trouble. Returning bytes leaves the interpretation to the calling program
> > which is always a safe choice.
>
> I think you are right. Bytes makes much more sense.

Changed in Subversion. I tried it with Wx::Perl::ProcessStream 0.09 and it works as I expect. Please let me know if it works for you too.

Regards,
Mattia
From: Mark D. <mar...@zn...> - 2007-06-21 03:02:05
|
Hi,

It doesn't work as I expected on Win32 (but I might be expecting the wrong thing).

Where in the READLINE routine you do:

    if( c == '\n' )
        break;
    ++off;

I think you need

    ++off;
    if( c == '\n' )
        break;

This, at least, makes it work for me.

The way I read the code, currently the '\n' byte never gets returned, which I assume is not what is intended.

It works fine on my Linux box as it stands. So I suppose that if the byte '\n' is stripped off, the final byte in the stream that gets returned is always 'null' - which on Linux gets treated as end of string somewhere along the line, but on Win32 doesn't.

Regards

Mark

Mattia Barbon wrote:
> On Tue, 19 Jun 2007 21:39:29 +0100
> Mark Dootson <mar...@zn...> wrote:
>
>> Mattia Barbon wrote:
>>> I do not agree with what you write above (note: I am not saying the current implementation is correct!)
>>> what I think readline should do is work with bytes:
>>>
>>> read a char from the stream
>>> add it to a charbuffer
>>> if char == '\n' {
>>> return the buffer as a byte string, without performing any conversion
>>> }
>>>
>>> I believe that automatically interpreting program output based upon
>>> wxWidgets ideas (which usually means using the current locale) will cause
>>> trouble. Returning bytes leaves the interpretation to the calling program
>>> which is always a safe choice.
>> I think you are right. Bytes makes much more sense.
>
> Changed in Subversion. I tried it with Wx::Perl::ProcessStream 0.09
> and it works as I expect. Please let me know if it works for you too.
>
> Regards,
> Mattia
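The effect of the two orderings can be sketched in isolation. The real READLINE code is not shown in this thread beyond the two snippets, so the function below is a hypothetical stand-in: only the names c and off come from the message, while copy_line and its count_newline flag are invented for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the loop under discussion. Copies bytes
 * from src into dst up to the first '\n' and returns the offset that
 * would be used as the length of the returned line.
 * dst must have room for at least src_len bytes.
 *
 * count_newline == 0: test for '\n' before incrementing off
 * count_newline != 0: increment off before testing for '\n'
 */
size_t copy_line(const char *src, size_t src_len, char *dst, int count_newline)
{
    size_t off = 0;

    while (off < src_len) {
        char c = src[off];
        dst[off] = c;

        if (count_newline) {
            ++off;              /* count the byte first ... */
            if (c == '\n')      /* ... so the newline is included */
                break;
        } else {
            if (c == '\n')      /* stop before counting the newline */
                break;
            ++off;
        }
    }
    return off;
}
```

With the input "ab\ncd", the first ordering yields a length of 2, so the newline is left out of the returned line, while the second yields 3 and keeps it.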
From: Mattia B. <mat...@li...> - 2007-06-21 19:38:06
|
On Thu, 21 Jun 2007 04:01:22 +0100 Mark Dootson <mar...@zn...> wrote:

> It doesn't work as I expected on Win32. (but I might be expecting the wrong thing).

You're expecting the right one. Fixed, thanks!

Mattia