From: John M. <joh...@ya...> - 2001-03-09 01:07:57
This should be easy.  Reading a binary file in Jython causes changes to
the data?  But it works in Python.

$ jython
Jython 2.0 on javaVM-1.3.0.01 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> tiffObj = open('john.tif', 'r').read()
>>> open('junk.tif', 'w').write(tiffObj)
>>>
$ sum -r junk.tif john.tif
04308 25 junk.tif
16525 25 john.tif
$ cmp junk.tif john.tif
junk.tif john.tif differ: char 170, line 1
$ python
Python 2.0 (#3, Feb 1 2001, 04:39:53)
[GCC 2.7.2.3] on hp-uxB
Type "copyright", "credits" or "license" for more information.
>>> tiffObj = open('john.tif', 'r').read()
>>> open('junk.tif', 'w').write(tiffObj)
>>>
$ cmp junk.tif john.tif
$
From: D-Man <ds...@ri...> - 2001-03-09 02:44:21
On Thu, Mar 08, 2001 at 05:09:42PM -0800, John Mudd wrote:
| This should be easy.  Reading a binary file in Jython causes changes to
| the data?  But it works in Python.

You forgot to mention that you are using a kernel/OS that likes to
mangle your data.  ;-)

Use _binary_ mode for writing _binary_ data.

|
| $ jython
| Jython 2.0 on javaVM-1.3.0.01 (JIT: null)
| Type "copyright", "credits" or "license" for more information.
| >>> tiffObj = open('john.tif', 'r').read()
| >>> open('junk.tif', 'w').write(tiffObj)
                        ^
This is "text" mode and Windoze likes to replace every \n with \r\n in
such streams.  This is what causes your data corruption.  Fortunately
on *nix systems the kernel doesn't mangle your streams under the hood.
For other systems use

    open( 'junk.tif' , 'wb' ).write( tiffObj )
                         ^

| >>>
| $ sum -r junk.tif john.tif
| 04308 25 junk.tif
| 16525 25 john.tif
| $ cmp junk.tif john.tif
| junk.tif john.tif differ: char 170, line 1
| $ python
| Python 2.0 (#3, Feb 1 2001, 04:39:53)
| [GCC 2.7.2.3] on hp-uxB
| Type "copyright", "credits" or "license" for more information.
| >>> tiffObj = open('john.tif', 'r').read()
| >>> open('junk.tif', 'w').write(tiffObj)
| >>>
| $ cmp junk.tif john.tif
| $
|

Your jython invocation gives no indication of your platform (I use the
same utilities and prompt on a Win2k box via cygwin), but your CPython
invocation indicates you are using HP-UX.  This explains why the code
works in CPython.  HP-UX follows the Unix precedent of not mangling
streams simply because you opened them in "text" mode.

IMHO there should only be one mode for opening files -- the app, not
the kernel, should do ALL stream handling, ASCII and binary.  In any
case, the proper documented way of writing binary data (such as your
TIFF image) is to use "wb" as the file mode (in Python, C, C++, Java,
etc).

HTH,
-D
From: <bc...@wo...> - 2001-03-09 08:17:19
[D-Man]

>On Thu, Mar 08, 2001 at 05:09:42PM -0800, John Mudd wrote:
>| This should be easy.  Reading a binary file in Jython causes changes to
>| the data?  But it works in Python.
>
>You forgot to mention that you are using a kernel/OS that likes to
>mangle your data.  ;-)

I strongly suspect that John is using a *nix for the Jython test.

>Use _binary_ mode for writing _binary_ data.

Correct.

>| $ jython
>| Jython 2.0 on javaVM-1.3.0.01 (JIT: null)
>| Type "copyright", "credits" or "license" for more information.
>| >>> tiffObj = open('john.tif', 'r').read()
>| >>> open('junk.tif', 'w').write(tiffObj)
>                        ^
>This is "text" mode and Windoze likes to replace every \n with \r\n in
>such streams.  This is what causes your data corruption.  Fortunately
>on *nix systems the kernel doesn't mangle your streams under the hood.

Incorrect.  It is Jython that does this mangling.  The issue basically
is: should a read from a file into a unicode String read the data as
binary (and always set the top 8 bits to zero) or as text by passing
the data through the default encoding?

Without the 'b' flag, Jython is reading and writing the data as if
using a Reader/Writer class.  When you use the 'b' flag it uses an
InputStream/OutputStream.

This overloading of the 'b' flag (the flag also controls the platform
dependent newline translation) is not a good thing, but it was the best
I could come up with.  Normally it works as expected and I think that
also goes for John's case.

regards,
finn
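A rough illustration of the effect Finn describes, written for modern Python 3 rather than Jython 2.0. This is an assumption made for the sake of a runnable example; Jython's actual implementation is not shown in this thread. Decoding arbitrary bytes through a platform text encoding and encoding them back is not guaranteed to be lossless, while mapping each byte straight to the code point with the same value (top 8 bits zero, which is what Latin-1 does) always round-trips. The choice of cp1252 and of errors="replace" is purely illustrative.

```python
# All 256 possible byte values, standing in for binary file content.
data = bytes(range(256))


def round_trip(raw, encoding):
    """Decode bytes to text and encode them back, as a text-mode copy would."""
    text = raw.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


# Byte-to-code-point mapping (top 8 bits zero): every byte survives.
binary_like = round_trip(data, "latin-1")

# A platform text encoding with unassigned byte values: some bytes are lost.
text_like = round_trip(data, "cp1252")

print(binary_like == data)  # True
print(text_like == data)    # False
```

Which bytes get damaged depends entirely on the encoding in use, so a text-mode copy of a binary file can appear to work on one platform and fail on another.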
From: John M. <joh...@ya...> - 2001-03-09 14:54:08
Yes, binary mode of course.  I think I have a mental block in this
area.  This is twice that I've asked about this sort of thing.  Thanks
for your patience.

$ jython
Jython 2.0 on javaVM-1.3.0.01 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> tiffObj = open('john.tif', 'rb').read()
>>> open('junk.tif', 'wb').write(tiffObj)
>>>
$ sum -r junk.tif john.tif
16525 25 junk.tif
16525 25 john.tif
$
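For anyone who wants the same check without `sum` and `cmp`, here is a minimal sketch in modern Python 3 (not Jython 2.0). The helper names `copy_binary`, `first_difference` and `checksum` are made up for this example, and the temporary file is only a stand-in for john.tif.

```python
import hashlib
import os
import tempfile


def copy_binary(src, dst):
    """Copy a file byte for byte, using binary mode on both ends."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


def first_difference(path_a, path_b):
    """Return the 1-based offset of the first differing byte, or None."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return offset
    # No mismatch in the common part; one file may still be longer.
    return None if len(a) == len(b) else min(len(a), len(b)) + 1


def checksum(path):
    """Hex SHA-256 digest of the file content, read in binary mode."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


# Demo on a temporary file containing every byte value several times.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "john.tif")
dst = os.path.join(workdir, "junk.tif")
with open(src, "wb") as f:
    f.write(bytes(range(256)) * 4)

copy_binary(src, dst)
print(first_difference(src, dst))      # None
print(checksum(src) == checksum(dst))  # True
```

Reading whole files into memory is fine for a small image like this one; for large files a chunked loop would be the better choice.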
From: D-Man <ds...@ri...> - 2001-03-09 16:10:14
On Fri, Mar 09, 2001 at 08:17:02AM +0000, Finn Bock wrote:
|
| Incorrect.  It is Jython that does this mangling.  The issue basically
| is: should a read from a file into a unicode String read the data as
| binary (and always set the top 8 bits to zero) or as text by passing
| the data through the default encoding?
|
| Without the 'b' flag, Jython is reading and writing the data as if
| using a Reader/Writer class.  When you use the 'b' flag it uses an
| InputStream/OutputStream.
|
| This overloading of the 'b' flag (the flag also controls the platform
| dependent newline translation) is not a good thing, but it was the best
| I could come up with.  Normally it works as expected and I think that
| also goes for John's case.

Perhaps it would be better for Jython to match CPython and C with
respect to the open() function, but require people to construct the
other Reader/Writer objects using the Java classes?

I am curious as to why the "format string" style of argument was
chosen instead of an enum style (for CPython, Jython and C).  I think
that it would be simpler for the library implementer and easier on the
client to use an enum type, or even different classes similar to Java
(but easier to get one that does what you want ;-)).

Thanks,
-D
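To make the "enum style" idea concrete, here is a purely hypothetical sketch in modern Python 3. Nothing like this exists in CPython or Jython; `Access`, `Content` and `open_file` are invented names, and the `enum` module used here was not available in 2001. The wrapper simply builds the usual mode string from two enum values.

```python
import enum
import os
import tempfile


class Access(enum.Enum):
    READ = "r"
    WRITE = "w"
    APPEND = "a"


class Content(enum.Enum):
    TEXT = ""
    BINARY = "b"


def open_file(path, access=Access.READ, content=Content.TEXT):
    """Hypothetical wrapper: derive the mode string from two enum values."""
    return open(path, access.value + content.value)


# Demo: write and read back a few bytes that text mode might alter.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open_file(path, Access.WRITE, Content.BINARY) as f:
    f.write(b"\x00\r\n\xff")
with open_file(path, Access.READ, Content.BINARY) as f:
    payload = f.read()

print(payload == b"\x00\r\n\xff")  # True
```

Splitting access and content into separate arguments makes a forgotten binary flag easier to spot at the call site, though it does nothing about the overloading of 'b' discussed above.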
From: <bc...@wo...> - 2001-03-09 18:52:06
[D-Man]

>Perhaps it would better for Jython to match CPython and C with respect
>to the open() function,

Keep in mind that CPython has two string types (8bit and 16bit), while
Jython has only one (16bit).  Matching CPython's current ability to
write unicode strings to file objects would completely cripple all
file I/O in Jython <0.0 wink>

Python 2.1b1 (#11, Mar 2 2001, 11:23:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> f = open("x", "w")
>>> f.write(u"\u20ac")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

When you think about it, what should the above write into the file?
The CPython developers thought (and fought) about this *a lot*, and
their best answer is: You can't!  Not all of the 30+ developers on
CPython agree on that decision.

In Jython we can write such a euro sign to a file.  What it writes to
the file depends on the binary flag and the encoding for your platform.
When writing in binary the byte 0xAC is always written.  In text mode
and on my Danish Windows it writes 0x80, which is Microsoft's position
of the euro sign in the Danish codepage.

All in all this has worked remarkably well both during the errata and
with Jython-2.0.  It is not perfect because it overloads the meaning of
the 'b' flag.  So it is, for example, not possible to get newline
translation without the encoding.

Guido has said several times that the unicode support for file objects
is not yet complete.  When CPython is improved we will improve Jython
as well.

>but require people to construct the other
>Reader/Writer objects using the Java classes?

You can do that too.  It just isn't a requirement.

>I am curious as to why the "format string" style of argument was
>chosen instead of an enum style (for CPython,

Maybe Guido chose a string based mode argument because the C stdio uses
it.  I dunno.

>Jython

Because CPython uses it.

>and C).  I think
>that it would be simpler for the library implementer and easier on the
>client to use an enum type, or even different classes similar to Java
>(but easier to get one that does what you want ;-)).

regards,
finn
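The three outcomes for the euro sign mentioned above can be reproduced in modern Python 3. This is only an illustration, not Jython 2.0 code, and it assumes that the Danish Windows code page in question is cp1252.

```python
euro = "\u20ac"

# "Binary" behaviour as described above: keep only the low 8 bits.
low_byte = ord(euro) & 0xFF

# "Text" behaviour on a Windows-1252 platform: use the code page's slot.
cp1252_bytes = euro.encode("cp1252")

# The refusal seen in the CPython session: ASCII cannot represent U+20AC.
try:
    euro.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(hex(low_byte))   # 0xac
print(cp1252_bytes)    # b'\x80'
print(ascii_ok)        # False
```

The same character thus becomes 0xAC, 0x80, or an error depending on which rule is applied.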