From: <bc...@wo...> - 2001-11-13 19:21:50
[dman]
> The IMAP RFC states that all lines end in CRLF. When I printed out
> the last 2 characters of sockfile.readline() I got something (the last
> piece of data on the line), then 0xa. All the data was fine, except
> that the CR was missing.

Ok, I hope I understand the issue this time around. I agree that it
looks strangely asymmetric on non-windows platforms. I have opened a
bug report, but I have given it the lowest priority because I suspect
that CPython will adopt somewhat similar behaviour in 2.3. At the
moment the python-dev people are discussing it in this patch:

  http://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=476814

>| The basic issue is how to deal with characters (16-bits) vs. bytes
>| (8-bit). Java has two ways: Stream and Reader, but python only has one
>| open() method. I decided to override the 'b' flag for this behavior
>| because many (windows) programmers would already know about the 'b' flag
>| on the open() function. By re-using the 'b' flag the default text mode
>| was obvious because that is what windows uses.
>
> Was the logic of input identical to text-files on windows, or is there
> more to it than that?

I'm not sure about the reason for the newline algorithm; it isn't my
design or code. I guess it is partly based on the windows way and
partly on the way Java handles line separators when doing line
reading. Maybe JimH used a time machine to implement a sane scheme for
dealing with cross-platform text files several years before CPython
got around to it.

> How does java decide what the encoding of the
> data is (ie Unicode 16-bit chars or ASCII 8-bit chars)?

Maybe I misunderstand the question, but Java's Reader/Writer classes
convert the file bytes to unicode characters, while the Stream classes
read the file bytes as bytes.

> How does it
> decide to remove the CR, but not harm any other data in the stream?

Java's readLine() method(s) will remove all line-separator chars.
Other read() calls do not remove the CR.
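For what it's worth, this text/binary split is roughly what CPython did
adopt later (universal newlines arrived in CPython 2.3 via PEP 278, and
became mandatory for text mode in Python 3). A small sketch in modern
Python showing the behaviour being discussed, with text mode stripping
the CR and binary mode preserving it:

```python
import os
import tempfile

# Write raw CRLF-terminated bytes, then read them back both ways.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\r\nworld\r\n")

with open(path, "r") as f:       # text mode: line endings normalized to '\n'
    text_lines = f.readlines()

with open(path, "rb") as f:      # binary mode: bytes come back verbatim
    raw_lines = f.readlines()

os.remove(path)

print(text_lines)   # ['hello\n', 'world\n']
print(raw_lines)    # [b'hello\r\n', b'world\r\n']
```

So a CRLF-delimited protocol like IMAP needs the file opened in binary
mode (or the CR re-added) if the raw line terminator matters.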
> I don't really understand much of Java's java.io package, other than
> it takes some work to figure out which class has the method that does
> what you want. IMO Python's read() and readline() methods are so much
> simpler and get the job done just as well.

No, they don't. At least not when you try to combine unicode strings
and file I/O. When you don't need unicode, I agree that CPython-1.5.2
was very simple to use. But in Jython it is impossible to ignore the
unicode problems because all our strings are always unicode enabled.

Imagine that you have a string with a non-latin-1 character in it, a
euro sign for example. What should happen when we try to write that to
a file?

  f.write(u"\u20AC")

I can think of three answers:

1) Throw a ValueError exception.
2) Silently ignore the high-order byte and write \xAC to the file.
3) Convert the chars according to the platform codec and write the
   result.

CPython-2.0 uses #1, except it throws exceptions for all characters
above 127. Jython uses #2 for binary files and #3 for text files.
CPython has good technical reasons for that choice, but I think the
result is very bad and makes for unnatural use of unicode strings.

> I haven't forgotten because I never knew. I've only used Jython >=
> 2.0. (and CPython, but that is irrelevant here)

My bad.

regards,
finn
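P.S. The three answers can be sketched in modern Python; the utf-8
codec here is an illustrative stand-in for "the platform codec", not
something Jython hard-wires:

```python
s = "\u20ac"  # euro sign: not representable in ascii or latin-1

# Option 1: a strict codec raises an error (CPython-2.0 style,
# which did this for all characters above 127)
try:
    s.encode("ascii")
    option1 = None
except UnicodeEncodeError as e:
    option1 = type(e).__name__

# Option 2: silently keep only the low-order byte
# (what Jython did for binary files)
option2 = bytes([ord(s) & 0xFF])

# Option 3: convert via a codec and write the result
# (what Jython did for text files)
option3 = s.encode("utf-8")

print(option1)  # UnicodeEncodeError
print(option2)  # b'\xac'
print(option3)  # b'\xe2\x82\xac'
```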