From: <bc...@wo...> - 2001-11-13 19:21:50
[dman]
> The IMAP RFC states that all lines end in CRLF. When I printed out
> the last 2 characters of sockfile.readline() I got something (the last
> piece of data on the line), then 0xa. All the data was fine, except
> that the CR was missing.

Ok, I hope I understand the issue this time around. I agree that it
looks strangely asymmetric on non-windows platforms. I have opened a
bug report, but I have given it the lowest priority because I suspect
that CPython will adopt somewhat similar behaviour in 2.3. At the
moment the python-dev people are discussing it in this patch:

  http://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=476814

>| The basic issue is how to deal with characters (16-bits) vs. bytes
>| (8-bit). Java has two ways: Stream and Reader, but python only has one
>| open() method. I decided to override the 'b' flag for this behavior
>| because many (windows) programmers would already know about the 'b' flag
>| on the open() function. By re-using the 'b' flag the default text mode
>| was obvious because that is what windows uses.
>
> Was the logic of input identical to text-files on windows, or is there
> more to it than that?

I'm not sure about the reason for the newline algorithm; it isn't my
design or code. I guess it is partly based on the windows way and
partly on the way Java handles line separators when doing line
reading. Maybe JimH used a time machine to implement a sane scheme for
dealing with cross-platform text files several years before CPython
got around to it.

> How does java decide what the encoding of the
> data is (ie Unicode 16-bit chars or ASCII 8-bit chars)?

Maybe I misunderstand the question, but Java's Reader/Writer classes
convert the file bytes to unicode characters, while the Stream classes
read the file bytes as bytes.

> How does it
> decide to remove the CR, but not harm any other data in the stream?

Java's readLine() method(s) will remove all line-separator chars.
Other read() calls do not remove the CR.
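For what it's worth, this text/binary split is roughly what CPython did
adopt later (universal newlines arrived in CPython 2.3 via PEP 278, and
became mandatory for text mode in Python 3). A small sketch in modern
Python showing the behaviour being discussed, with text mode stripping
the CR and binary mode preserving it:

```python
import os
import tempfile

# Write raw CRLF-terminated bytes, then read them back both ways.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\r\nworld\r\n")

with open(path, "r") as f:       # text mode: line endings normalized to '\n'
    text_lines = f.readlines()

with open(path, "rb") as f:      # binary mode: bytes come back verbatim
    raw_lines = f.readlines()

os.remove(path)

print(text_lines)   # ['hello\n', 'world\n']
print(raw_lines)    # [b'hello\r\n', b'world\r\n']
```

So a CRLF-delimited protocol like IMAP needs the file opened in binary
mode (or the CR re-added) if the raw line terminator matters.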
> I don't really understand much of Java's java.io package, other than
> it takes some work to figure out which class has the method that does
> what you want. IMO Python's read() and readline() methods are so much
> simpler and get the job done just as well.

No, they don't. At least not when you try to combine unicode strings
and file I/O. When you don't need unicode, I agree that CPython-1.5.2
was very simple to use. But in Jython it is impossible to ignore the
unicode problems because all our strings are always unicode enabled.

Imagine that you have a string with a non-latin-1 character in it, a
euro sign for example. What should happen when we try to write that to
a file?

  f.write(u"\u20AC")

I can think of three answers:

1) Throw a ValueError exception.
2) Silently ignore the high-order byte and write \xAC to the file.
3) Convert the chars according to the platform codec and write the
   result.

CPython-2.0 uses #1, except it throws exceptions for all characters
above 127. Jython uses #2 for binary files and #3 for text files.
CPython has good technical reasons for that choice, but I think the
result is very bad and makes for unnatural use of unicode strings.

> I haven't forgotten because I never knew. I've only used Jython >=
> 2.0. (and CPython, but that is irrelevant here)

My bad.

regards,
finn
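P.S. The three answers can be sketched in modern Python; the utf-8
codec here is an illustrative stand-in for "the platform codec", not
something Jython hard-wires:

```python
s = "\u20ac"  # euro sign: not representable in ascii or latin-1

# Option 1: a strict codec raises an error (CPython-2.0 style,
# which did this for all characters above 127)
try:
    s.encode("ascii")
    option1 = None
except UnicodeEncodeError as e:
    option1 = type(e).__name__

# Option 2: silently keep only the low-order byte
# (what Jython did for binary files)
option2 = bytes([ord(s) & 0xFF])

# Option 3: convert via a codec and write the result
# (what Jython did for text files)
option3 = s.encode("utf-8")

print(option1)  # UnicodeEncodeError
print(option2)  # b'\xac'
print(option3)  # b'\xe2\x82\xac'
```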