From: Alan W. I. <ir...@be...> - 2013-10-19 16:50:05
Hi Arjen:

Thanks very much for running the Python-C comparisons for both MinGW/MSYS and (later) MSVC on 32-bit Windows. More below in context.

On 2013-10-19 14:08-0000 Arjen Markus wrote:

>> From: Alan W. Irwin [mailto:ir...@be...]
>> So the next step in trying to figure out the cause of these problems should be to do
>> some additional platform checks for 32-bit Python.
>> Arjen is moving ahead with those for both the MinGW/MSYS and MSVC Microsoft
>> Windows cases, and I am hoping Andrew will be able to check the 32-bit Python
>> case on Linux.
>>
>
> I have used the MinGW 32-bits platform on Windows to do the same test. I see minor differences
> in the following examples:
> 00 03 04 05 06 08 09 11 12 15 16 17 18 20 21 22 23 25 26 27 29 and 33.
>
> So that list is almost the same as yours, except for example 19 - that is completely clean.

Although Wine has a pretty good track record as a Windows test platform, I was concerned that I might have stumbled over some issue caused by a Wine bug. But your MinGW/MSYS results (and later MSVC results) on Microsoft Windows show that this problem, which first turned up with Wine, is a widespread issue on 32-bit Windows. And the "jury is still out" on whether this is also an issue with 32-bit Linux.

> I did not run example 14 yet and there is no example x14a.
>
> The examples that show significant differences are: 17 and 25 - the PostScript files show extra
> lines or completely different lines in these two cases.
>
> I also noted that many differences occur with lines like:
>
> 1349 220 M (Python) -- 1350 220 M (C)
> ... 1800 M (Python) -- ... 1799 M (C)
>
> I have not checked what PLplot command is responsible for them, but they occur in many examples,
> so I guess they have to do with the frames.

Those are PostScript commands generated by the ps device driver.
That driver issues alias commands like /M {moveto} def at the top of the file, so the first "M" command you see above simply means move the pen to coordinate 1349 220. So differences like the above mean that C and Python (on all 32-bit Windows platforms we have tested between us) have slightly different views of the overall coordinate transformations that yield those positions. But the puzzle is that in both cases all those transformations are done in our C library and the "psc" C device, depending only on the input coordinates entered, for example, in the plenv call for x00. And I also proved with the gdb debugging tool that those input coordinates were identical in that particular case. So in the 32-bit case we have proved that our core C library (libplplotd) and/or the psc device gives different answers with the _same_ input data in the C and Python cases for at least one example, and that might be the cause of all the above issues.

My working hypothesis to explain this unusual result is some memory management issue (e.g., an uninitialized variable) in our core C library and/or our ps device that generates some unpredictability in the results. A further hypothesis is that Python has a very large memory footprint, so it leaves non-zero bit patterns behind scattered over a wide memory space, and the uninitialized variable ends up holding a different value (one of those leftover bit patterns) than it does when the examples are run from any other language.

In later e-mail concerning the MSVC case you stated the following results:

<quote>
- Most of the Python examples produce the same results as the corresponding C examples.
- The ones that do not simply crash at some point. These are: 08, 09, 11, 15, 16, 20, 21 and 22.
I have not looked into the possible cause of this. It may be a single function that is causing this. It may be a set of functions.
Anyway, this means that the mystery is only larger: The Python installation was _exactly_ the same under Windows/MSVC as under Windows/MinGW.
</quote>

I think these results are also consistent with the working hypothesis that there is a lurking memory management issue in our core C library and/or the ps device driver code for the 32-bit case.

So Arjen, if you have access to a static or dynamic memory debugging tool for Windows (see a partial list of such tools for all operating systems at http://en.wikipedia.org/wiki/Memory_debugging), I suggest you run it on x00c (our simplest 2D plot example written in C) to see if it spots some memory management issue (uninitialized variable or whatever) in libplplotd, our core C library. And, of course, if MinGW or MSVC gives any warning messages at all when compiling our core C library, we should try to address those warnings.

The only memory debugging tool I have access to is valgrind (a dynamic analysis tool), which runs only on Linux (and Mac OS X). It gives totally clean results for x00c for the 64-bit Linux case, where Python and C results are identical. For the record, here are those results:

software@raven> valgrind examples/c/x00c -dev psc -o test.ps
==19591== Memcheck, a memory error detector
==19591== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19591== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19591== Command: examples/c/x00c -dev psc -o test.ps
==19591==
==19591==
==19591== HEAP SUMMARY:
==19591== in use at exit: 0 bytes in 0 blocks
==19591== total heap usage: 457 allocs, 457 frees, 132,045 bytes allocated
==19591==
==19591== All heap blocks were freed -- no leaks are possible
==19591==
==19591== For counts of detected and suppressed errors, rerun with: -v
==19591== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

However, now that our combined results show there is a widespread 32-bit issue for the Windows case (triggered by Python, but I think that is the result of the large Python memory footprint and has nothing to do with bad Python code or bad code interfacing Python with our core C library), we need to follow up with extensive testing for the 32-bit Linux case. Therefore, I strongly encourage Andrew (or someone else here with access to 32-bit Linux) to compare Python and C results and, if those are different, do valgrind runs to help us get to the bottom of these issues.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); the Time Ephemerides project
(timeephem.sf.net); PLplot scientific plotting software package
(plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
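The working hypothesis in this message (an uninitialized variable picking up whatever bit pattern an earlier, large-footprint user of the memory left behind) can be made concrete with a toy simulation. The C sketch below is purely illustrative and is not PLplot code: `struct xform`, `previous_user`, `setup_xform`, `device_x`, and `demo_run` are hypothetical names, and the numbers (scale 250, offset 100, world x = 5.0, leftover value -0.75) were chosen only so the result mirrors the 1349 vs. 1350 differences reported above. A setup routine forgets to set one field of a recycled block of memory, so identical plotting input yields a device coordinate that depends on what was previously stored there. To keep the sketch free of undefined behaviour, the "leftover" bytes are written explicitly rather than read from genuinely uninitialized storage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical world-to-device transform state (NOT PLplot's real struct). */
struct xform {
    double scale;   /* device units per world unit */
    double offset;  /* device coordinate of world x = 0 */
    double shift;   /* correction term that the buggy setup never sets */
};

/* One recycled block of storage. As with malloc, handing it out again
   does not clear whatever the previous user left behind. */
static struct xform slot;

/* Simulates an earlier user of the same memory (standing in for a
   large-footprint host process) storing its own data there. */
static void previous_user(double leftover) {
    struct xform junk = { leftover, leftover, leftover };
    memcpy(&slot, &junk, sizeof slot);
}

/* Buggy setup: sets scale and offset but forgets shift, so shift keeps
   whatever value was already in the recycled block. */
static struct xform *setup_xform(double scale, double offset) {
    struct xform *t = &slot;
    t->scale = scale;
    t->offset = offset;
    return t;
}

/* Maps a world x coordinate to an integer device coordinate. */
static int device_x(const struct xform *t, double wx) {
    double v = t->offset + t->scale * wx + t->shift;
    assert(v >= 0.0);      /* the round-half-up trick below assumes v >= 0 */
    return (int)(v + 0.5);
}

/* Runs the same plotting input (world x = 5.0) after the memory was left
   holding `leftover`, and returns the resulting device x coordinate. */
int demo_run(double leftover) {
    previous_user(leftover);
    const struct xform *t = setup_xform(250.0, 100.0);
    return device_x(t, 5.0);
}
```

Under these assumptions, `demo_run(0.0)` (zeroed memory, as in a fresh process) returns 1350, while `demo_run(-0.75)` returns 1349, even though the plotting input is identical. In a real build, where the field would be genuinely uninitialized rather than simulated, this is the class of error that valgrind's Memcheck tool is designed to report.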