From: Alan W. I. <ir...@be...> - 2017-02-21 19:02:01
Yesterday, for the second time in two months, an interactive comprehensive test failed with a "ValueError: bad marshal data (unknown type code)" error for bindings/python/Plframe.py. This is a rather common error with Python and typically means the associated *.pyc file that Python generates has been corrupted. I moved the corresponding *.pyc file out of the way, and the comprehensive test (with the *.pyc regenerated by Python) sailed through afterward without issues.

For the record, this issue occurred on my Debian Jessie platform with the following Python version string:

irwin@raven> python --version
Python 2.7.9

There are lots of potential reasons for such *.pyc corruption, such as a change in Python version or hardware issues, but these errors are so common that in 2013 the Python developers list became concerned that Python itself might be subject to race conditions when generating these files and thus be responsible for at least some of these corruptions (see the discussion thread at <https://mail.python.org/pipermail/python-dev/2013-May/126241.html> with the subject line "[Python-Dev] Mysterious Python pyc file corruption problems").

I did an octal dump of the corrupted file versus the uncorrupted regenerated one, and as far as I can tell the only difference is a missing byte in the corrupted file. (If anyone is interested I can send those files to you for inspection.)

Yesterday I did do some obvious tests (with memtest, fsck, and git fsck) of my PC hardware (which is 9 [!] years old, but still going strong), and all was well. Furthermore, the above octal dumps showed no i/o issue with the corrupted file, and the problem always occurs (so far) with just this particular file. These rare errors only started when I began enabling testing of examples/python/pytkdemo (our only file that imports Plframe, which generates the *.pyc as a byproduct of that import) with the test_pytkdemo target. So I am pretty sure this evidence largely rules out any hardware issue.
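The octal-dump comparison described above can also be done programmatically. The following is only a minimal sketch: the function name first_difference, the file names, and the 16 made-up bytes are all hypothetical and do not come from the actual Plframe.pyc files. It reports the offset of the first differing byte together with both file lengths, which is enough to spot a single dropped byte.

```python
import os
import tempfile


def first_difference(path_a, path_b):
    """Return (offset, len_a, len_b); offset is None if the files are identical."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        data_a, data_b = file_a.read(), file_b.read()
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            return offset, len(data_a), len(data_b)
    if len(data_a) != len(data_b):
        # One file is a strict prefix of the other.
        return min(len(data_a), len(data_b)), len(data_a), len(data_b)
    return None, len(data_a), len(data_b)


# Illustration with invented data (NOT real *.pyc contents):
# the "corrupt" copy is the "good" copy with the byte at offset 6 removed.
workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "good.bin")        # hypothetical name
corrupt_path = os.path.join(workdir, "corrupt.bin")  # hypothetical name
good_bytes = bytes(bytearray(range(16)))
with open(good_path, "wb") as handle:
    handle.write(good_bytes)
with open(corrupt_path, "wb") as handle:
    handle.write(good_bytes[:6] + good_bytes[7:])

result = first_difference(corrupt_path, good_path)
print(result)  # (6, 15, 16)
```

A length difference of exactly one, combined with the two files agreeing again once the longer one is shifted by one byte past the reported offset, would be consistent with the single-missing-byte observation.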
I have also not been fiddling with my Python versions, and in any case such a change should just alter a version stamp (at least two bytes) in the file and not simply remove one byte. So by a process of elimination, I think this is likely one more candidate for the mysterious Python pyc corruption issue.

However, if the source of this corruption is a race condition in Python's generation of these files, I believe that would only be an issue if there are simultaneous attempts to generate this file. The tests I run do use parallel builds, but the test_pytkdemo target is implemented with a CMake custom target where there should be no build race conditions (attempting to build that target twice) unless there is a bug in either CMake or make. But if that were the case, we would be seeing similar errors for our other Python test targets, and we don't.

However, if you look at examples/python/pytkdemo, it is interesting that it imports Plframe in two ways, i.e.

import Plframe
from Plframe import *

This is a fairly common (but sloppy) Python idiom for importing both a namespaced and an unnamespaced version of Plframe (because some of our code uses the namespaced version and some does not). However, the only way you get a race out of that is if Python looks ahead and starts doing the second import (which would also attempt to generate a Plframe.pyc file) before the first import has finished, and I have no idea whether that is a possibility or not.

Anyhow, in the near future I plan to track down all our references to the unnamespaced version of Plframe and convert them to the namespaced version so that the second import can be eliminated. It will be interesting to see if that makes this corruption issue disappear.

Meanwhile, if anyone else can replicate this issue, that would be even stronger evidence that it is not due to my hardware.
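On the question of whether the two import statements in pytkdemo can race with each other, a small experiment may help. Within a single interpreter, import statements execute one after the other, and a module that is already loaded is looked up in sys.modules rather than loaded again. The sketch below illustrates this with a throwaway module; the name demo_plframe, its contents, and the log file are hypothetical stand-ins and not the real Plframe.py.

```python
import os
import sys
import tempfile

# Hypothetical stand-in for Plframe.py: its body logs every time it is executed.
MODULE_SOURCE = '''
import os
_here = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(_here, "exec.log"), "a") as log:
    log.write("executed\\n")
VALUE = 42
'''


def count_module_executions():
    """Import a throwaway module both ways and count how often its body ran."""
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "demo_plframe.py"), "w") as handle:
        handle.write(MODULE_SOURCE)
    sys.modules.pop("demo_plframe", None)  # start from a clean module cache
    sys.path.insert(0, workdir)
    try:
        namespace = {}
        # exec() compiles the string as module-level code, where "import *" is legal.
        exec("import demo_plframe\nfrom demo_plframe import *", namespace)
        with open(os.path.join(workdir, "exec.log")) as log:
            executions = len(log.readlines())
        return executions, namespace["VALUE"], "demo_plframe" in sys.modules
    finally:
        sys.path.remove(workdir)


executions, value, cached = count_module_executions()
print(executions, value, cached)  # 1 42 True
```

The module body runs only once, so the second import statement does not by itself trigger a second load of the module in the same process. This says nothing about two separate interpreter processes importing the same module at the same time, which remains a possible source of simultaneous attempts to write the *.pyc file.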
If you want to help with replicating this issue, you should run the test_pytkdemo target, but only after touching examples/python/Plframe.py (which forces Python to regenerate the *.pyc file when the test_pytkdemo target is run). You would need to repeat this test from time to time under a variety of load conditions, so generating the above error even once may be difficult to accomplish.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________