From: Owen M. <owe...@bc...> - 2012-10-08 15:19:26
Hi Anthony,

On 8 October 2012 15:54, Anthony Scopatz <sc...@gm...> wrote:
> Hmmm, are you actually copying the data (f.root.data[:]) or are you
> simply passing a reference as arguments (f.root.data)?

I call f.root.data.read() on any arrays to load them into the process target args dictionary. I had assumed this returns a copy of the data. The documentation doesn't specify which, or even whether there is any difference from __getitem__.

> So if you are opening a file in the master process and then
> writing/creating/flushing from the workers this may cause a problem.
> Multiprocessing creates a fork of the original process, so you are relying on
> the file handle from the master process not to accidentally change somehow.
> Can you try to open the files in the workers rather than the master? I
> hope that this clears up the issue.

I am not accessing the master file from the worker processes. At least not by design, though as you say some kind of strange behaviour could be arising from Linux's copy-on-fork semantics. In principle, each process has its own file and there is no sharing of files between processes.

> Basically, I am advocating a more conservative approach where all data
> that is read or written in a worker must come from that worker, rather
> than being generated by the master. If you are *still* experiencing
> these problems, then we know we have a real problem.

I'm being about as conservative as I can with my system. Unless read() returns a reference into the master file, there should be absolutely no sharing between processes. And even if my args dictionary did contain a reference to the in-memory HDF5 file, how could reading it possibly trigger a call to openFile?

Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Regards,
Owen
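
P.S. In case it helps, here is a stripped-down sketch of the pattern I'm describing. The file names, the /data node, and the trivial worker computation are made up for illustration; the point is that the master calls read() to pull the array into memory (which I'm assuming returns a copy, equivalent to [:]), closes its file, and each worker opens and writes only its own output file:

    import tables
    from multiprocessing import Process

    def worker(args):
        # Each worker creates and writes its own file; nothing from the
        # master's file handle is touched here.
        data = args['data']                      # plain NumPy array copied at fork
        f = tables.openFile(args['out_path'], 'w')
        try:
            f.createArray(f.root, 'result', data * 2)   # placeholder computation
        finally:
            f.close()

    if __name__ == '__main__':
        # Master loads the array fully into memory before forking.
        f = tables.openFile('input.h5', 'r')     # hypothetical input file with /data
        data = f.root.data.read()                # assumed to return an in-memory copy
        also_data = f.root.data[:]               # assumed to be equivalent to read()
        f.close()

        procs = []
        for i in range(4):
            args = {'data': data, 'out_path': 'out_%d.h5' % i}
            p = Process(target=worker, args=(args,))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()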