From: Anthony S. <sc...@gm...> - 2012-10-08 15:38:07
On Mon, Oct 8, 2012 at 11:19 AM, Owen Mackwood <owe...@bc...> wrote:

> Hi Anthony,
>
> On 8 October 2012 15:54, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hmmm, are you actually copying the data (f.root.data[:]) or are you
>> simply passing a reference as an argument (f.root.data)?
>
> I call f.root.data.read() on any arrays to load them into the process
> target args dictionary. I had assumed this returns a copy of the data.
> The documentation doesn't specify which, or even whether there is any
> difference from __getitem__.
>
>> So if you are opening a file in the master process and then
>> writing/creating/flushing from the workers, this may cause a problem.
>> Multiprocessing creates a fork of the original process, so you are
>> relying on the file handle from the master process to not accidentally
>> change somehow. Can you try to open the files in the workers rather
>> than in the master? I hope that this clears up the issue.
>
> I am not accessing the master file from the worker processes. At least
> not by design, though as you say some kind of strange behaviour could be
> arising due to the copy-on-fork behaviour of Linux. In principle, each
> process has its own file and there is no sharing of files between
> processes.
>
>> Basically, I am advocating a more conservative approach where all data
>> that is read or written in a worker must come from that worker, rather
>> than being generated by the master. If you are *still* experiencing
>> these problems, then we know we have a real problem.
>
> I'm being about as conservative as can be with my system. Unless read()
> returns a reference to the master file, there should be absolutely no
> sharing between processes. And even if my args dictionary contains a
> reference to the in-memory HDF5 file, how could reading it possibly
> trigger a call to openFile?
>
> Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Hello Owen,

__getitem__() calls read() on the items it needs.
Both should return an in-memory copy of the data that is on disk.

Frankly, I am not sure what is going on, given what you have said. A
minimal example that reproduces the error would be really helpful. From
the error you have provided, though, the only thing I can think of is
that it is related to file opening in the worker processes.

Be Well
Anthony

> Regards,
> Owen

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
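[A minimal sketch of the copy semantics under discussion. It uses the modern spelling tables.open_file; releases contemporary with this thread spelled it tables.openFile. The file name demo.h5 is an arbitrary choice for illustration.]

```python
import numpy as np
import tables

# Create a small on-disk array to inspect.
with tables.open_file("demo.h5", mode="w") as f:
    f.create_array(f.root, "data", np.arange(10))

with tables.open_file("demo.h5", mode="r") as f:
    a = f.root.data.read()   # read() returns a NumPy array: a full in-memory copy
    b = f.root.data[:]       # __getitem__ goes through read() and also returns a copy
    assert isinstance(a, np.ndarray) and isinstance(b, np.ndarray)
    a[0] = 999               # mutating the copy...

with tables.open_file("demo.h5", mode="r") as f:
    assert f.root.data[0] == 0   # ...does not touch the data on disk
```

Once read() has returned, the resulting array holds no reference to the file handle, which is why passing it to a worker should not, by itself, involve the master's open file.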
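[The "open files in the workers" advice can be sketched as below. This is a hypothetical arrangement, not Owen's actual code: the master passes only a plain string path to each worker, and every tables.open_file call happens inside the worker after the fork, so no HDF5 handle is ever shared across processes.]

```python
import multiprocessing as mp
import numpy as np
import tables

def worker(path, row):
    # Opened *inside* the worker process: this handle is private to it.
    with tables.open_file(path, mode="r") as f:
        data = f.root.data.read()   # private in-memory copy
    return int(data[row] * 2)       # placeholder for real per-row work

if __name__ == "__main__":
    path = "shared_input.h5"
    # The master writes the input file and then closes it before forking.
    with tables.open_file(path, mode="w") as f:
        f.create_array(f.root, "data", np.arange(4))
    # Only the path (a plain string) crosses the process boundary.
    with mp.Pool(processes=2) as pool:
        results = pool.starmap(worker, [(path, i) for i in range(4)])
```

The conservative point is that nothing derived from an open file object appears in the worker arguments, so copy-on-fork cannot leave two processes holding the same underlying HDF5 file state.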