From: Anthony S. <sc...@gm...> - 2012-10-06 22:29:00
Hi Owen,

How many pools do you have? Is this a random runtime failure? What kind of
system is this? Is there some particular function in Python that you are
running? (It seems to be openFile(), but I can't be sure...)

The error is definitely happening down in the H5open() routine. Whether this
is HDF5's fault or ours, I am not yet sure.

Be Well
Anthony

On Sat, Oct 6, 2012 at 4:56 AM, Owen Mackwood <owe...@bc...> wrote:

> Hi Anthony,
>
> I'm not trying to write in parallel. Each worker process has its own file
> to write to. After all tasks are completed, I collect the results in the
> master process. So the problem I'm seeing (a hang in the worker process)
> shouldn't have anything to do with parallel writes. Do you have any other
> suggestions?
>
> Regards,
> Owen
>
> On 5 October 2012 18:38, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hello Owen,
>>
>> While you can use process pools to read from a file in parallel just
>> fine, writing is another story entirely. While HDF5 itself supports
>> parallel writing through MPI, this comes at the high cost of compression
>> no longer being available and a much more complicated code base. So for
>> the time being, PyTables only supports the serial HDF5 library.
>>
>> Therefore, if you want to write to a file in parallel, you should adopt
>> a strategy where one process is responsible for all of the writing and
>> all other processes send their data to it instead of writing to the file
>> directly. This is a very effective way of accomplishing basically what
>> you need. In fact, we have an example that does just that [1]. (As a
>> side note: HDF5 may soon be adding an API for exactly this pattern
>> because it comes up so often.)
>>
>> So if I were you, I would look at [1] and adapt it to my use case.
>>
>> Be Well
>> Anthony
>>
>> 1. https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
>>
>> On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood <owe...@bc...> wrote:
>>
>>> Hello,
>>>
>>> I'm using a multiprocessing.Pool to parallelize a set of tasks which
>>> record their results into separate HDF5 files. Occasionally (less than
>>> 2% of the time) the worker process will hang. According to gdb, the
>>> problem occurs while opening the HDF5 file, when it attempts to obtain
>>> the associated mutex. Here's part of the backtrace:
>>>
>>> #0  0x00007fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>>> #1  0x00007fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
>>> #2  0x00007fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
>>> #3  0x00007fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new
>>>     (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>,
>>>     __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
>>> #4  0x00000000004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value
>>>     optimized out>) at Python/ceval.c:4331
>>>
>>> Nothing else is trying to open this file, so can someone suggest why
>>> this is occurring? This is a very annoying problem as there is no way
>>> to recover from this error, and consequently the worker process is
>>> permanently occupied, which effectively removes one of my processors
>>> from the pool.
>>>
>>> Regards,
>>> Owen Mackwood
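For reference, below is a minimal sketch of the single-writer pattern Anthony
describes above. It is not the linked PyTables example: the task payloads, the
file name "results.h5", and the use of plain multiprocessing.Process instead of
a Pool are illustrative assumptions, and it uses the current snake_case
PyTables API (open_file/create_array) rather than the 2012-era camelCase calls
(openFile/createArray).

    # Sketch only: one dedicated writer process drains a multiprocessing.Queue
    # while the worker processes compute and never touch the HDF5 file.
    import multiprocessing as mp

    import numpy as np
    import tables


    def worker(task_id, queue):
        """Compute a result and hand it to the writer instead of writing HDF5."""
        result = np.arange(10) * task_id      # stand-in for the real computation
        queue.put(("task_%d" % task_id, result))


    def writer(queue, n_results, filename="results.h5"):
        """The only process that opens the HDF5 file for writing."""
        with tables.open_file(filename, mode="w") as h5:
            for _ in range(n_results):
                name, data = queue.get()
                h5.create_array("/", name, data)


    if __name__ == "__main__":
        queue = mp.Queue()
        n_tasks = 8

        writer_proc = mp.Process(target=writer, args=(queue, n_tasks))
        writer_proc.start()

        workers = [mp.Process(target=worker, args=(i, queue)) for i in range(n_tasks)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        writer_proc.join()

A real application would more likely send a sentinel value through the queue to
tell the writer when to stop, rather than counting results as this sketch does.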