From: Kenneth A. <ken...@gm...> - 2009-07-30 14:33:09
On Thu, Jul 30, 2009 at 9:00 AM, Francesc Alted<fa...@py...> wrote:
> On Thursday 30 July 2009 14:32:42 you wrote:
>> Yes, but I can't figure out how to have multiple apache processes
>> connect to the same Queue object.
>
> Ah, correct, you are using different *processes*, not threads. Well, I think
> that some kind of communications package must be used then. There are plenty
> of options, but Pyro [1] or Ice [2] (I recently was told about this), seems to
> be powerful and easy enough to program. If you want more performance, you may
> want to use MPI via mpi4py [3], but I don't really think you are going to need
> this.
>
> [1] http://pyro.sourceforge.net/
> [2] http://www.zeroc.com/icepy.html
> [3] http://mpi4py.scipy.org/

Insert obligatory warning against over-engineering here: simple
problems should have simple solutions.

Simplest: a single-threaded, single-process Python server that
directly handles the HTTP input and writes to PyTables. Concurrent
requests just have to wait. Downside: a slow client can tie up the
server for a long time. Multithreading/multiprocessing (both in the
Python standard library) can help, but if that's an issue, try:

Also pretty simple: a lightweight mod_python or fcgi script, written
with, say, Django/CherryPy/web.py, that buffers the data in some
temporary place while waiting for it to be written. Could be in memory
or a file, or even a conventional relational database. Then the
PyTables writer process just needs to know about that data. Files are
easy; when you're done writing a file, move it into an "incoming"
directory; then the PyTables writer can just poll "incoming" for a
file, process it, move it out of the way, repeat.

If you're concerned about being able to report failure, you have to
consider all the possible points of failure. The first solution has
very simple failure reporting: "I wrote this to PyTables" or "I
didn't". The second is a two-stage process, where all the client can
report is "I passed this on to the writer process". But if your buffer
is somewhere persistent and reliable (like disk), that's perhaps a
_better_ report: even if the PyTables db gets corrupted somehow, you
still have the data at least until you clean out the old stuff (which
you can do after backing up the HDF5 file, for example).

>> Not innovative as opposed to Pro, but OPSI itself seems to be
>> innovative. And fast, apparently. Oh well, certainly don't want you to
>> stop working on pytables, :-)
>
> Thank you :-)

OPSI, on my brief look at it, seemed to be optimized for write-once,
read-many. There are many other scenarios possible; for example, we
have one scenario that requires checking if an item is already stored
before writing it. The re-indexing that OPSI would require would hurt
performance, though there may be ways around that.

The point: if you know your problem well, you can probably make a more
efficient implementation of just about anything than commercial
general-purpose products.

Regards,
-Ken
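
A minimal sketch of the "incoming" directory hand-off described above,
in Python. Everything named here is an assumption for illustration: the
function names (submit, drain, run_writer), the separate staging and
done directories, and the process callback, which stands in for
whatever actually appends the data to the PyTables file. It also
assumes all three directories live on the same filesystem, so that
os.rename moves a file in one step and the writer never sees a
half-written file.

```python
import os
import tempfile
import time


def submit(payload, incoming_dir, staging_dir):
    """Web-facing side: write the whole payload, then move it into incoming."""
    # Write under staging_dir first so the poller never sees a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the move
    final_path = os.path.join(incoming_dir, os.path.basename(tmp_path))
    # Assumption: same filesystem, so the rename is a single-step move.
    os.rename(tmp_path, final_path)
    return final_path


def drain(incoming_dir, done_dir, process):
    """Writer side: handle every file now in incoming, then move it away.

    `process` is a hypothetical callback taking the file's bytes; in the
    setup discussed here it would write the data to PyTables.
    Returns the number of files handled.
    """
    handled = 0
    for name in sorted(os.listdir(incoming_dir)):
        path = os.path.join(incoming_dir, name)
        with open(path, "rb") as f:
            process(f.read())
        # Only reached if process() did not raise: keep the file as a backup.
        os.rename(path, os.path.join(done_dir, name))
        handled += 1
    return handled


def run_writer(incoming_dir, done_dir, process, interval=1.0):
    """Poll incoming forever, sleeping when there is nothing to do."""
    while True:
        if drain(incoming_dir, done_dir, process) == 0:
            time.sleep(interval)
```

The front end would call submit() once per request, and a single
separate process would call run_writer(). If process() raises, the file
stays in incoming, so nothing is lost and the failure is visible on
disk; files in the done directory can be cleaned out after the HDF5
file has been backed up.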