From: Francesc A. <fa...@py...> - 2004-06-22 10:47:20
On Tuesday, 22 June 2004 at 02:04, Andrew Straw wrote:

> Sorry, the process never completed while writing the email. Playing
> around with hdparm, I can now get ~6.5 MB/sec with no compression, and
> UCL, LZO, and zlib all reduce that rate.

Mmm, just out of curiosity, I ran some benchmarks simulating your
scenario (see attachment). After some runs I can reproduce your numbers
(I can get up to 5.5 MB/s on my laptop, a P4 Mobile @ 2 GHz, with a hard
disk spinning at 4200 RPM and a maximum write throughput of 8 MB/s).
However, using ._append() instead of .append() *did* help a lot in my
case, improving the output from 2.8 MB/s to 5.5 MB/s (maybe the
difference is that you have a faster CPU). These figures were collected
without compression. As I'm testing on a very slow hard disk, I used
very repetitive data (all zeros) to work around that bottleneck, but the
results are practically the same. So the bottleneck does indeed seem to
be in the I/O calls.

In order to determine whether the problem was PyTables or the HDF5
layer, I used a small C program that opens the EArray only once, writes
all the data, and then closes the array (PyTables, for its part, opens
and closes the dataset on every append() operation). With that, I was
able to achieve 7.7 MB/s, very close to the write limit of my disk. When
using compression (zlib, complevel=1) and shuffling, however, I was able
to achieve 22 MB/s. So perhaps it would be feasible to reach 30 MB/s or
more without compression by using this kind of optimized writing on a
system that supports faster write speeds, like yours.

So, most probably HDF5 would be able to achieve the speed that you need.
PyTables is quite a bit slower because of the way it does I/O (i.e.
opening and closing the EArray object on every append). Of course, as I
said in my first message, that could be sped up by writing a specialized
method that opens the object first, writes the frame objects, and closes
it at the end.

> >If suggestion 2 is not enough (although I'd doubt it), things can be further
                                  ^^^^^^^^^^^^^^^^^^^^^

Ooops, me and my big mouth ;)

> Finally, as a suggestion, you may want to incorporate the following code
> into the C source for PyTables, which will allow other Python threads to
> continue running when performing long-running HDF5 tasks. See
> http://docs.python.org/api/threads.html for more information.
>
> PyThreadState *_save;
> _save = PyEval_SaveThread();
> /* Do work accessing HDF5 library which does not touch the Python API */
> PyEval_RestoreThread(_save);
>
> (The file_write function in Objects/fileobject.c in the Python source
> code uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to
> achieve the same thing, but due to the error handling of PyTables,
> you'll probably need 2 copies of "PyEval_RestoreThread(_save)": one in
> the normal return path and one in the error handling path.)

Ok. Thanks for the suggestion. This is very interesting indeed :)

Cheers,

--
Francesc Alted
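
[Editor's note: the attachment with Francesc's benchmark script is not part of
the archived message. For readers of the archive, a minimal sketch of the
scenario discussed above (appending frame-sized blocks to an EArray, here with
the zlib complevel=1 plus shuffle filters behind the 22 MB/s figure) could look
like the following. The file name, frame shape, frame count, and atom type are
invented for illustration, and the spellings open_file/create_earray follow the
later PyTables API; releases of that era used openFile/createEArray instead.]

    import numpy as np
    import tables

    # Hypothetical benchmark parameters (not from the original message).
    N_FRAMES = 1000
    FRAME_SHAPE = (480, 640)

    # zlib at complevel=1 plus the shuffle filter, as in the discussion above.
    filters = tables.Filters(complevel=1, complib='zlib', shuffle=True)

    f = tables.open_file('bench.h5', mode='w')
    earray = f.create_earray(f.root, 'frames',
                             atom=tables.UInt8Atom(),
                             shape=(0,) + FRAME_SHAPE,   # first dimension is enlargeable
                             filters=filters,
                             expectedrows=N_FRAMES)

    # Very repetitive data (all zeros), as in the message above.
    frame = np.zeros((1,) + FRAME_SHAPE, dtype=np.uint8)
    for i in range(N_FRAMES):
        # The message compares this public .append() with the lower-level
        # ._append(); in the 2004-era PyTables discussed here, each append
        # reopened and closed the underlying HDF5 dataset.
        earray.append(frame)
    f.close()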
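[Editor's note: the message does not spell out the "specialized method" that
would open the object once, write many frames, and close at the end, and no
attempt is made to guess at that C-level change here. A rough way to
approximate the same idea from plain Python (fewer, larger append() calls, so
the per-call open/close overhead is paid less often) is to buffer frames and
write them in blocks. This is only a sketch of that workaround, with an
invented file name, shapes, and block size, again using the later
open_file/create_earray spellings.]

    import numpy as np
    import tables

    # Hypothetical sizes, for illustration only.
    N_FRAMES = 1000
    FRAME_SHAPE = (480, 640)
    BLOCK = 64            # number of frames buffered per append() call

    f = tables.open_file('bench_blocked.h5', mode='w')
    earray = f.create_earray(f.root, 'frames',
                             atom=tables.UInt8Atom(),
                             shape=(0,) + FRAME_SHAPE,
                             expectedrows=N_FRAMES)

    buf = []
    for i in range(N_FRAMES):
        frame = np.zeros(FRAME_SHAPE, dtype=np.uint8)  # stand-in for a real video frame
        buf.append(frame)
        if len(buf) == BLOCK:
            earray.append(np.stack(buf))   # one call writes BLOCK frames at once
            buf = []
    if buf:
        earray.append(np.stack(buf))       # flush the remainder
    f.close()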