From: braingateway <bra...@gm...> - 2010-10-13 22:45:23
|
Hi everyone, I used to work with numpy.memmap. The speed was roughly OK for me, but I always needed to save the corresponding metadata (such as variable names, variable shapes, experiment descriptions, etc.) into a separate file, which is a very bad approach when I have lots of data files and change their names from time to time. I have heard about a lot of amazing characteristics of PyTables recently. It sounds like a perfect match for my application: it is based on HDF5, can be compressed by Blosc, and promises even faster I/O than numpy.memmap. So I decided to shift my project to PyTables. When I tried the official benchmark code (poly.py), it seemed OK; at least without compression the I/O speed is faster than numpy.memmap. However, when I tried to dig a little bit deeper, I ran into problems immediately. I did several different experiments to get familiar with the performance characteristics of PyTables. First, I tried to just read data chunks (smaller than (1E+6,24)) into RAM from a random location in a larger data file containing (3E+6,24) random float64 numbers, about 549MB. For each reading operation, I obtained the average speed from 10 experiments. It took numpy.memmap 56ms to read a 1E+6-long single column, and 73ms to read a (1E+6,24) data chunk. PyTables (with chunkshape (65536, 24), complib = None) scored 1470ms for (1E+6,) and 257ms for (1E+6,24). The standard deviations of all the results are always very low, which suggests the performance is stable. Surprisingly, PyTables is 3 times slower than numpy.memmap. I thought maybe PyTables would show better, or at least the same, performance as numpy.memmap when I need to stream data to disk and there is some calculation involved. So for the next test, I used the same expr as the official benchmark code (poly.py) to operate on the entire array and streamed the result onto disk. On average numpy.memmap+numexpr took 1.5s to finish the calculation, but PyTables took 9.0s. Then I started to think this might be because I used the wrong chunkshape for PyTables. So I did all the tests again with chunkshape = None, which lets PyTables decide its optimized chunkshape (1365, 24). The results are actually much worse than with the bigger chunkshape, except for reading (1E+6,) data into RAM, which takes 225ms compared to 1470ms with the bigger chunkshape. It took 358ms to read a chunk of size (1E+6,24) into RAM, and 14s to finish the expr calculation. In all the tests, PyTables used far less RAM (<300MB) than numpy.memmap (around 1GB). I am almost sure there is something I did wrong to make PyTables so slow, so if you could give me some hint, I would highly appreciate your assistance. I attached my test code and results. Thanks a lot, BigLittleBrain |
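The attached test code is not reproduced in the archive; the following is only a rough sketch of the kind of read benchmark described above (file names, node names and the timing harness are assumptions, not taken from the attachment): reading a (1E+6, 24) slab from a random offset with numpy.memmap versus a PyTables CArray.

    import time
    import numpy as np
    import tables as tb

    N, NCOLS, NREAD = 3000000, 24, 1000000

    # Hypothetical setup: a raw binary file and an HDF5 file holding the same
    # (3E+6, 24) float64 data; the CArray was created with chunkshape=(65536, 24).
    mm = np.memmap('data.bin', dtype='float64', mode='r', shape=(N, NCOLS))
    h5 = tb.openFile('data.h5', mode='r')          # PyTables 2.x API
    ca = h5.root.data                              # assumed node name

    def timed_read(arr, start):
        """Read a (NREAD, NCOLS) slab starting at row `start`; return seconds."""
        t0 = time.time()
        chunk = arr[start:start + NREAD, :]
        np.asarray(chunk).sum()                    # touch the data so lazy reads complete
        return time.time() - t0

    start = np.random.randint(0, N - NREAD)
    print('numpy.memmap: %.3f s' % timed_read(mm, start))
    print('PyTables    : %.3f s' % timed_read(ca, start))
    h5.close()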
From: Francesc A. <fa...@py...> - 2010-10-14 07:30:43
|
Hi braingateway, A Thursday 14 October 2010 00:45:05 braingateway escrigué: > Hi everyone, > > > I used to work with numpy.memmap, the speed was roughly OK for me, > but I always need to save corresponding metadata (such as variable > names, variable shapes, experiment descriptions, etc.) into a > separate file, which is a very bad approach when I have lots of data > files and change their names from time to time. I heard a lot > amazing characteristics about Pytables recently. It sounds perfectly > match my application, It is based on HDF5, can be compressed by > Blosc, and even faster I/O speed that numpy.memmap. So I decide to > shift my project to Pytables. When I tried the official bench mark > code (poly.py), it seems OK, at least without compression the I/O > speed is faster than nump.memmap. However, when I try to dig a > little big deeper, I got problems immediately. Mmh, you rather meant *performance* problems probably :-) > I did several > different experiments to get familiar with performance spec of > Pytables. First, I try to just read data chunks (smaller than > (1E+6,24)) into RAM from a random location in a larger data file > which containing (3E+6,24) random float64 numbers, about 549MB. For > each reading operation, I obtained the average speed from 10 > experiments. It took numpy.memmap 56ms to read 1E+6 long single > column, and 73ms to read data chunk (1E+6,24). Pytables (with > chunkshape (65536, 24) complib = None) scored 1470ms for (1E+6,) and > 257ms for (1E+6,24).The standard deviations of all the results are > always very low, which suggests the performance is stable. I've been reading your code, and you are accessing your data column-wise, instead of row-wise. In the C-world (and hence Python, NumPy, PyTables...) you want to make sure that you access data by row, not column, to get maximum performance. For an explanation on why see: https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf and specifically slides 23 and 31. > Surprisingly, Pytables are 3 times slower than numpy.memmap. I > thought maybe pytables will show better or at least same performance > as numpy.memmap when I need to stream data to the disk and there is > some calculation involved. So next test, I used the same expr as > official bench mark code (poly.py) to operate on the entire array > and streamed the result onto disk. Averagely numpy.memmap+numexpr > took 1.5s to finish the calculation, but Pytables took 9.0s. Then I > start to think, this might because I used the wrong chunkshape for > Pytables. So I did all the tests again with chunkshape = None which > let the Pytables decide its optimized chunkshape (1365, 24). The > results are actually much worse than bigger chunkshape except for > reading (1E+6,) data into RAM, which is 225ms comparing to 1470ms > with bigger chunkshape. It took 358ms for reading a chunk with size > (1E+6,24) into RAM, and 14s to finish the expr calculation. In all > the tests, the pytables use far less RAM (<300MB) than numpy.memmap > (around 1GB). PyTables should not use as much as 300 MB for your problem. You are probably speaking about virtual memory, but you should get the amount of *resident* memory instead. > I am almost sure there is something I did wrong to > make pytables so slow. So if you could give me some hint, I shall > highly appreciate your assistance. I attached my test code and > results.
Another thing about your "performance problems" when using compression is that you are trying your benchmarks with completely random data, and in this case, compression is rather useless. Make sure that you use real data for your benchmarks. If it is compressible, things might change a lot. BTW, in order to make your messages more readable, it would help if you can make a proper use of paragraphing. You know, trying to read a big paragraph with 40 lines is not exactly easy. Cheers, -- Francesc Alted |
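To make the row-wise versus column-wise point above concrete, here is a small sketch (file name and array shape are assumptions) showing why the access pattern matters for any C-ordered, row-major container such as a NumPy memmap or an HDF5 dataset: a block of whole rows is contiguous on disk, while a single column is strided across every row.

    import numpy as np

    # A C-ordered (row-major) array on disk: each row's 24 float64 values are
    # adjacent, so consecutive rows form one contiguous byte range.
    a = np.memmap('rowmajor.bin', dtype='float64', mode='w+', shape=(1000000, 24))
    a[:] = np.random.rand(1000000, 24)
    a.flush()

    rows = a[500000:600000, :]   # whole-row slab: one sequential read
    col = a[:, 3]                # single column: a strided read touching every row

    # The same reasoning applies to chunked HDF5 datasets: a selection that cuts
    # across the chunk layout forces many more chunks to be read.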
From: braingateway <bra...@gm...> - 2010-10-14 19:09:38
|
Francesc Alted: > Hi braingateway, > > A Thursday 14 October 2010 00:45:05 braingateway escrigué: > >> Hi everyone, >> >> >> I used to work with numpy.memmap, the speed was roughly OK for me, >> but I always need to save corresponding metadata (such as variable >> names, variable shapes, experiment descriptions, etc.) into a >> separate file, which is a very bad approach when I have lots of data >> files and change their names from time to time. I heard a lot >> amazing characteristics about Pytables recently. It sounds perfectly >> match my application, It is based on HDF5, can be compressed by >> Blosc, and even faster I/O speed that numpy.memmap. So I decide to >> shift my project to Pytables. When I tried the official bench mark >> code (poly.py), it seems OK, at least without compression the I/O >> speed is faster than nump.memmap. However, when I try to dig a >> little big deeper, I got problems immediately. >> > > Mmh, you rather meant *performance* problems probably :-) > > >> I did several >> different experiments to get familiar with performance spec of >> Pytables. First, I try to just read data chunks (smaller than >> (1E+6,24)) into RAM from a random location in a larger data file >> which containing (3E+6,24) random float64 numbers, about 549MB. For >> each reading operation, I obtained the average speed from 10 >> experiments. It took numpy.memmap 56ms to read 1E+6 long single >> column, and 73ms to read data chunk (1E+6,24). Pytables (with >> chunkshape (65536, 24) complib = None) scored 1470ms for (1E+6,) and >> 257ms for (1E+6,24).The standard deviations of all the results are >> always very low, which suggests the performance is stable. >> > > I've been reading your code, and you are accessing your data column- > wise, instead of row-wise. In the C-world (and hence Python, NumPy, > PyTables...) you want to make sure that you access data by row, not > column, to get maximum performance. For an explanation on why see: > > https://portal.g-node.org/python- > autumnschool/_media/materials/starving_cpus/starvingcpus.pdf > > and specifically slides 23 and 31. > > >> Surprisingly, Pytables are 3 times slower than numpy.memmap. I >> thought maybe pytables will show better or at least same performance >> as numpy.memmap when I need to stream data to the disk and there is >> some calculation involved. So next test, I used the same expr as >> official bench mark code (poly.py) to operate on the entire array >> and streamed the result onto disk. Averagely numpy.memmap+numexpr >> took 1.5s to finish the calculation, but Pytables took 9.0s. Then I >> start to think, this might because I used the wrong chunkshape for >> Pytables. So I did all the tests again with chunkshape = None which >> let the Pytables decide its optimized chunkshape (1365, 24). The >> results are actually much worse than bigger chunkshape except for >> reading (1E+6,) data into RAM, which is 225ms comparing to 1470ms >> with bigger chunkshape. It took 358ms for reading a chunk with size >> (1E+6,24) into RAM, and 14s to finish the expr calculation. In all >> the tests, the pytables use far less RAM (<300MB) than numpy.memmap >> (around 1GB). >> > > PyTables should not use as much as 300 MB for your problem. You are > probably speaking about virtual memory, but you should get the amount of > *resident* memory instead. > > >> I am almost sure there is something I did wrong to >> make pytables so slow. So if you could give me some hint, I shall >> highly appreciate your assistance. 
I attached my test code and >> results. >> > > Another thing about your "performance problems" when using compression > is that you are trying your benchmarks with completely random data, and > in this case, compression is rather useless. Make sure that you use > real data for your benchmarks. If it is compressible, things might > change a lot. > > BTW, in order to make your messages more readable, it would help if you > can make a proper use of paragraphing. You know, trying to read a big > paragraph with 40 lines is not exactly easy. > > Cheers, > > Sorry about the super big paragraph! Thanks a lot for your detailed response! I was aware it is pointless to compress pure random data, so I did not mention the compression rate at all in my post. Unfortunately, the dynamic range of my data is very large and it is very “random”-like. Blosc only reduces the file size of my real dataset by about 10%, so I am not a fan of the compression feature.

I am really confused about the dimension order. I cannot see the freedom to choose column-major or row-major, because HDF5 is row-major. For example, I have N different sensors and each sensor generates 1E9 samples/s; the fixed-length dimension (fastest dimension) should always store the N samples from the sensor network, so time always has to be the column. In most cases we want to access data from all sensors during a certain period of time; in some cases we only want to access data from one or two sensors. So I think it is correct to make a row store the data from all sensors at the same time point. In my opinion, for almost all kinds of real-world data, the slowest dimension should always represent time. Probably I should invert the dimension order when I load the data into RAM.

Even though I did invert the dimension order, the speed did not improve for accessing all channels, but it did improve a lot for accessing data from only one sensor, for both memmap and PyTables. However, PyTables is still much slower than memmap:

Read a 24x1e6 data chunk at a random position:
memmap: 128ms (without shifting the dimension order: 81ms)
PyTables (automatic chunkshape = (1, 32768)): 327ms (without shifting the dimension order: 358ms)
PyTables (chunkshape = (24, 65536)): 270ms (without shifting the dimension order: 255ms)
PyTables (chunkshape = (1, 65535)): 328ms

Calculate the expr on the whole array:
memmap: 1.4~1.8s (without shifting the dimension order: 1.4~1.6s)
PyTables (automatic chunkshape = (1, 32768)): 9.4s (without shifting the dimension order: 14s)
PyTables (chunkshape = (24, 65536)): 16s (without shifting the dimension order: 9s)
PyTables (chunkshape = (1, 65535)): 13s

Should I change some default parameters, such as the buffer size, to improve the performance? By the way, after I changed the shape to (24,3e6), tables.Expr returned an error: The error was --> <type 'exceptions.AttributeError'>: 'Expr' object has no attribute 'BUFFERTIMES'. I think this is because expression.py has not been updated for the new ‘BUFFER_TIMES’ parameter? So I added "from tables.parameters import BUFFER_TIMES" and changed self.BUFFERTIMES to BUFFER_TIMES. I hope this is correct. Thanks a lot, LittleBigBrain |
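As a concrete illustration of the layout question discussed above (a sketch only; file name, node name, chunkshape and the random fill are assumptions, not taken from the poster's test code), storing sensors along the slow axis means one sensor occupies a single contiguous row, while a time window across all sensors cuts through every row and therefore through many chunks:

    import numpy as np
    import tables as tb

    NSENS, NSAMP = 24, 3000000

    h5 = tb.openFile('sensors.h5', mode='w')        # PyTables 2.x API
    data = h5.createCArray(h5.root, 'data', tb.Float64Atom(),
                           shape=(NSENS, NSAMP),    # sensors x time
                           chunkshape=(1, 65536))   # each chunk holds one sensor's samples
    for i in range(NSENS):                          # fill sensor by sensor
        data[i, :] = np.random.rand(NSAMP)

    one_sensor = data[3, :]                 # touches only the chunks of row 3
    window = data[:, 1000000:2000000]       # touches chunks of all 24 rows
    h5.close()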
From: Francesc A. <fa...@py...> - 2010-10-15 08:24:52
|
A Thursday 14 October 2010 21:09:18 braingateway escrigué: > Sorry about the super big paragraph! Thanks a lot for your detailed > response! > I was aware it is pointless to compress pure random data, so I did > not mention the compression rate at all in my post. Unfortunately, > the dynamic range of my data is very large and it is very > “random”-like. Blosc only reduces 10% file size of my real dataset, > so I am not a fan of the compression feature. I see. Then it should be better to take compression out of the measurements and focus on I/O speed. > I am really confused about the dimension order. I cannot see the > freedom to change the Column-major or Row-major, because the HDF5 is > Row-major. For example, I got N different sensors, each sensor > generate 1E9 samples/s, the fixed length dimension (fastest > dimension) should always store N-samples from sensor network, so the > time always has to be the column. And in most case, we always want > to access data from all sensors during certain period of time. In > some case, we only want to access data just from one or two sensors. > So I think it is correct to make a row store data from all sensors at > the same time point. In my opinion, almost for all kind of > real-world data, the slowest dimension should always represent the > time. Probably, I should inverse the dimension order when I load > them into RAM. You always have to find the best way to combine convenience and performance. If in some cases you cannot do this, then you have to choose: convenience *or* performance. > Even though I did invert the dimension order, the speed did not > improve for accessing all channels, but did improve a lot for only > accessing data from one sensor for both memmap and pytables. Exactly. That was my point. > However, the pytables is still much slower than memmap: > > Read 24x1e6 data chunk at random position: > > Memmap: 128ms, (without shift dimension order: 81ms) > Pytables (automatic chunkshape = (1, 32768)) 327ms, (without shift > dimension order: 358ms) > Pytables (chunkshape = (24, 65536)): 270ms, (without shift dimension > order: 255ms) > Pytables (chunkshape = (1, 65535)): 328ms That 'bad' performance of PyTables relative to numpy.memmap is kind of expected. You should be aware that HDF5 is quite a bit more complex (but more featured too) than memmap, so the overhead is significant, especially in this case where you are using a chunked dataset (CArray) for the comparison. It also matters that you are benchmarking with datasets that fit well into the OS filesystem cache (for example, when using a machine with > 1 GB of RAM), so the disk is touched very little. But as soon as your datasets exceed the amount of available memory, the performance of memmap and HDF5 would become much closer. In case you are interested only in situations where your datasets fit well in memory, you will probably get much better results if you use a non-chunked dataset (i.e. a plain Array) in PyTables. But still, if you are expecting PyTables to be faster than the much simpler memmap approach, then I'm going to disappoint you: that simply will not happen (unless your data is very compressible, but that's not your case).
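A minimal sketch of that suggestion, assuming the dataset fits in memory (file and node names are made up): store the data as a plain, non-chunked Array and read slabs from it directly, instead of going through a chunked CArray.

    import numpy as np
    import tables as tb

    h5 = tb.openFile('plain.h5', mode='w')               # PyTables 2.x API
    # A plain Array is contiguous on disk: no chunking, no per-chunk bookkeeping.
    arr = h5.createArray(h5.root, 'data', np.random.rand(3000000, 24))
    h5.flush()

    block = arr[1000000:2000000, :]                      # simple slab read
    h5.close()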
> Calculate expr on the whole array: > > Memmap: 1.4~1.8s (without shift dimension order: 1.4~1.6s) > Pytables (automatic chunkshape = (1, 32768)): 9.4s, (without shift > dimension order: 14s) > Pytables (chunkshape = (24, 65536)): 16s, (without shift dimension > order: 9s) > Pytables (chunkshape = (1, 65535)): 13s > > Should I change some default parameters such as buffersize, etc, to > improve the performance? No, I don't think you are able to get much more performance out of tables.Expr. But, mind you, you are not comparing apples with apples here. numpy.memmap and tables.Expr are paradigms for performing out-of-core computations (i.e. computations with operands that do not fit in memory). But in your example, for the numpy.memmap case, you are loading everything in memory and then calling numexpr to perform the operations, while you are using tables.Expr to do the same operations but *on-disk*, and hence the big difference in performance. In order to compare apples with apples, my advice is to use tables.Array+numexpr if you want to compare it with numpy.memmap+numexpr for an in-memory paradigm. Or, if you really want to do out-of-core computations, then use numpy.memmap and perform the operations directly on-disk (i.e. without using numexpr). See: https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/poly2.py for an example of an apples-with-apples comparison for out-of-core computations. BTW, I've seen that you are still using numexpr 1.3; you may want to use 1.4 instead, where I implemented multi-threading some time ago. That might boost your computations quite a bit. > By the way, after I change shape to (24,3e6), the pytables.Expr > returns an Error: > The error was --> <type 'exceptions.AttributeError'>: 'Expr' object > has no attribute 'BUFFERTIMES'. Hmm, that should be a bug. Could you send a small self-contained example so that I can fix that? > I think this is because you have not updated the expression.py for > new ‘BUFFER_TIMES’ parameter? So I add: > from tables.parameters import BUFFER_TIMES > change self.BUFFERTIMES to BUFFER_TIMES > > I hope this is correct. Hope that helps, -- Francesc Alted |
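To illustrate the apples-with-apples advice above, here is a hedged sketch (the polynomial follows the poly.py style of benchmark mentioned in the thread; file names, node names and sizes are assumptions): the same expression evaluated once out-of-core with tables.Expr, and once in memory with numexpr after loading the operand.

    import numpy as np
    import numexpr as ne
    import tables as tb

    h5 = tb.openFile('expr_demo.h5', mode='w')            # PyTables 2.x API
    x = h5.createCArray(h5.root, 'x', tb.Float64Atom(), shape=(3000000,))
    x[:] = np.random.rand(3000000)

    # Out-of-core: tables.Expr streams x through its buffers and writes the
    # result to another on-disk array, block by block.
    out = h5.createCArray(h5.root, 'out', tb.Float64Atom(), shape=(3000000,))
    e = tb.Expr('0.25*x**3 + 0.75*x**2 - 1.5*x - 2')       # picks up x from the namespace
    e.setOutput(out)
    e.eval()

    # In-memory: load the operand once, then let numexpr do the arithmetic in RAM.
    xm = x[:]
    res = ne.evaluate('0.25*xm**3 + 0.75*xm**2 - 1.5*xm - 2')
    h5.close()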
From: Josh A. <jos...@gm...> - 2010-10-15 15:58:50
|
> > > BTW, I've seen that you are still using numexpr 1.3; you may want to use > 1.4 instead, where I've implemented multi-threading sometime ago. That > might boost your computations quite a bit. > Any chance you could release a Python 2.7, Win32 binary for numexpr 1.4? I'm still using 1.3.1 as well, since that's the latest available for Python 2.7. |
From: Francesc A. <fa...@py...> - 2010-10-18 09:06:18
|
A Friday 15 October 2010 17:58:41 Josh Ayers escrigué: > > BTW, I've seen that you are still using numexpr 1.3; you may want > > to use 1.4 instead, where I've implemented multi-threading > > sometime ago. That might boost your computations quite a bit. > > Any chance you could release a Python 2.7, Win32 binary for numexpr > 1.4? I'm still using 1.3.1 as well, since that's the latest > available for Python 2.7. I'll provide one for the forthcoming Numexpr 1.4.1, although given the excellent work that Christoph Gohlke has been doing lately building binaries for common scientific libraries for Windows: http://www.lfd.uci.edu/~gohlke/pythonlibs/ I don't know if there is any point in building Windows binaries myself anymore :-) -- Francesc Alted |
From: braingateway <bra...@gm...> - 2010-10-19 22:47:46
|
Hi Francesc, Sorry for the delay. I did not have time to work on this over the last few days. I have now attached an example and a possible fix. LittleBigBrain Francesc Alted: >> By the way, after I change shape to (24,3e6), the pytables.Expr >> returns an Error: >> The error was --> <type 'exceptions.AttributeError'>: 'Expr' object >> has no attribute 'BUFFERTIMES'. >> > > Hmm, that should be a bug. Could you send a small self-contained > example so that I can fix that? > > >> I think this is because you have not updated the expression.py for >> new ‘BUFFER_TIMES’ parameter? So I add: >> from tables.parameters import BUFFER_TIMES >> change self.BUFFERTIMES to BUFFER_TIMES >> >> I hope this is correct. >> > > Hope that helps, > > |
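The attachment is not preserved in the archive. Purely as a hypothetical stand-in (none of this is taken from the actual attachment), a self-contained trigger for the reported AttributeError would look roughly like running tables.Expr over a (24, 3e6) CArray and writing the result back to disk:

    import numpy as np
    import tables as tb

    # Hypothetical reproducer sketch, not the poster's attachment.
    h5 = tb.openFile('expr_bug.h5', mode='w')              # PyTables 2.x API
    x = h5.createCArray(h5.root, 'x', tb.Float64Atom(), shape=(24, 3000000))
    for i in range(24):
        x[i, :] = np.random.rand(3000000)

    out = h5.createCArray(h5.root, 'out', tb.Float64Atom(), shape=(24, 3000000))
    e = tb.Expr('0.25*x**3 + 0.75*x**2 - 1.5*x - 2')
    e.setOutput(out)
    e.eval()    # the thread reports AttributeError: 'Expr' object has no attribute 'BUFFERTIMES' on this kind of run
    h5.close()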
From: Francesc A. <fa...@py...> - 2010-10-21 08:04:55
|
A Wednesday 20 October 2010 00:47:32 braingateway escrigué: > Hi Francesc, > > Sorry for the delay. I do not have time to work this last few days. > Now, I attached a example and a possible fixation. > > LittleBigBrain > > Francesc Alted: > >> By the way, after I change shape to (24,3e6), the pytables.Expr > >> returns an Error: > >> The error was --> <type 'exceptions.AttributeError'>: 'Expr' > >> object has no attribute 'BUFFERTIMES'. > > > > Hmm, that should be a bug. Could you send a small self-contained > > example so that I can fix that? > > > >> I think this is because you have not updated the expression.py for > >> new ‘BUFFER_TIMES’ parameter? So I add: > >> from tables.parameters import BUFFER_TIMES > >> change self.BUFFERTIMES to BUFFER_TIMES > >> > >> I hope this is correct. > > > > Hope that helps, Thanks. This has been fixed. For details see: http://www.pytables.org/trac/ticket/300 -- Francesc Alted |