From: braingateway <bra...@gm...> - 2010-10-13 22:45:23
|
Hi everyone, I used to work with numpy.memmap. The speed was roughly OK for me, but I always needed to save the corresponding metadata (such as variable names, variable shapes, experiment descriptions, etc.) into a separate file, which is a very bad approach when I have lots of data files and change their names from time to time. I have heard about a lot of amazing characteristics of PyTables recently. It sounds like a perfect match for my application: it is based on HDF5, can be compressed by Blosc, and promises even faster I/O than numpy.memmap. So I decided to shift my project to PyTables. When I tried the official benchmark code (poly.py), it seemed OK; at least without compression the I/O speed is faster than numpy.memmap. However, when I tried to dig a little bit deeper, I ran into problems immediately. I did several different experiments to get familiar with the performance characteristics of PyTables. First, I tried to just read data chunks (smaller than (1E+6,24)) into RAM from a random location in a larger data file containing (3E+6,24) random float64 numbers, about 549MB. For each reading operation, I obtained the average speed from 10 experiments. It took numpy.memmap 56ms to read a 1E+6-long single column, and 73ms to read a (1E+6,24) data chunk. PyTables (with chunkshape (65536, 24), complib = None) scored 1470ms for (1E+6,) and 257ms for (1E+6,24). The standard deviations of all the results are always very low, which suggests the performance is stable. Surprisingly, PyTables is 3 times slower than numpy.memmap. I thought maybe PyTables would show better, or at least the same, performance as numpy.memmap when I need to stream data to disk and there is some calculation involved. So for the next test, I used the same expr as the official benchmark code (poly.py) to operate on the entire array and streamed the result onto disk. On average numpy.memmap+numexpr took 1.5s to finish the calculation, but PyTables took 9.0s. Then I started to think this might be because I used the wrong chunkshape for PyTables. So I did all the tests again with chunkshape = None, which lets PyTables decide its optimized chunkshape (1365, 24). The results are actually much worse than with the bigger chunkshape, except for reading (1E+6,) data into RAM, which takes 225ms compared to 1470ms with the bigger chunkshape. It took 358ms to read a chunk of size (1E+6,24) into RAM, and 14s to finish the expr calculation. In all the tests, PyTables used far less RAM (<300MB) than numpy.memmap (around 1GB). I am almost sure there is something I did wrong to make PyTables so slow, so if you could give me some hint, I would highly appreciate your assistance. I attached my test code and results. Thanks a lot, BigLittleBrain |
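The attached test code is not reproduced in the archive; the following is only a rough sketch of the kind of read benchmark described above (file names, node names and the timing harness are assumptions, not taken from the attachment): reading a (1E+6, 24) slab from a random offset with numpy.memmap versus a PyTables CArray.

    import time
    import numpy as np
    import tables as tb

    N, NCOLS, NREAD = 3000000, 24, 1000000

    # Hypothetical setup: a raw binary file and an HDF5 file holding the same
    # (3E+6, 24) float64 data; the CArray was created with chunkshape=(65536, 24).
    mm = np.memmap('data.bin', dtype='float64', mode='r', shape=(N, NCOLS))
    h5 = tb.openFile('data.h5', mode='r')          # PyTables 2.x API
    ca = h5.root.data                              # assumed node name

    def timed_read(arr, start):
        """Read a (NREAD, NCOLS) slab starting at row `start`; return seconds."""
        t0 = time.time()
        chunk = arr[start:start + NREAD, :]
        np.asarray(chunk).sum()                    # touch the data so lazy reads complete
        return time.time() - t0

    start = np.random.randint(0, N - NREAD)
    print('numpy.memmap: %.3f s' % timed_read(mm, start))
    print('PyTables    : %.3f s' % timed_read(ca, start))
    h5.close()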
From: Francesc A. <fa...@py...> - 2010-10-14 07:30:43
|
Hi braingateway, A Thursday 14 October 2010 00:45:05 braingateway escrigué: > Hi everyone, > > > I used to work with numpy.memmap, the speed was roughly OK for me, > but I always need to save corresponding metadata (such as variable > names, variable shapes, experiment descriptions, etc.) into a > separate file, which is a very bad approach when I have lots of data > files and change their names from time to time. I heard a lot > amazing characteristics about Pytables recently. It sounds perfectly > match my application, It is based on HDF5, can be compressed by > Blosc, and even faster I/O speed that numpy.memmap. So I decide to > shift my project to Pytables. When I tried the official bench mark > code (poly.py), it seems OK, at least without compression the I/O > speed is faster than nump.memmap. However, when I try to dig a > little big deeper, I got problems immediately. Mmh, you rather meant *performance* problems probably :-) > I did several > different experiments to get familiar with performance spec of > Pytables. First, I try to just read data chunks (smaller than > (1E+6,24)) into RAM from a random location in a larger data file > which containing (3E+6,24) random float64 numbers, about 549MB. For > each reading operation, I obtained the average speed from 10 > experiments. It took numpy.memmap 56ms to read 1E+6 long single > column, and 73ms to read data chunk (1E+6,24). Pytables (with > chunkshape (65536, 24) complib = None) scored 1470ms for (1E+6,) and > 257ms for (1E+6,24).The standard deviations of all the results are > always very low, which suggests the performance is stable. I've been reading your code, and you are accessing your data column-wise, instead of row-wise. In the C-world (and hence Python, NumPy, PyTables...) you want to make sure that you access data by row, not column, to get maximum performance. For an explanation on why see: https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/starvingcpus.pdf and specifically slides 23 and 31. > Surprisingly, Pytables are 3 times slower than numpy.memmap. I > thought maybe pytables will show better or at least same performance > as numpy.memmap when I need to stream data to the disk and there is > some calculation involved. So next test, I used the same expr as > official bench mark code (poly.py) to operate on the entire array > and streamed the result onto disk. Averagely numpy.memmap+numexpr > took 1.5s to finish the calculation, but Pytables took 9.0s. Then I > start to think, this might because I used the wrong chunkshape for > Pytables. So I did all the tests again with chunkshape = None which > let the Pytables decide its optimized chunkshape (1365, 24). The > results are actually much worse than bigger chunkshape except for > reading (1E+6,) data into RAM, which is 225ms comparing to 1470ms > with bigger chunkshape. It took 358ms for reading a chunk with size > (1E+6,24) into RAM, and 14s to finish the expr calculation. In all > the tests, the pytables use far less RAM (<300MB) than numpy.memmap > (around 1GB). PyTables should not use as much as 300 MB for your problem. You are probably speaking about virtual memory, but you should get the amount of *resident* memory instead. > I am almost sure there is something I did wrong to > make pytables so slow. So if you could give me some hint, I shall > highly appreciate your assistance. I attached my test code and > results.
Another thing about your "performance problems" when using compression is that you are trying your benchmarks with completely random data, and in this case, compression is rather useless. Make sure that you use real data for your benchmarks. If it is compressible, things might change a lot. BTW, in order to make your messages more readable, it would help if you can make a proper use of paragraphing. You know, trying to read a big paragraph with 40 lines is not exactly easy. Cheers, -- Francesc Alted |
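To make the row-wise versus column-wise point above concrete, here is a small sketch (file name and array shape are assumptions) showing why the access pattern matters for any C-ordered, row-major container such as a NumPy memmap or an HDF5 dataset: a block of whole rows is contiguous on disk, while a single column is strided across every row.

    import numpy as np

    # A C-ordered (row-major) array on disk: each row's 24 float64 values are
    # adjacent, so consecutive rows form one contiguous byte range.
    a = np.memmap('rowmajor.bin', dtype='float64', mode='w+', shape=(1000000, 24))
    a[:] = np.random.rand(1000000, 24)
    a.flush()

    rows = a[500000:600000, :]   # whole-row slab: one sequential read
    col = a[:, 3]                # single column: a strided read touching every row

    # The same reasoning applies to chunked HDF5 datasets: a selection that cuts
    # across the chunk layout forces many more chunks to be read.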
From: braingateway <bra...@gm...> - 2010-10-14 19:09:38
|
Francesc Alted: > Hi braingateway, > > A Thursday 14 October 2010 00:45:05 braingateway escrigué: > >> Hi everyone, >> >> >> I used to work with numpy.memmap, the speed was roughly OK for me, >> but I always need to save corresponding metadata (such as variable >> names, variable shapes, experiment descriptions, etc.) into a >> separate file, which is a very bad approach when I have lots of data >> files and change their names from time to time. I heard a lot >> amazing characteristics about Pytables recently. It sounds perfectly >> match my application, It is based on HDF5, can be compressed by >> Blosc, and even faster I/O speed that numpy.memmap. So I decide to >> shift my project to Pytables. When I tried the official bench mark >> code (poly.py), it seems OK, at least without compression the I/O >> speed is faster than nump.memmap. However, when I try to dig a >> little big deeper, I got problems immediately. >> > > Mmh, you rather meant *performance* problems probably :-) > > >> I did several >> different experiments to get familiar with performance spec of >> Pytables. First, I try to just read data chunks (smaller than >> (1E+6,24)) into RAM from a random location in a larger data file >> which containing (3E+6,24) random float64 numbers, about 549MB. For >> each reading operation, I obtained the average speed from 10 >> experiments. It took numpy.memmap 56ms to read 1E+6 long single >> column, and 73ms to read data chunk (1E+6,24). Pytables (with >> chunkshape (65536, 24) complib = None) scored 1470ms for (1E+6,) and >> 257ms for (1E+6,24).The standard deviations of all the results are >> always very low, which suggests the performance is stable. >> > > I've been reading your code, and you are accessing your data column- > wise, instead of row-wise. In the C-world (and hence Python, NumPy, > PyTables...) you want to make sure that you access data by row, not > column, to get maximum performance. For an explanation on why see: > > https://portal.g-node.org/python- > autumnschool/_media/materials/starving_cpus/starvingcpus.pdf > > and specifically slides 23 and 31. > > >> Surprisingly, Pytables are 3 times slower than numpy.memmap. I >> thought maybe pytables will show better or at least same performance >> as numpy.memmap when I need to stream data to the disk and there is >> some calculation involved. So next test, I used the same expr as >> official bench mark code (poly.py) to operate on the entire array >> and streamed the result onto disk. Averagely numpy.memmap+numexpr >> took 1.5s to finish the calculation, but Pytables took 9.0s. Then I >> start to think, this might because I used the wrong chunkshape for >> Pytables. So I did all the tests again with chunkshape = None which >> let the Pytables decide its optimized chunkshape (1365, 24). The >> results are actually much worse than bigger chunkshape except for >> reading (1E+6,) data into RAM, which is 225ms comparing to 1470ms >> with bigger chunkshape. It took 358ms for reading a chunk with size >> (1E+6,24) into RAM, and 14s to finish the expr calculation. In all >> the tests, the pytables use far less RAM (<300MB) than numpy.memmap >> (around 1GB). >> > > PyTables should not use as much as 300 MB for your problem. You are > probably speaking about virtual memory, but you should get the amount of > *resident* memory instead. > > >> I am almost sure there is something I did wrong to >> make pytables so slow. So if you could give me some hint, I shall >> highly appreciate your assistance. 
I attached my test code and >> results. >> > > Another thing about your "performance problems" when using compression > is that you are trying your benchmarks with completely random data, and > in this case, compression is rather useless. Make sure that you use > real data for your benchmarks. If it is compressible, things might > change a lot. > > BTW, in order to make your messages more readable, it would help if you > can make a proper use of paragraphing. You know, trying to read a big > paragraph with 40 lines is not exactly easy. > > Cheers, > > Sorry about the super big paragraph! Thanks a lot for your detailed response! I was aware it is pointless to compress pure random data, so I did not mention the compression rate at all in my post. Unfortunately, the dynamic range of my data is very large and it is very “random”-like. Blosc only reduces the file size of my real dataset by about 10%, so I am not a fan of the compression feature.

I am really confused about the dimension order. I cannot see the freedom to choose column-major or row-major, because HDF5 is row-major. For example, I have N different sensors and each sensor generates 1E9 samples/s; the fixed-length dimension (fastest dimension) should always store the N samples from the sensor network, so time always has to be the column. In most cases we want to access data from all sensors during a certain period of time; in some cases we only want to access data from one or two sensors. So I think it is correct to make a row store the data from all sensors at the same time point. In my opinion, for almost all kinds of real-world data, the slowest dimension should always represent time. Probably I should invert the dimension order when I load the data into RAM.

Even though I did invert the dimension order, the speed did not improve for accessing all channels, but it did improve a lot for accessing data from only one sensor, for both memmap and PyTables. However, PyTables is still much slower than memmap:

Read a 24x1e6 data chunk at a random position:
memmap: 128ms (without shifting the dimension order: 81ms)
PyTables (automatic chunkshape = (1, 32768)): 327ms (without shifting the dimension order: 358ms)
PyTables (chunkshape = (24, 65536)): 270ms (without shifting the dimension order: 255ms)
PyTables (chunkshape = (1, 65535)): 328ms

Calculate the expr on the whole array:
memmap: 1.4~1.8s (without shifting the dimension order: 1.4~1.6s)
PyTables (automatic chunkshape = (1, 32768)): 9.4s (without shifting the dimension order: 14s)
PyTables (chunkshape = (24, 65536)): 16s (without shifting the dimension order: 9s)
PyTables (chunkshape = (1, 65535)): 13s

Should I change some default parameters, such as the buffer size, to improve the performance? By the way, after I changed the shape to (24,3e6), tables.Expr returned an error: The error was --> <type 'exceptions.AttributeError'>: 'Expr' object has no attribute 'BUFFERTIMES'. I think this is because expression.py has not been updated for the new ‘BUFFER_TIMES’ parameter? So I added "from tables.parameters import BUFFER_TIMES" and changed self.BUFFERTIMES to BUFFER_TIMES. I hope this is correct. Thanks a lot, LittleBigBrain |
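As a concrete illustration of the layout question discussed above (a sketch only; file name, node name, chunkshape and the random fill are assumptions, not taken from the poster's test code), storing sensors along the slow axis means one sensor occupies a single contiguous row, while a time window across all sensors cuts through every row and therefore through many chunks:

    import numpy as np
    import tables as tb

    NSENS, NSAMP = 24, 3000000

    h5 = tb.openFile('sensors.h5', mode='w')        # PyTables 2.x API
    data = h5.createCArray(h5.root, 'data', tb.Float64Atom(),
                           shape=(NSENS, NSAMP),    # sensors x time
                           chunkshape=(1, 65536))   # each chunk holds one sensor's samples
    for i in range(NSENS):                          # fill sensor by sensor
        data[i, :] = np.random.rand(NSAMP)

    one_sensor = data[3, :]                 # touches only the chunks of row 3
    window = data[:, 1000000:2000000]       # touches chunks of all 24 rows
    h5.close()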
From: Francesc A. <fa...@py...> - 2010-10-15 08:24:52
|
A Thursday 14 October 2010 21:09:18 braingateway escrigué: > Sorry about the super big paragraph! Thanks a lot for your detailed > response! > I was aware it is pointless to compress pure random data, so I did > not mention the compression rate at all in my post. Unfortunately, > the dynamic range of my data is very large and it is very > “random”-like. Blosc only reduces 10% file size of my real dataset, > so I am not a fan of the compression feature. I see. Then it should be better to take compression out of the measurements and focus on I/O speed. > I am really confused about the dimension order. I cannot see the > freedom to change the Column-major or Row-major, because the HDF5 is > Row-major. For example, I got N different sensors, each sensor > generate 1E9 samples/s, the fixed length dimension (fastest > dimension) should always store N-samples from sensor network, so the > time always has to be the column. And in most case, we always want > to access data from all sensors during certain period of time. In > some case, we only want to access data just from one or two sensors. > So I think it is correct to make a row store data from all sensors at > the same time point. In my opinion, almost for all kind of > real-world data, the slowest dimension should always represent the > time. Probably, I should inverse the dimension order when I load > them into RAM. You always have to find the best way to combine convenience and performance. If in some cases you cannot do this, then you have to choose: convenience *or* performance. > Even though I did invert the dimension order, the speed did not > improve for accessing all channels, but did improve a lot for only > accessing data from one sensor for both memmap and pytables. Exactly. That was my point. > However, the pytables is still much slower than memmap: > > Read 24x1e6 data chunk at random position: > > Memmap: 128ms, (without shift dimension order: 81ms) > Pytables (automatic chunkshape = (1, 32768)) 327ms, (without shift > dimension order: 358ms) > Pytables (chunkshape = (24, 65536)): 270ms, (without shift dimension > order: 255ms) > Pytables (chunkshape = (1, 65535)): 328ms That 'bad' performance of PyTables relative to numpy.memmap is kind of expected. You should be aware that HDF5 is quite a bit more complex (but more featured too) than memmap, so the overhead is significant, especially in this case where you are using a chunked dataset (CArray) for the comparison. It also matters that you are benchmarking with datasets that fit well into the OS filesystem cache (for example, when using a machine with > 1 GB of RAM), so the disk is touched very little. But as soon as your datasets exceed the amount of available memory, the performance of memmap and HDF5 would become much closer. In case you are interested only in situations where your datasets fit well in memory, you will probably get much better results if you use a non-chunked dataset (i.e. a plain Array) in PyTables. But still, if you are expecting PyTables to be faster than the much simpler memmap approach, then I'm going to disappoint you: that simply will not happen (unless your data is very compressible, but that's not your case).
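A minimal sketch of that suggestion, assuming the dataset fits in memory (file and node names are made up): store the data as a plain, non-chunked Array and read slabs from it directly, instead of going through a chunked CArray.

    import numpy as np
    import tables as tb

    h5 = tb.openFile('plain.h5', mode='w')               # PyTables 2.x API
    # A plain Array is contiguous on disk: no chunking, no per-chunk bookkeeping.
    arr = h5.createArray(h5.root, 'data', np.random.rand(3000000, 24))
    h5.flush()

    block = arr[1000000:2000000, :]                      # simple slab read
    h5.close()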
> Calculate expr on the whole array: > > Memmap: 1.4~1.8s (without shift dimension order: 1.4~1.6s) > Pytables (automatic chunkshape = (1, 32768)): 9.4s, (without shift > dimension order: 14s) > Pytables (chunkshape = (24, 65536)): 16s, (without shift dimension > order: 9s) > Pytables (chunkshape = (1, 65535)): 13s > > Should I change some default parameters such as buffersize, etc, to > improve the performance? No, I don't think you are able to get much more performance out of tables.Expr. But, mind you, you are not comparing apples with apples here. numpy.memmap and tables.Expr are paradigms for performing out-of-core computations (i.e. computations with operands that do not fit in memory). But in your example, for the numpy.memmap case, you are loading everything in memory and then calling numexpr to perform the operations, while you are using tables.Expr to do the same operations but *on-disk*, and hence the big difference in performance. In order to compare apples with apples, my advice is to use tables.Array+numexpr if you want to compare it with numpy.memmap+numexpr for an in-memory paradigm. Or, if you really want to do out-of-core computations, then use numpy.memmap and perform the operations directly on-disk (i.e. without using numexpr). See: https://portal.g-node.org/python-autumnschool/_media/materials/starving_cpus/poly2.py for an example of an apples-with-apples comparison for out-of-core computations. BTW, I've seen that you are still using numexpr 1.3; you may want to use 1.4 instead, where I implemented multi-threading some time ago. That might boost your computations quite a bit. > By the way, after I change shape to (24,3e6), the pytables.Expr > returns an Error: > The error was --> <type 'exceptions.AttributeError'>: 'Expr' object > has no attribute 'BUFFERTIMES'. Hmm, that should be a bug. Could you send a small self-contained example so that I can fix that? > I think this is because you have not updated the expression.py for > new ‘BUFFER_TIMES’ parameter? So I add: > from tables.parameters import BUFFER_TIMES > change self.BUFFERTIMES to BUFFER_TIMES > > I hope this is correct. Hope that helps, -- Francesc Alted |
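To illustrate the apples-with-apples advice above, here is a hedged sketch (the polynomial follows the poly.py style of benchmark mentioned in the thread; file names, node names and sizes are assumptions): the same expression evaluated once out-of-core with tables.Expr, and once in memory with numexpr after loading the operand.

    import numpy as np
    import numexpr as ne
    import tables as tb

    h5 = tb.openFile('expr_demo.h5', mode='w')            # PyTables 2.x API
    x = h5.createCArray(h5.root, 'x', tb.Float64Atom(), shape=(3000000,))
    x[:] = np.random.rand(3000000)

    # Out-of-core: tables.Expr streams x through its buffers and writes the
    # result to another on-disk array, block by block.
    out = h5.createCArray(h5.root, 'out', tb.Float64Atom(), shape=(3000000,))
    e = tb.Expr('0.25*x**3 + 0.75*x**2 - 1.5*x - 2')       # picks up x from the namespace
    e.setOutput(out)
    e.eval()

    # In-memory: load the operand once, then let numexpr do the arithmetic in RAM.
    xm = x[:]
    res = ne.evaluate('0.25*xm**3 + 0.75*xm**2 - 1.5*xm - 2')
    h5.close()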
From: Josh A. <jos...@gm...> - 2010-10-15 15:58:50
|
> > > BTW, I've seen that you are still using numexpr 1.3; you may want to use > 1.4 instead, where I've implemented multi-threading sometime ago. That > might boost your computations quite a bit. > Any chance you could release a Python 2.7, Win32 binary for numexpr 1.4? I'm still using 1.3.1 as well, since that's the latest available for Python 2.7. |
From: Francesc A. <fa...@py...> - 2010-10-18 09:06:18
|
A Friday 15 October 2010 17:58:41 Josh Ayers escrigué: > > BTW, I've seen that you are still using numexpr 1.3; you may want > > to use 1.4 instead, where I've implemented multi-threading > > sometime ago. That might boost your computations quite a bit. > > Any chance you could release a Python 2.7, Win32 binary for numexpr > 1.4? I'm still using 1.3.1 as well, since that's the latest > available for Python 2.7. I'll provide one for the forthcoming Numexpr 1.4.1, although given the excellent work that Christoph Gohlke has been doing lately building binaries for common scientific libraries for Windows: http://www.lfd.uci.edu/~gohlke/pythonlibs/ I don't know if there is any point in building Windows binaries myself anymore :-) -- Francesc Alted |
From: braingateway <bra...@gm...> - 2010-10-19 22:47:46
|
Hi Francesc, Sorry for the delay. I did not have time to work on this over the last few days. I have now attached an example and a possible fix. LittleBigBrain Francesc Alted: >> By the way, after I change shape to (24,3e6), the pytables.Expr >> returns an Error: >> The error was --> <type 'exceptions.AttributeError'>: 'Expr' object >> has no attribute 'BUFFERTIMES'. >> > > Hmm, that should be a bug. Could you send a small self-contained > example so that I can fix that? > > >> I think this is because you have not updated the expression.py for >> new ‘BUFFER_TIMES’ parameter? So I add: >> from tables.parameters import BUFFER_TIMES >> change self.BUFFERTIMES to BUFFER_TIMES >> >> I hope this is correct. >> > > Hope that helps, > > |
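The attachment is not preserved in the archive. Purely as a hypothetical stand-in (none of this is taken from the actual attachment), a self-contained trigger for the reported AttributeError would look roughly like running tables.Expr over a (24, 3e6) CArray and writing the result back to disk:

    import numpy as np
    import tables as tb

    # Hypothetical reproducer sketch, not the poster's attachment.
    h5 = tb.openFile('expr_bug.h5', mode='w')              # PyTables 2.x API
    x = h5.createCArray(h5.root, 'x', tb.Float64Atom(), shape=(24, 3000000))
    for i in range(24):
        x[i, :] = np.random.rand(3000000)

    out = h5.createCArray(h5.root, 'out', tb.Float64Atom(), shape=(24, 3000000))
    e = tb.Expr('0.25*x**3 + 0.75*x**2 - 1.5*x - 2')
    e.setOutput(out)
    e.eval()    # the thread reports AttributeError: 'Expr' object has no attribute 'BUFFERTIMES' on this kind of run
    h5.close()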
From: Francesc A. <fa...@py...> - 2010-10-21 08:04:55
|
A Wednesday 20 October 2010 00:47:32 braingateway escrigué: > Hi Francesc, > > Sorry for the delay. I do not have time to work this last few days. > Now, I attached a example and a possible fixation. > > LittleBigBrain > > Francesc Alted: > >> By the way, after I change shape to (24,3e6), the pytables.Expr > >> returns an Error: > >> The error was --> <type 'exceptions.AttributeError'>: 'Expr' > >> object has no attribute 'BUFFERTIMES'. > > > > Hmm, that should be a bug. Could you send a small self-contained > > example so that I can fix that? > > > >> I think this is because you have not updated the expression.py for > >> new ‘BUFFER_TIMES’ parameter? So I add: > >> from tables.parameters import BUFFER_TIMES > >> change self.BUFFERTIMES to BUFFER_TIMES > >> > >> I hope this is correct. > > > > Hope that helps, Thanks. This has been fixed. For details see: http://www.pytables.org/trac/ticket/300 -- Francesc Alted |