From: Anthony S. <sc...@gm...> - 2012-10-11 16:12:29

Hmm, sorry to hear that, Owen. Let me know how it goes.

On Thu, Oct 11, 2012 at 11:07 AM, Owen Mackwood <owe...@bc...> wrote:
> I tried your suggestion and it has not solved the problem. [...]

------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
From: Owen M. <owe...@bc...> - 2012-10-11 16:07:27

Hi Anthony,

I tried your suggestion and it has not solved the problem. It could be that it makes the problem go away in the test code because it changes the timing of the processes. I'll see if I can modify the test code to reproduce the hang even with reloading the tables module.

Regards,
Owen

On 10 October 2012 22:00, Anthony Scopatz <sc...@gm...> wrote:
> I am still not sure what the underlying problem is, but I altered your
> parallel function to forcibly reload PyTables each time it is called. [...]
From: Anthony S. <sc...@gm...> - 2012-10-10 20:01:18

So Owen,

I am still not sure what the underlying problem is, but I altered your parallel function to forcibly reload PyTables each time it is called. This seemed to work perfectly on my larger system but not at all on my smaller one. If there is a way that you can isolate PyTables and not import it globally at all, it might work even better. Below is the code snippet. I hope this helps.

Be Well
Anthony

    def run_simulation_single((paramspace_pt, params)):
        import sys
        rmkeys = [key for key in sys.modules if key.startswith('tables')]
        for key in rmkeys:
            del sys.modules[key]
        import traceback
        import tables
        try:
            filename = params['results_file']

On Wed, Oct 10, 2012 at 2:06 PM, Owen Mackwood <owe...@bc...> wrote:
> More or less. What's really happening is that if your processor pool has
> N processes, then each time one of the workers hangs the pool will have
> N-1 processes running thereafter. [...]
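Anthony's purge-and-reimport trick can be exercised on its own. Below is a minimal, self-contained sketch of the same idea; the stdlib json package stands in for tables, so nothing here depends on PyTables:

```python
import sys

def purge_modules(prefix):
    # Drop every loaded module whose name starts with `prefix`, so the
    # next import re-executes it from scratch. This is the same trick
    # Anthony applies to 'tables' inside each pool worker.
    for name in [n for n in sys.modules if n.startswith(prefix)]:
        del sys.modules[name]

import json                  # stand-in for the 'tables' package
module_before = id(json)
purge_modules("json")
import json                  # re-imported: a brand-new module object
module_after = id(json)
print(module_before != module_after)  # → True
```

Note that any references to the old module held elsewhere (classes, open files) keep pointing at the stale copy; only fresh imports see the new one.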
From: Owen M. <owe...@bc...> - 2012-10-10 19:06:32

On 10 October 2012 20:08, Anthony Scopatz <sc...@gm...> wrote:

> So just to confirm this behavior, having run your sample on a couple of my
> machines, what you see is that the code looks like it gets all the way to
> the end, and then it stalls right before it is about to exit, leaving some
> small number of processes (here named python tables_test.py) in the OS. Is
> this correct?

More or less. What's really happening is that if your processor pool has N processes, then each time one of the workers hangs the pool will have N-1 processes running thereafter. Eventually, when all the tasks have completed (or all workers are hung, something that has happened to me when processing many tasks), the main process will just block waiting for the hung processes.

If you're running Linux, when the test is finished and the main process is still waiting on the hung processes, you can just kill the main process. The orphaned processes that are still there afterward are the ones of interest.

> It seems to be the case that these failures do not happen when I set the
> processor pool size to be less than or equal to the number of processors
> (physical or hyperthreaded) that I have on the machine. I was testing this
> both on a 32 proc cluster and my dual core laptop. Is this also the
> behavior you have seen?

No, I've never noticed that to be the case. It appears that the greater the true parallelism (i.e., physical cores on which there are workers executing in parallel), the greater the odds of there being a hang. I don't have any real proof of this though; as with most concurrency bugs, it's tough to be certain of anything.

Regards,
Owen
From: Anthony S. <sc...@gm...> - 2012-10-10 18:08:53

Hi Owen,

So just to confirm this behavior, having run your sample on a couple of my machines, what you see is that the code looks like it gets all the way to the end, and then it stalls right before it is about to exit, leaving some small number of processes (here named python tables_test.py) in the OS. Is this correct?

It seems to be the case that these failures do not happen when I set the processor pool size to be less than or equal to the number of processors (physical or hyperthreaded) that I have on the machine. I was testing this both on a 32 proc cluster and my dual core laptop. Is this also the behavior you have seen?

Be Well
Anthony

On Tue, Oct 9, 2012 at 8:08 AM, Owen Mackwood <owe...@bc...> wrote:
> Hi Anthony,
>
> I've created a reduced example which reproduces the error. I suppose the
> more processes you can run in parallel, the more likely it is you'll see
> the hang. On a machine with 8 cores, I see 5-6 processes hang out of 2000.
>
> All of the hung tasks had a call stack that looked like this:
>
>     #0 0x00007fc8ecfd01fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>     #1 0x00007fc8ebd9d215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
>     #2 0x00007fc8ebaacff0 in H5open () from /usr/lib/libhdf5.so.6
>     #3 0x00007fc8e224c6a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x28b35a0, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
>     #4 0x00000000004abf62 in ext_do_call (f=0x271f4c0, throwflag=<value optimized out>) at Python/ceval.c:4331
>     #5 PyEval_EvalFrameEx (f=0x271f4c0, throwflag=<value optimized out>) at Python/ceval.c:2705
>     #6 0x00000000004ada51 in PyEval_EvalCodeEx (co=0x247aeb0, globals=<value optimized out>, locals=<value optimized out>, args=0x288cea0, argcount=0, kws=<value optimized out>, kwcount=0, defs=0x25ffd78, defcount=4, closure=0x0) at Python/ceval.c:3253
>
> I've attached the code to reproduce this. It probably isn't quite minimal,
> but it is reasonably simple (and stereotypical of the kind of operations I
> use). Let me know if you need anything else, or have questions about my
> code.
>
> Regards,
> Owen
From: Anthony S. <sc...@gm...> - 2012-10-08 21:58:51

Hello John,

You probably installed globally and are trying to test locally. Either leave off the PYTHONPATH or try testing from a location other than the root PyTables dir.

Be Well
Anthony
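Anthony's diagnosis, that running from the source tree makes the local tables package shadow the installed one, can be reproduced with a stand-in package. Everything below except the package name and the error text is hypothetical scaffolding:

```python
import os
import sys
import tempfile

# Build a directory containing a pure-Python 'tables' package whose
# __init__ fails the same way the real source tree does when its compiled
# extensions are missing.
srcdir = tempfile.mkdtemp()
os.mkdir(os.path.join(srcdir, "tables"))
with open(os.path.join(srcdir, "tables", "__init__.py"), "w") as f:
    f.write("raise ImportError('No module named utilsExtension')\n")

# PYTHONPATH=. puts the current directory first on sys.path, so the local
# source package shadows any globally installed PyTables:
sys.path.insert(0, srcdir)
try:
    import tables
    shadowed = False
except ImportError:
    shadowed = True
print(shadowed)  # → True
```

Running the same import from any directory that does not contain a tables/ subdirectory avoids the shadowing entirely.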
From: Dickson, J. R. <Joh...@hm...> - 2012-10-08 20:57:47

Hello,

I am trying to install PyTables, but when testing it with the command:

    env PYTHONPATH=. python -c "import tables; tables.test()"

it returned the following:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "tables/__init__.py", line 30, in <module>
        from tables.utilsExtension import getPyTablesVersion, getHDF5Version
    ImportError: No module named utilsExtension

I am using Mac OS X 10.8.2. Please let me know if you need any additional information. I would appreciate any suggestions on what the problem may be and how to correct it.

Thanks,
John
From: Anthony S. <sc...@gm...> - 2012-10-08 15:38:07

On Mon, Oct 8, 2012 at 11:19 AM, Owen Mackwood <owe...@bc...> wrote:
> Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Hello Owen,

So __getitem__() calls read() on the items it needs. Both should return a copy in-memory of the data that is on disk.

Frankly, I am not really sure what is going on, given what you have said. A minimal example which reproduces the error would be really helpful. From the error that you have provided, though, the only thing that I can think of is that it is related to file opening on the worker processes.

Be Well
Anthony
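The copy semantics Anthony describes can be modeled in a few lines; ToyArray and its in-memory list are stand-ins for illustration, not the real tables.Array implementation:

```python
# Toy model: both read() and __getitem__ hand back an in-memory copy of
# the "on-disk" data, so nothing the caller does to the result can reach
# back into the file.

class ToyArray(object):
    def __init__(self, data):
        self._on_disk = list(data)      # stands in for the data on disk

    def read(self):
        return list(self._on_disk)      # always a fresh copy

    def __getitem__(self, key):
        return self.read()[key]         # delegates to read(): also a copy

arr = ToyArray([1, 2, 3])
copy = arr.read()
copy[0] = 99                   # mutate the copy...
print(arr.read())  # → [1, 2, 3]   ...the "on-disk" data is untouched
```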
From: Owen M. <owe...@bc...> - 2012-10-08 15:19:26

Hi Anthony,

On 8 October 2012 15:54, Anthony Scopatz <sc...@gm...> wrote:

> Hmmm, are you actually copying the data (f.root.data[:]) or are you
> simply passing a reference as arguments (f.root.data)?

I call f.root.data.read() on any arrays to load them into the process target args dictionary. I had assumed this returns a copy of the data. The documentation doesn't specify which, or even if there is any difference from __getitem__.

> So if you are opening a file in the master process and then
> writing/creating/flushing from the workers, this may cause a problem.
> Multiprocessing creates a fork of the original process, so you are relying
> on the file handle from the master process to not accidentally change
> somehow. Can you try to open the files in the workers rather than the
> master? I hope that this clears up the issue.

I am not accessing the master file from the worker processes. At least not by design, though as you say, some kind of strange behaviour could be arising due to the copy-on-fork of Linux. In principle, each process has its own file and there is no sharing of files between processes.

> Basically, I am advocating a more conservative approach where all data
> that is read or written to in a worker must come from that worker, rather
> than being generated by the master. If you are *still* experiencing these
> problems, then we know we have a real problem.

I'm being about as conservative as can be with my system. Unless read() returns a reference to the master file, there should be absolutely no sharing between processes. And even if my args dictionary contains a reference to the in-memory HDF5 file, how could reading it possibly trigger a call to openFile?

Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Regards,
Owen
From: Anthony S. <sc...@gm...> - 2012-10-08 13:55:23

On Mon, Oct 8, 2012 at 5:13 AM, Owen Mackwood <owe...@bc...> wrote:

> There is a single multiprocessing.Pool which usually has 6-8 processes,
> each of which is used to run a single task, after which a new process is
> created for the next task (maxtasksperchild=1 for the Pool constructor).
> There is a master process that regularly opens an HDF5 file to read out
> information for the worker processes (data that gets copied into a
> dictionary and passed as args to the worker's target function). There are
> no problems with the master process, it never hangs.

Hello Owen,

Hmmm, are you actually copying the data (f.root.data[:]) or are you simply passing a reference as arguments (f.root.data)?

> The failure appears to be random, affecting less than 2% of my tasks (all
> tasks are highly similar and should call the same tables functions in the
> same order). This is running on Debian Squeeze, Python 2.7.3, PyTables
> 2.4.0. [...] I call a number of functions in the tables module from the
> worker process, including openFile, createVLArray, createCArray,
> createGroup, flush, and of course close.

So if you are opening a file in the master process and then writing/creating/flushing from the workers, this may cause a problem. Multiprocessing creates a fork of the original process, so you are relying on the file handle from the master process to not accidentally change somehow. Can you try to open the files in the workers rather than the master? I hope that this clears up the issue.

Basically, I am advocating a more conservative approach where all data that is read or written to in a worker must come from that worker, rather than being generated by the master. If you are *still* experiencing these problems, then we know we have a real problem. Also, if this doesn't fix it, if you could send us a small sample module which reproduces this issue, that would be great too!

Be Well
Anthony
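The conservative approach Anthony suggests can be sketched without PyTables at all. Here each forked worker opens and closes its own output file; run_task and the file names are made up for illustration, and a plain text file stands in for the per-worker HDF5 file:

```python
from multiprocessing import Process

# Nothing file-related is created in the master; every worker opens,
# writes, and closes its own file after the fork.

def run_task(task_id):
    path = "worker_%d.out" % task_id
    with open(path, "w") as f:        # opened *inside* the worker
        f.write(str(task_id * task_id))

procs = [Process(target=run_task, args=(i,)) for i in range(4)]
for p in procs:
    p.start()
for p in procs:
    p.join()

print([open("worker_%d.out" % i).read() for i in range(4)])  # → ['0', '1', '4', '9']
```

On platforms that spawn rather than fork (e.g., Windows), the process creation would need to sit under an `if __name__ == "__main__":` guard.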
From: Owen M. <owe...@bc...> - 2012-10-08 09:13:47

Hi Anthony,

There is a single multiprocessing.Pool which usually has 6-8 processes, each of which is used to run a single task, after which a new process is created for the next task (maxtasksperchild=1 for the Pool constructor). There is a master process that regularly opens an HDF5 file to read out information for the worker processes (data that gets copied into a dictionary and passed as args to the worker's target function). There are no problems with the master process; it never hangs.

The failure appears to be random, affecting less than 2% of my tasks (all tasks are highly similar and should call the same tables functions in the same order). This is running on Debian Squeeze, Python 2.7.3, PyTables 2.4.0. As far as the particular function that hangs... tough to say, since I haven't yet been able to properly debug the issue. The interpreter hangs, which limits my ability to diagnose the source of the problem. I call a number of functions in the tables module from the worker process, including openFile, createVLArray, createCArray, createGroup, flush, and of course close.

I'll continue to try and find out more about when and how the hang occurs. I have to rebuild Python to allow the gdb pystack macro to work. If you have any suggestions for me, I'd love to hear them.

Regards,
Owen

On 7 October 2012 00:28, Anthony Scopatz <sc...@gm...> wrote:
> How many pools do you have? Is this a random runtime failure? What kind
> of system is this one? [...]
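For reference, the pool setup Owen describes looks roughly like this; run_task is a stand-in for the real simulation function:

```python
from multiprocessing import Pool

# maxtasksperchild=1 makes the pool retire each worker after a single
# task and fork a fresh process for the next one, so no interpreter
# state survives from one task to the next.

def run_task(params):
    return params * 2          # stand-in for the real simulation

pool = Pool(processes=4, maxtasksperchild=1)
results = pool.map(run_task, range(8))
pool.close()
pool.join()
print(results)  # → [0, 2, 4, 6, 8, 10, 12, 14]
```

As above, spawn-based platforms would need the pool creation under an `if __name__ == "__main__":` guard.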
From: Anthony S. <sc...@gm...> - 2012-10-06 22:29:00

Hi Owen,

How many pools do you have? Is this a random runtime failure? What kind of system is this one? Is there some particular function in Python that you are running? (It seems to be openFile(), but I can't be sure...) The error is definitely happening down in the H5open() routine. Now whether this is HDF5's fault or ours, I am not yet sure.

Be Well
Anthony

On Sat, Oct 6, 2012 at 4:56 AM, Owen Mackwood <owe...@bc...> wrote:

> Hi Anthony,
>
> I'm not trying to write in parallel. Each worker process has its own file
> to write to. After all tasks are completed, I collect the results in the
> master process. So the problem I'm seeing (a hang in the worker process)
> shouldn't have anything to do with parallel writes. Do you have any other
> suggestions?
>
> Regards,
> Owen
>
> On 5 October 2012 18:38, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hello Owen,
>>
>> While you can use process pools to read from a file in parallel just
>> fine, writing is another story completely. While HDF5 itself supports
>> parallel writing through MPI, this comes at the high cost of compression
>> no longer being available and a much more complicated code base. So for
>> the time being, PyTables only supports the serial HDF5 library.
>>
>> Therefore, if you want to write to a file in parallel, you adopt a
>> strategy where you have one process which is responsible for all of the
>> writing, and all other processes send their data to this process instead
>> of writing to file directly. This is a very effective way of
>> accomplishing basically what you need. In fact, we have an example to do
>> just that [1]. (As a side note: HDF5 may soon be adding an API for
>> exactly this pattern because it comes up so often.)
>>
>> So if I were you, I would look at [1] and adapt it to my use case.
>>
>> Be Well
>> Anthony
>>
>> 1. https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
>>
>> On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood <owe...@bc...> wrote:
>>
>>> Hello,
>>>
>>> I'm using a multiprocessing.Pool to parallelize a set of tasks which
>>> record their results into separate hdf5 files. Occasionally (less than
>>> 2% of the time) the worker process will hang. According to gdb, the
>>> problem occurs while opening the hdf5 file, when it attempts to obtain
>>> the associated mutex. Here's part of the backtrace:
>>>
>>>     #0 0x00007fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>>>     #1 0x00007fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
>>>     #2 0x00007fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
>>>     #3 0x00007fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
>>>     #4 0x00000000004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331
>>>
>>> Nothing else is trying to open this file, so can someone suggest why
>>> this is occurring? This is a very annoying problem, as there is no way
>>> to recover from this error, and consequently the worker process is
>>> permanently occupied, which effectively removes one of my processors
>>> from the pool.
>>>
>>> Regards,
>>> Owen Mackwood
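The single-writer pattern from the linked example can be sketched in a few lines; a plain text file stands in for the HDF5 file, and worker/writer are illustrative names:

```python
from multiprocessing import Process, Queue

# Workers never touch the output file; they push results onto a queue,
# and one dedicated writer process performs every write. In real code
# the writer would hold the only open PyTables file handle.

def worker(wid, queue):
    queue.put((wid, wid * wid))       # compute, then hand off to the writer

def writer(queue, path, n_results):
    with open(path, "w") as f:        # the writer owns the only file handle
        for _ in range(n_results):
            wid, value = queue.get()
            f.write("%d:%d\n" % (wid, value))

q = Queue()
w = Process(target=writer, args=(q, "results.txt", 4))
w.start()
workers = [Process(target=worker, args=(i, q)) for i in range(4)]
for p in workers:
    p.start()
for p in workers:
    p.join()
w.join()

print(sorted(open("results.txt").read().split()))  # → ['0:0', '1:1', '2:4', '3:9']
```

Because only the writer ever opens the file, compression stays available and no HDF5 handle is shared across a fork.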
From: Andre' Walker-L. <wal...@gm...> - 2012-10-06 22:26:47

Hi Anthony,

> You can use tuple addition to accomplish what you want:
>
>     (0,) + data.shape == (0, 256, 1, 2)
>
> Be Well
> Anthony

Thanks! I knew there had to be a better way.

Cheers,
Andre
From: Anthony S. <sc...@gm...> - 2012-10-06 22:24:21
|
Hi Andre,

You can use tuple addition to accomplish what you want:

(0,) + data.shape == (0,256,1,2)

Be Well
Anthony

On Sat, Oct 6, 2012 at 12:42 PM, Andre' Walker-Loud <wal...@gm...> wrote:

> if the np array I have grabbed from an individual file to append to my
> EArray is defined as data, is there a way to use data.shape to create the
> shape of my EArray?
>
> In spirit, I want to do something like (0,data.shape) but this does not
> work.
|
From: Andre' Walker-L. <wal...@gm...> - 2012-10-06 17:42:49
|
Hi All,

I have a bunch of hdf5 files I am using to create one hdf5 file.
Each individual file has many different pieces of data, and they are all the same shape in each file.

I am using createEArray to make the large array in the final file.

If the data files in the individual h5 files are of shape (256,1,2), then I have to use

createEArray('/path/', 'name', tables.Float64Atom(), (0,256,1,2), expectedrows=len(data_files))

If the np array I have grabbed from an individual file to append to my EArray is defined as data, is there a way to use data.shape to create the shape of my EArray?

In spirit, I want to do something like (0, data.shape) but this does not work.
I have been scouring the numpy manual to see how to convert

data.shape
(256,1,2)

to (0,256,1,2)

but failed to figure this out (if I don't know ahead of time the shape of data, in which case I could manually reshape).

Thanks,

Andre
|
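The tuple-addition tip given earlier in this thread can be sketched with plain NumPy (in PyTables you would then pass the resulting tuple as the shape argument to createEArray; the array contents here are stand-in data, not from the original post):

```python
import numpy as np

# Stand-in for an array read from one of the individual h5 files.
data = np.zeros((256, 1, 2))

# Prepend a 0-length extendable dimension via tuple addition.
earray_shape = (0,) + data.shape

print(earray_shape)  # (0, 256, 1, 2)
```

This works because `ndarray.shape` is an ordinary Python tuple, so `+` concatenates rather than adds element-wise.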
From: Owen M. <owe...@bc...> - 2012-10-06 09:56:51
|
Hi Anthony,

I'm not trying to write in parallel. Each worker process has its own file to write to. After all tasks are completed, I collect the results in the master process. So the problem I'm seeing (a hang in the worker process) shouldn't have anything to do with parallel writes.

Do you have any other suggestions?

Regards,
Owen

On 5 October 2012 18:38, Anthony Scopatz <sc...@gm...> wrote:

> Therefore if you want to write to a file in parallel, you adopt a strategy
> where you have one process which is responsible for all of the writing and
> all other processes send their data to this process instead of writing to
> file directly. [...]
>
> 1. https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
|
From: Anthony S. <sc...@gm...> - 2012-10-05 16:39:27
|
Hello Owen,

While you can use process pools to read from a file in parallel just fine, writing is another story completely. While HDF5 itself supports parallel writing through MPI, this comes at the high cost of compression no longer being available and a much more complicated code base. So for the time being, PyTables only supports the serial HDF5 library.

Therefore if you want to write to a file in parallel, you adopt a strategy where you have one process which is responsible for all of the writing, and all other processes send their data to this process instead of writing to the file directly. This is a very effective way of accomplishing basically what you need. In fact, we have an example that does just that [1]. (As a side note: HDF5 may soon be adding an API for exactly this pattern because it comes up so often.)

So if I were you, I would look at [1] and adapt it to my use case.

Be Well
Anthony

1. https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py

On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood <owe...@bc...> wrote:

> I'm using a multiprocessing.Pool to parallelize a set of tasks which
> record their results into separate hdf5 files. Occasionally (less than 2%
> of the time) the worker process will hang. According to gdb, the problem
> occurs while opening the hdf5 file, when it attempts to obtain the
> associated mutex. [...]
|
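The single-writer pattern Anthony describes can be sketched with plain multiprocessing (no PyTables here; in the real pattern the writer process would hold the only open HDF5 file handle and the payloads would be arrays to append, so all names and payloads below are illustrative):

```python
import multiprocessing as mp

def worker(task_id, queue):
    # Simulate a computation; in the real pattern this would be a
    # simulation step whose result must end up in the HDF5 file.
    queue.put((task_id, task_id ** 2))

def writer(queue, n_results, results):
    # The single process responsible for all "writes". In the real
    # pattern this would be the only process with the file open.
    for _ in range(n_results):
        task_id, value = queue.get()
        results[task_id] = value

def run(n_tasks=4):
    manager = mp.Manager()
    queue = manager.Queue()
    results = manager.dict()
    w = mp.Process(target=writer, args=(queue, n_tasks, results))
    w.start()
    workers = [mp.Process(target=worker, args=(i, queue))
               for i in range(n_tasks)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    w.join()
    return dict(results)

if __name__ == "__main__":
    print(run(4))  # {0: 0, 1: 1, 2: 4, 3: 9}
```

Because every write funnels through one process, the serial HDF5 library never sees concurrent access, and compression remains available.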
From: Owen M. <owe...@bc...> - 2012-10-05 14:55:40
|
Hello,

I'm using a multiprocessing.Pool to parallelize a set of tasks which record their results into separate hdf5 files. Occasionally (less than 2% of the time) the worker process will hang. According to gdb, the problem occurs while opening the hdf5 file, when it attempts to obtain the associated mutex. Here's part of the backtrace:

#0  0x00007fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
#2  0x00007fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
#3  0x00007fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
#4  0x00000000004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331

Nothing else is trying to open this file, so can someone suggest why this is occurring? This is a very annoying problem, as there is no way to recover from this error, and consequently the worker process is permanently occupied, which effectively removes one of my processors from the pool.

Regards,
Owen Mackwood
|
From: Anthony S. <sc...@gm...> - 2012-09-28 14:19:03
|
On Fri, Sep 28, 2012 at 2:46 AM, Francesc Alted <fa...@py...> wrote:

> For the record, the PerformanceWarning issued by PyTables has nothing to
> do with the attribute space, but rather with the fact that putting too
> many columns in the same table means that you have to retrieve much more
> data even if you are retrieving only one single column. Also, internal
> I/O buffers have to be much larger, and compressors tend to work much
> less efficiently too.
>
> Yeah, they should scale better, although saying they can reach infinite
> scalability is a bit audacious :) All the CArrays are datasets that
> have to be saved internally by HDF5, and that requires quite a few
> resources to keep track of them.

True, but I would argue that this is effectively infinite if you set your chunksize appropriately large. I have never run into an issue with HDF5 where the number of rows or columns on its own becomes too large for arrays. However, it is relatively easy to reach this limit with tables (both in PyTables and the HL interface). So maybe I should have said "effectively infinite" ;)
|
From: Francesc A. <fa...@py...> - 2012-09-28 07:46:23
|
On 9/27/12 8:10 PM, Anthony Scopatz wrote:

> > I think I remember seeing there was a performance limit with tables
> > with > 255 columns. I can't find a reference to that so it's possible
> > I made it up. However, I was wondering if carrays had some limitation
> > like that.
>
> Tables are a different data set. The issue with tables is that column
> metadata (names, etc.) needs to fit in the attribute space. The size
> of this space is statically limited to 64 kb. In my experience, this
> number is in the thousands of columns (not hundreds).

For the record, the PerformanceWarning issued by PyTables has nothing to do with the attribute space, but rather with the fact that putting too many columns in the same table means that you have to retrieve much more data even if you are retrieving only one single column. Also, internal I/O buffers have to be much larger, and compressors tend to work much less efficiently too.

> On the other hand CArrays don't have much of any column metadata.
> CArrays should scale to an infinite number of columns without any issue.

Yeah, they should scale better, although saying they can reach infinite scalability is a bit audacious :) All the CArrays are datasets that have to be saved internally by HDF5, and that requires quite a few resources to keep track of them.

-- 
Francesc Alted
|
From: Anthony S. <sc...@gm...> - 2012-09-27 18:11:23
|
On Thu, Sep 27, 2012 at 11:02 AM, Luke Lee <dur...@gm...> wrote:

> Are there any performance issues with relatively large carrays? For
> example, say I have a carray with 300,000 float64s in it. Is there some
> threshold where I could expect performance to degrade or anything?

Hello Luke,

The breakdowns happen when you have too many chunks. However, you are well away from this threshold (which is ~20,000). I believe that PyTables will issue a warning or error when you reach this point anyway.

> I think I remember seeing there was a performance limit with tables with
> > 255 columns. I can't find a reference to that so it's possible I made
> it up. However, I was wondering if carrays had some limitation like that.

Tables are a different data set. The issue with tables is that column metadata (names, etc.) needs to fit in the attribute space. The size of this space is statically limited to 64 kb. In my experience, this number is in the thousands of columns (not hundreds).

On the other hand, CArrays don't have much of any column metadata. CArrays should scale to an infinite number of columns without any issue.

Be Well
Anthony
|
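A rough way to see how far a dataset is from the many-chunks threshold mentioned above is to count its chunks. This is illustrative arithmetic only, not a PyTables API, and the chunk length of 16,384 elements is a made-up example:

```python
import math

def n_chunks(shape, chunkshape):
    # Number of HDF5 chunks needed to cover a dataset of `shape`
    # stored with chunks of `chunkshape`: ceil along each dimension,
    # then multiply the per-dimension chunk counts together.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunkshape))

# 300,000 float64s in chunks of 16,384 elements: far below ~20,000 chunks.
print(n_chunks((300000,), (16384,)))  # 19
```

So a 300,000-element CArray is nowhere near the regime where chunk bookkeeping starts to hurt.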
From: Luke L. <dur...@gm...> - 2012-09-27 16:02:18
|
Are there any performance issues with relatively large carrays? For example, say I have a carray with 300,000 float64s in it. Is there some threshold where I could expect performance to degrade or anything?

I think I remember seeing there was a performance limit with tables with > 255 columns. I can't find a reference to that so it's possible I made it up. However, I was wondering if carrays had some limitation like that.
|
From: Anthony S. <sc...@gm...> - 2012-09-25 18:33:18
|
Hello Derek, and devs,

After playing around with your data, I am able to reproduce this error on my system. I am not sure exactly where the problem is, but I do know how to fix it!

It turns out that this is an issue with the indexes not being properly in sync with the original table, OR the start and stop values not being propagated properly down to the indexes. When I tried to reindex by calling table.reIndex(), this did not fix the issue. That makes me think the problem is in propagating start, stop, and step all the way through correctly. I'll go ahead and make a ticket reflecting this.

That said, the way to fix this in the short term is to do one of the following:

1) Only use start=0 and step=1 (I bet that other stop values work).
2) Don't use indexes. When I removed the indexes from the file using "ptrepack analysis.h5 analysis2.h5", everything worked fine.

Thanks a ton for reporting this!

Be Well
Anthony

On Tue, Sep 25, 2012 at 12:30 PM, Derek Shockey <der...@gm...> wrote:

> Hi Anthony,
>
> It doesn't happen if I set start=0 or seemingly any number below 3257
> (though I didn't try them *all*). I am new to PyTables and hdf5, so
> I'm not sure about the chunksize or if I'm at a boundary. I did
> however notice that the table's chunkshape is 203, and this happens
> for exactly 203 sequential records, so I doubt that's a coincidence.
> The table description is below.
>
> Thanks,
> Derek
>
> /events (Table(5988,)) ''
>   description := {
>   "client_id": StringCol(itemsize=24, shape=(), dflt='', pos=0),
>   "data_01": StringCol(itemsize=36, shape=(), dflt='', pos=1),
>   "data_02": StringCol(itemsize=36, shape=(), dflt='', pos=2),
>   "data_03": StringCol(itemsize=36, shape=(), dflt='', pos=3),
>   "data_04": StringCol(itemsize=36, shape=(), dflt='', pos=4),
>   "data_05": StringCol(itemsize=36, shape=(), dflt='', pos=5),
>   "device_id": StringCol(itemsize=36, shape=(), dflt='', pos=6),
>   "id": StringCol(itemsize=36, shape=(), dflt='', pos=7),
>   "timestamp": Time64Col(shape=(), dflt=0.0, pos=8),
>   "type": UInt16Col(shape=(), dflt=0, pos=9),
>   "user_id": StringCol(itemsize=36, shape=(), dflt='', pos=10)}
>   byteorder := 'little'
>   chunkshape := (203,)
>   autoIndex := True
>   colindexes := {
>     "timestamp": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>     "type": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>     "id": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>     "user_id": Index(9, full, shuffle, zlib(1)).is_CSI=True}
>
> On Tue, Sep 25, 2012 at 9:32 AM, Anthony Scopatz <sc...@gm...> wrote:
> > Hi Derek,
> >
> > Ok, that is very strange. I cannot reproduce this on any of my data. A
> > quick couple of extra questions:
> >
> > 1) Does this still happen when you set start=0?
> > 2) What is the chunksize of this data set (are you at a boundary)?
> > 3) Could you send us the full table information, i.e. repr(table)?
> >
> > Be Well
> > Anthony
> >
> > On Tue, Sep 25, 2012 at 12:42 AM, Derek Shockey <der...@gm...> wrote:
> >> I ran the tests. All 4988 passed. The information it output is:
> >>
> >> PyTables version: 2.4.0
> >> HDF5 version: 1.8.9
> >> NumPy version: 1.6.2
> >> Numexpr version: 2.0.1 (not using Intel's VML/MKL)
> >> Zlib version: 1.2.5 (in Python interpreter)
> >> LZO version: 2.06 (Aug 12 2011)
> >> BZIP2 version: 1.0.6 (6-Sept-2010)
> >> Blosc version: 1.1.3 (2010-11-16)
> >> Cython version: 0.16
> >> Python version: 2.7.3 (default, Jul 6 2012, 00:17:51)
> >> [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)]
> >> Platform: darwin-x86_64
> >> Byte-ordering: little
> >> Detected cores: 4
> >>
> >> -Derek
> >>
> >> On Mon, Sep 24, 2012 at 9:09 PM, Anthony Scopatz <sc...@gm...> wrote:
> >> > Hi Derek,
> >> >
> >> > Can you please run the following command and report back what you see?
> >> >
> >> > python -c "import tables; tables.test()"
> >> >
> >> > Be Well
> >> > Anthony
> >> >
> >> > On Mon, Sep 24, 2012 at 10:56 PM, Derek Shockey <der...@gm...> wrote:
> >> >> Hello,
> >> >>
> >> >> I'm hoping someone can help me. When I specify start and stop values
> >> >> for calls to where() and readWhere(), it is returning blatantly
> >> >> incorrect results:
> >> >>
> >> >> >>> table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'", start=3257, stop=table.nrows)[0]['id']
> >> >> '7f589d3e-a0e1-4882-b69b-0223a7de3801'
> >> >>
> >> >> >>> table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'", start=3257, stop=table.nrows).next()['id']
> >> >> '7f589d3e-a0e1-4882-b69b-0223a7de3801'
> >> >>
> >> >> This happens with a sequential block of about 150 rows of data, and
> >> >> each time it seems to be 8 rows off (i.e. the row it returns is 8 rows
> >> >> ahead of the row it should be returning). If I remove the start and
> >> >> stop args, it behaves correctly. This seems to be a bug, unless I am
> >> >> misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0,
> >> >> and hdf5 1.8.9 on OS X 10.8.2.
> >> >>
> >> >> Any ideas?
> >> >>
> >> >> Thanks,
> >> >> Derek
|
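A third way to sidestep the buggy start/stop handling discussed in this thread, sketched against the PyTables 2.x API (getWhereList/readCoordinates; the helper name is made up), is to run the indexed query over the whole table and filter the matching row coordinates afterwards:

```python
def read_where_range(table, condition, start, stop):
    # Query WITHOUT start/stop (which is where the index bug bites),
    # then keep only the matching coordinates inside [start, stop).
    coords = [c for c in table.getWhereList(condition) if start <= c < stop]
    return table.readCoordinates(coords)
```

This trades a full-table index lookup for correctness, which is usually acceptable for tables of a few thousand rows like the one above.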
From: Francesc A. <fa...@py...> - 2012-09-25 17:43:16
|
Oh, I think this is the one that I wanted to announce. Sorry about that...

===============================================================
Announcing Blosc 1.1.5
A blocking, shuffling and lossless compression library
===============================================================

What is new?
============

This is a maintenance release fixing an issue that prevented compilation with MSVC.

For more info, please see the release notes in:

https://github.com/FrancescAlted/blosc/wiki/Release-notes

What is it?
===========

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound.

It also comes with a filter for HDF5 (http://www.hdfgroup.org/HDF5) so that you can easily implement support for Blosc in your favourite HDF5 tool.

Download sources
================

Please go to the main web site:

http://blosc.pytables.org/sources/

or the github repository:

https://github.com/FrancescAlted/blosc

and download the most recent release from there. Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official Blosc mailing list at:

bl...@go...
http://groups.google.es/group/blosc

-- 
Francesc Alted
|
From: Francesc A. <fa...@py...> - 2012-09-25 17:30:44
|
Ups, forgot to announce this here...

=============================
Announcing python-blosc 1.0.6
=============================

What is it?
===========

A Python wrapper for the Blosc compression library.

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regular-spaced values, etc.

What is new?
============

Updated to use Blosc 1.1.5 (fixes a compile error with MSVC compilers). Thanks to Christoph Gohlke.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the Quick User's Guide wiki page:

https://github.com/FrancescAlted/python-blosc/wiki/Quick-User's-Guide

Download sources
================

Go to:

http://github.com/FrancescAlted/python-blosc

and download the most recent release from there. python-blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official mailing list for Blosc at:

bl...@go...
http://groups.google.es/group/blosc

-- 
Francesc Alted
|