From: lionel c. <lio...@gm...> - 2011-05-25 09:39:35
|
Hi All, we tried to use PyTables with multiprocessing (the multiprocessing module). When reading various rows of an HDF5 file in several processes there is no problem, but if we try to write the results of the row computations to another HDF5 file it crashes unexpectedly. If we don't use multiprocessing there is no issue. Has anyone had the same problem when reading and writing DIFFERENT files? Thanks! Cheers Lionel |
From: Anthony S. <sc...@gm...> - 2011-05-25 16:20:39
|
Hi Lionel, Consistent, atomic file I/O is a fundamentally serial task. Trying to do it in parallel is almost guaranteed to fail in one way or another. What you need is a caching/blocking mechanism on top of the HDF5 file: all of your processes would write to a queue, and a single consumer would write to the table when it gets spare cycles. It wouldn't be too hard to do. I would look into ZeroMQ and pyzmq. Perhaps other people have other suggestions... Be Well Anthony |
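[Editor's note: the queue-plus-single-writer pattern Anthony describes can be sketched with the stdlib multiprocessing.Queue standing in for ZeroMQ, and a plain text file standing in for the output HDF5 table; in real code the writer would use tables.openFile and table.append instead. All function names here are hypothetical.]

```python
import multiprocessing as mp

SENTINEL = None  # signals the workers that no more tasks are coming

def worker(tasks, results):
    # each worker only computes; it never touches the output file
    for row in iter(tasks.get, SENTINEL):
        results.put(row * row)  # hypothetical stand-in for the "row calculus"

def writer(results, path, n_results):
    # the single process allowed to touch the output file
    with open(path, "w") as f:  # stand-in for the second HDF5 file
        for _ in range(n_results):
            f.write("%d\n" % results.get())
            f.flush()  # analogous to flushing the HDF5 file after each append

def run(path, rows, n_workers=2):
    ctx = mp.get_context("fork")  # assumes POSIX; "spawn" needs an import guard
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    procs.append(ctx.Process(target=writer, args=(results, path, len(rows))))
    for p in procs:
        p.start()
    for row in rows:
        tasks.put(row)
    for _ in range(n_workers):
        tasks.put(SENTINEL)
    for p in procs:
        p.join()
```

Because only the writer process ever holds the output handle, the HDF5 library never sees concurrent access to that file.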
From: lionel c. <lio...@gm...> - 2011-05-25 16:52:26
|
Hi Anthony, thank you for your answer and suggestions for debugging the issue. We knew there could be problems with HDF5 I/O, but what we do seems consistent with a caching/blocking procedure. The way it is done is: 1) create a pool; 2) associate the computation with a function and an iterator (indexing the columns of the first HDF5 file) via pool.imap; 3) loop over the imap, writing each result to the second HDF5 file. The loop should in principle call the computations one after another, so in a way it should fill our matrix in a blocking manner, no? Best Lionel |
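[Editor's note: the three steps Lionel describes might look like the following minimal sketch; the calculus function and the text-file output are hypothetical stand-ins for the real per-row computation and the second HDF5 file.]

```python
import multiprocessing as mp

def calculus(index):
    # hypothetical stand-in for the per-row computation on the first file
    return index, index * 2

def main(out_path, indices, n_workers=2):
    ctx = mp.get_context("fork")  # assumes POSIX; "spawn" needs an import guard
    pool = ctx.Pool(n_workers)
    try:
        # open the output AFTER the pool is created, so no child inherits
        # the handle; with PyTables this would be the second openFile()
        with open(out_path, "w") as out:
            # imap yields results in order, so writes happen one after another
            for index, value in pool.imap(calculus, indices):
                out.write("%d %d\n" % (index, value))
    finally:
        pool.close()
        pool.join()
```

Since imap yields results in order and only the parent holds the output handle, this is already close to a single-writer pattern; a common cause of crashes in such code is instead an HDF5 file handle that was opened before the pool was created and was therefore inherited by every forked child.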
From: Anthony S. <sc...@gm...> - 2011-05-25 16:59:45
|
Hmm, without knowing the code base at all, it seems like there could be other issues. Here are a few ideas: 1) Are you making sure to flush() or close() the file after each write? 2) It may be safest for each write event to also have its own new file handle (i.e. call openFile again). 3) Have a separate process that is the only process allowed to touch the file. Be Well Anthony |
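[Editor's note: ideas 1 and 2 above can be condensed into one small helper. A plain append-mode text file stands in for the HDF5 file here so the sketch is self-contained; with PyTables the open/flush/close calls would be tables.openFile(path, 'a'), h5file.flush(), and h5file.close().]

```python
def write_result(path, result):
    # open a fresh handle for every write event (idea 2) ...
    f = open(path, "a")  # stand-in for tables.openFile(path, "a")
    try:
        f.write("%s\n" % result)
        f.flush()   # ... flush after each write (idea 1) ...
    finally:
        f.close()   # ... and never keep the handle open between events
```

Opening and closing per event is slower than holding one handle, but it guarantees that each write is fully committed to disk before the next event can touch the file.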
From: lionel c. <lio...@gm...> - 2011-05-25 17:14:48
|
Hi, I'm having difficulties with my mail on the main computer where my code is, so I can't easily paste the code here. But what you suggest about flushing and closing could well be the reason it doesn't work. I'll try that first, since it is the simplest thing to test. Cheers Lionel |