From: lionel c. <lio...@gm...> - 2011-05-25 09:39:35
|
Hi All, we tried to use PyTables with multiprocessing (the multiprocessing module). When reading various rows of an HDF5 file in several processes there is no problem, but if we try to write the results of the row computations to another HDF5 file it crashes unexpectedly. If we don't use multiprocessing there is no issue. Has anyone had the same problem when reading and writing DIFFERENT files? Thanks! Cheers Lionel |
From: Anthony S. <sc...@gm...> - 2011-05-25 16:20:39
|
Hi Lionel, Consistent, atomic file I/O is a fundamentally serial task. Trying to do it in parallel is almost guaranteed to fail in one way or another. What you need is a caching/blocking mechanism on top of the HDF5 file: all of your processes would write to a queue, and a single consumer would write to the table when it gets spare cycles. It wouldn't be too hard to do. I would look into ZeroMQ and pyzmq. Perhaps other people have other suggestions... Be Well Anthony |
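[Editor's note: the queue-plus-single-writer pattern Anthony describes can be sketched with the stdlib multiprocessing.Queue standing in for ZeroMQ, and a plain text file standing in for the output HDF5 table; in real code the writer would use tables.openFile and table.append instead. All function names here are hypothetical.]

```python
import multiprocessing as mp

SENTINEL = None  # signals the workers that no more tasks are coming

def worker(tasks, results):
    # each worker only computes; it never touches the output file
    for row in iter(tasks.get, SENTINEL):
        results.put(row * row)  # hypothetical stand-in for the "row calculus"

def writer(results, path, n_results):
    # the single process allowed to touch the output file
    with open(path, "w") as f:  # stand-in for the second HDF5 file
        for _ in range(n_results):
            f.write("%d\n" % results.get())
            f.flush()  # analogous to flushing the HDF5 file after each append

def run(path, rows, n_workers=2):
    ctx = mp.get_context("fork")  # assumes POSIX; "spawn" needs an import guard
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    procs.append(ctx.Process(target=writer, args=(results, path, len(rows))))
    for p in procs:
        p.start()
    for row in rows:
        tasks.put(row)
    for _ in range(n_workers):
        tasks.put(SENTINEL)
    for p in procs:
        p.join()
```

Because only the writer process ever holds the output handle, the HDF5 library never sees concurrent access to that file.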
From: lionel c. <lio...@gm...> - 2011-05-25 16:52:26
|
Hi Anthony, thank you for your answer and suggestions for debugging the issue. We knew there could be problems with HDF5 I/O, but what we do seems consistent with a caching/blocking procedure. The way it is done is: 1) create a pool; 2) associate the computation with a function and an iterator (indexing the columns of the first HDF5 file) via pool.imap; 3) loop over the imap, writing each result to the second HDF5 file. The loop should in principle call the computations one after another, so in a way it should fill our matrix in a blocking manner, no? Best Lionel |
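[Editor's note: the three steps Lionel describes might look like the following minimal sketch; the calculus function and the text-file output are hypothetical stand-ins for the real per-row computation and the second HDF5 file.]

```python
import multiprocessing as mp

def calculus(index):
    # hypothetical stand-in for the per-row computation on the first file
    return index, index * 2

def main(out_path, indices, n_workers=2):
    ctx = mp.get_context("fork")  # assumes POSIX; "spawn" needs an import guard
    pool = ctx.Pool(n_workers)
    try:
        # open the output AFTER the pool is created, so no child inherits
        # the handle; with PyTables this would be the second openFile()
        with open(out_path, "w") as out:
            # imap yields results in order, so writes happen one after another
            for index, value in pool.imap(calculus, indices):
                out.write("%d %d\n" % (index, value))
    finally:
        pool.close()
        pool.join()
```

Since imap yields results in order and only the parent holds the output handle, this is already close to a single-writer pattern; a common cause of crashes in such code is instead an HDF5 file handle that was opened before the pool was created and was therefore inherited by every forked child.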
From: Anthony S. <sc...@gm...> - 2011-05-25 16:59:45
|
Hmm, without knowing the code base at all, it seems like there could be other issues. Here are a few ideas: 1) Are you making sure to flush() or close() the file after each write? 2) It may be safest for each write event to also have its own new file handle (i.e. call openFile again). 3) Have a separate process that is the only process allowed to touch the file. Be Well Anthony |
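[Editor's note: ideas 1 and 2 above can be condensed into one small helper. A plain append-mode text file stands in for the HDF5 file here so the sketch is self-contained; with PyTables the open/flush/close calls would be tables.openFile(path, 'a'), h5file.flush(), and h5file.close().]

```python
def write_result(path, result):
    # open a fresh handle for every write event (idea 2) ...
    f = open(path, "a")  # stand-in for tables.openFile(path, "a")
    try:
        f.write("%s\n" % result)
        f.flush()   # ... flush after each write (idea 1) ...
    finally:
        f.close()   # ... and never keep the handle open between events
```

Opening and closing per event is slower than holding one handle, but it guarantees that each write is fully committed to disk before the next event can touch the file.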
From: lionel c. <lio...@gm...> - 2011-05-25 17:14:48
|
Hi, I'm having difficulties with my mail on the main computer where my code is, so I can't easily paste the code here. But what you suggest about flushing and closing could well be the reason it doesn't work. I'll try that first, since it is the simplest thing to test. Cheers Lionel |