From: Ben E. <bj...@ai...> - 2012-11-02 20:22:42

My reading of the PyTables FAQ is that concurrent read access should be
safe with PyTables. However, when using a pool of worker processes to
read different parts of a large blosc-compressed CArray, I see:

    HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 140476163647232:
      #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data
        major: Dataset
        minor: Read failed
      #001: ../../../src/H5Dio.c line 448 in H5D_read(): can't read data
        major: Dataset
        minor: Read failed
    etc.

Am I misunderstanding something?

Thanks, Ben
From: Francesc A. <fa...@gm...> - 2012-11-02 20:41:26

On 11/2/12 4:22 PM, Ben Elliston wrote:
> My reading of the PyTables FAQ is that concurrent read access should
> be safe with PyTables. However, when using a pool of worker processes
> to read different parts of a large blosc-compressed CArray, I see:
>
>     HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 140476163647232:
>       #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data
>         major: Dataset
>         minor: Read failed
>       #001: ../../../src/H5Dio.c line 448 in H5D_read(): can't read data
>         major: Dataset
>         minor: Read failed
>     etc.

Hmm, now that I think about it, Blosc is not thread safe, and that can
cause this sort of problem if you use it from several threads (but it
should be safe when using several *processes*). If your workers are
actually threads, it might help to deactivate threading in Blosc by
setting the MAX_BLOSC_THREADS parameter to 1:

http://pytables.github.com/usersguide/parameter_files.html?#tables.parameters.MAX_BLOSC_THREADS

HTH,

-- Francesc Alted
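[Editor's note: a minimal sketch of the suggestion above. MAX_BLOSC_THREADS is the documented PyTables parameter; the filename is illustrative and not from the thread.]

```python
# Cap Blosc at one internal thread, as Francesc suggests, by setting
# the module-level parameter before opening any file.
import tables

tables.parameters.MAX_BLOSC_THREADS = 1  # disable Blosc's own threading

# ...then open and read the file as usual, e.g.:
# h5file = tables.openFile('data.h5', mode='r')
```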
From: Ben E. <bj...@ai...> - 2012-11-02 20:49:35

Hi Francesc

> Hmm, now that I think, Blosc is not thread safe, and that can bring
> these sort of problems if you use it from several threads (but it
> should be safe when using several *processes*).

I am using multiprocessing.Pool, like so:

    if __name__ == '__main__':
        pool = Pool(processes=2)   # start 2 worker processes
        items = load_items()
        pool.map(process_items, items)

Ben
From: Francesc A. <fa...@gm...> - 2012-11-02 20:57:03

On 11/2/12 4:49 PM, Ben Elliston wrote:
> Hi Francesc
>
>> Hmm, now that I think, Blosc is not thread safe, and that can bring
>> these sort of problems if you use it from several threads (but it
>> should be safe when using several *processes*).
>
> I am using multiprocessing.Pool, like so:
>
>     if __name__ == '__main__':
>         pool = Pool(processes=2)   # start 2 worker processes
>         items = load_items()
>         pool.map(process_items, items)

Hmm, that's strange. Does using lzo or zlib work for you?

-- Francesc Alted
From: Ben E. <bj...@ai...> - 2012-11-02 21:19:50

On Fri, Nov 02, 2012 at 04:56:55PM -0400, Francesc Alted wrote:
> Hmm, that's strange. Does using lzo or zlib work for you?

Well, it seems that switching compression algorithms could be a
nightmare (or can I do this with ptrepack?). However, I may have a
workaround: I now open the HDF5 file with tables.openFile at the start
of each process rather than inherit the file descriptor from the
parent. That works, since it's just concurrent file I/O on the same
read-only file, and the start-up overhead is acceptable in this case.

Happy to try lzo or zlib, though, if you like.

Cheers, Ben
From: Francesc A. <fa...@gm...> - 2012-11-02 21:30:10

On 11/2/12 5:19 PM, Ben Elliston wrote:
> Well, it seems that switching compression algorithms could be a
> nightmare (or can I do this with ptrepack?).

Yes, ptrepack can do that very easily.

> However, I may have a workaround: I now open the HDF5 file with
> tables.openFile at the start of each process rather than inherit the
> file descriptor from the parent. That works, since it's just
> concurrent file I/O on the same read-only file, and the start-up
> overhead is acceptable in this case.

Mmh, I think that makes sense. I think the problem before was that you
were sharing the same file descriptor between different processes, and
hence you ended up with synchronization problems. Having different
descriptors for different processes is definitely the way to go.

> Happy to try lzo or zlib, though, if you like.

Provided the above, I don't think you need to (I mean, I'd say that
lzo and zlib would have exactly the same problem).

-- Francesc Alted
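[Editor's note: the ptrepack invocation might look like the following; the filenames are illustrative, and `--complib`/`--complevel` are the documented ptrepack options for changing compression.]

```shell
# Recompress an existing blosc file with zlib via ptrepack,
# which ships with PyTables.
ptrepack --complib=zlib --complevel=5 data-blosc.h5:/ data-zlib.h5:/
```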
From: Owen M. <owe...@bc...> - 2012-11-03 16:04:25

If you're reading the data out of the file from inside a generator
(i.e. if load_items returns a generator that accesses the HDF5 file),
then as the worker processes consume the work items, the file is
actually being opened and read from a worker thread in the master
process.

Regards, Owen

On 2 November 2012 21:49, Ben Elliston <bj...@ai...> wrote:
> Hi Francesc
>
>> Hmm, now that I think, Blosc is not thread safe, and that can bring
>> these sort of problems if you use it from several threads (but it
>> should be safe when using several *processes*).
>
> I am using multiprocessing.Pool, like so:
>
>     if __name__ == '__main__':
>         pool = Pool(processes=2)   # start 2 worker processes
>         items = load_items()
>         pool.map(process_items, items)
>
> Ben
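[Editor's note: Owen's point can be demonstrated with a plain generator and no HDF5 at all. `load_items` and `square` below are stand-ins, not the code from the thread.]

```python
# pool.map() drains its iterable in the *parent* process before
# dispatching work, so a generator that reads an HDF5 file would do
# all of its reading inside the master process.
import os
from multiprocessing import Pool

consumed_in = []  # records which process actually ran the generator body

def load_items():
    for i in range(4):
        consumed_in.append(os.getpid())  # generator body runs here
        yield i

def square(x):
    return x * x

if __name__ == '__main__':
    parent_pid = os.getpid()
    pool = Pool(processes=2)
    results = pool.map(square, load_items())
    pool.close()
    pool.join()
    # The generator was consumed entirely in the parent process:
    assert all(pid == parent_pid for pid in consumed_in)
    print(results)
```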
From: Jim K. <jim...@sp...> - 2012-11-09 04:04:37

I would like to squeeze out as much compression as I can get. I do not
mind spending time on the front end, as long as I do not kill my read
performance. It seems that 7Zip is well suited to my data. Is it
possible to have 7Zip used as the native internal compression for a
pytable? If not, how hard would it be to add this option?

Jim Knoll
Data Developer
Spot Trading L.L.C
440 South LaSalle St., Suite 2800
Chicago, IL 60605
Office: 312.362.4550
Direct: 312.362.4798
Fax: 312.362.4551
jim...@sp...
www.spottradingllc.com

The information contained in this message may be privileged and
confidential and protected from disclosure. If the reader of this
message is not the intended recipient, or an employee or agent
responsible for delivering this message to the intended recipient, you
are hereby notified that any dissemination, distribution or copying of
this communication is strictly prohibited. If you have received this
communication in error, please notify us immediately by replying to
the message and deleting it from your computer. Thank you.
Spot Trading, LLC
From: Anthony S. <sc...@gm...> - 2012-11-09 05:57:24

Hello Jim,

The major hurdle here is exposing 7Zip to HDF5. Luckily, it appears
that this may have already been taken care of for you by the HDF
Group [1]. You should google around to see what has already been done
and how hard it is to install. The next step is to expose this as a
compression option for filters [2]. I am fairly certain that this is
just a matter of adding a simple flag and making sure 7Zip works if
available. This should not be too difficult at all, and we would
happily consider/review any pull request that implemented it. Barring
any major concerns, I feel that it would likely be accepted.

Be Well
Anthony

1. http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.6/hdf5-1.6.7/src/unpacked/release_docs/INSTALL_Windows_From_Command_Line.txt
2. http://pytables.github.com/usersguide/libref/helper_classes.html#the-filters-class

On Thu, Nov 8, 2012 at 9:52 PM, Jim Knoll <jim...@sp...> wrote:
> I would like to squeeze out as much compression as I can get. I do
> not mind spending time on the front end as long as I do not kill my
> read performance. Seems like 7Zip is well suited to my data. Is it
> possible to have 7Zip used as the native internal compression for a
> pytable?
>
> If not now hard would it be to add this option?
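[Editor's note: for context, a sketch of the Filters class Anthony references. At the time of this thread, `complib` accepts the built-in backends 'zlib', 'lzo', 'bzip2', and 'blosc'; a 7Zip/LZMA option would add a new name to that set. The commented file/array names are illustrative.]

```python
# Construct a compression filter spec with the existing Filters API;
# a new backend would simply become another valid complib value.
import tables

filters = tables.Filters(complevel=9, complib='zlib', shuffle=True)

# The filters object is then passed when creating a compressed dataset:
# h5file = tables.openFile('compressed.h5', mode='w')
# carr = h5file.createCArray(h5file.root, 'data', tables.Float64Atom(),
#                            shape=(1000000,), filters=filters)
```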
If you have received this communication in error, > please notify us immediately by replying to the message and deleting it > from your computer. Thank you. Spot Trading, LLC > > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |