From: Aquil H. A. <aqu...@gm...> - 2012-11-08 16:02:36
|
I create the tables in an HDF5 file from three different python processes. I needed to modify one of the processes, but not the others. Is there an easy way to copy the two tables that did not change to the new file? -- Aquil H. Abdullah "I never think of the future. It comes soon enough" - Albert Einstein |
From: Anthony S. <sc...@gm...> - 2012-11-08 16:19:58
|
Hey Aquil,

I think File.copyNode() [1] with the newparent argument set to a group in another file will do what you want.

Be Well
Anthony

1. http://pytables.github.com/usersguide/libref/file_class.html?highlight=copy#tables.File.copyNode
|
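A minimal sketch of the copyNode() suggestion, assuming a recent PyTables release where the method is spelled copy_node() and the file opener is open_file() (the camelCase names in this thread are the PyTables 2.x spellings). The helper copy_unchanged_tables and the table names META and STALE are hypothetical and only serve the illustration; CO, DATA, CO.h5 and test.h5 mirror the names used elsewhere in the thread.

```python
import os
import tempfile

import numpy as np
import tables


def copy_unchanged_tables(src_path, dst_path, group_name, table_names):
    """Copy the named tables from src_path:/group_name into a new file."""
    with tables.open_file(src_path, "r") as src, \
            tables.open_file(dst_path, "w") as dst:
        dst_group = dst.create_group("/", group_name)
        for name in table_names:
            node = src.get_node("/" + group_name, name)
            # newparent may be a Group that lives in a different open file.
            src.copy_node(node, newparent=dst_group)


# Build a small source file with three tables, then copy only two of them.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "CO.h5")
dst_path = os.path.join(workdir, "test.h5")

rows = np.array([(1, 1.5), (2, 2.5)], dtype=[("id", "i4"), ("value", "f8")])
with tables.open_file(src_path, "w") as h5f:
    group = h5f.create_group("/", "CO")
    for name in ("DATA", "META", "STALE"):
        h5f.create_table(group, name, obj=rows)

copy_unchanged_tables(src_path, dst_path, "CO", ["DATA", "META"])

# Read back what ended up in the new file.
with tables.open_file(dst_path, "r") as h5f:
    copied_names = sorted(node._v_name for node in h5f.root.CO)
    copied_rows = h5f.root.CO.DATA.read().tolist()
```

The process that changed can then write its own table into the same new file, so only that one table has to be regenerated.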
From: Aquil H. A. <aqu...@gm...> - 2012-11-08 16:58:21
|
Thanks Anthony,

This also did the trick:

    import tables

    h5f_in = tables.openFile('CO.h5')
    tbl_in = h5f_in.root.CO.DATA
    h5f_out = tables.openFile('test.h5', 'w')
    g = h5f_out.createGroup('/', 'CO')
    ot = tbl_in.copy(newparent=g)  # copy the table into the group in the new file
    h5f_out.close()
    h5f_in.close()

--
Aquil H. Abdullah
"I never think of the future. It comes soon enough" - Albert Einstein
|
From: Jim K. <jim...@sp...> - 2012-11-09 04:19:40
|
I love the index function and promote the internal use of PyTables at my company. The availability of an indexed method to speed up searches is the main reason why.

We are a mixed shop using C++ to create H5 files (just for the raw speed; we need to keep up with streaming data). End users start with Python and PyTables to consume the data, often after we have created indexes from Python with table.cols.col1.createIndex().

Sometimes the users come up with something we want to do thousands of times, and performance is critical. Then we fall back to C++. We can use our own index method, but we would like to make double use of the PyTables index.

I know the Python table.where() is implemented in C. Is there a way to access that from C or C++? I don't mind if I need to do work to get the result; I think in my case the work may be worth it.

Jim Knoll
Data Developer
Spot Trading L.L.C
|
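For readers following along, here is a small sketch of the workflow described above: index a column from Python, then query it with where(). It assumes a recent PyTables release, where createIndex() is spelled create_index(). The file name ticks.h5 and the symbol/price schema are hypothetical, and the string value is passed through condvars so the condition does not depend on how string literals are parsed.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "ticks.h5")

# Hypothetical schema: a short string column and a float column.
data = np.zeros(1000, dtype=[("symbol", "S8"), ("price", "f8")])
data["symbol"] = [b"ABC" if i % 100 == 0 else b"XYZ" for i in range(1000)]
data["price"] = np.arange(1000, dtype="f8")

with tables.open_file(path, "w") as h5f:
    table = h5f.create_table("/", "ticks", obj=data)

    # Build an index on the column we intend to search on.
    table.cols.symbol.create_index()
    indexed = table.cols.symbol.is_indexed

    # The query can now use the index instead of scanning every row.
    matches = sorted(
        float(row["price"])
        for row in table.where("symbol == target", condvars={"target": b"ABC"})
    )
```

The index is stored inside the HDF5 file alongside the table, which is why reusing it from another language is attractive in the first place.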
From: Anthony S. <sc...@gm...> - 2012-11-09 06:24:36
|
On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll <jim...@sp...> wrote:

> I know the python table.where( is implemented in C.

Hi Jim,

This is only kind of true. Querying (i.e. all of the where*() methods) is actually mostly written in Python, in the tables.py and expressions.py files. However, these methods make use of numexpr [1].

> Is there a way to access that from c or c++? Don't mind if I need
> to do work to get the result I think in my case the work may be worth it.

PLAN 1: One possibility builds on the fact that parts of PyTables are already written in Cython. We could maybe try to compile these Python files with Cython as well (without making any edits to them). The advantage is that, for Cython modules, if you write the appropriate C++ header file and link against the shared library correctly, it is possible to access certain functions from C/C++, so you would basically get access to these functions acting on tables from C++. BUT, I am not sure how much of a speed boost you would get out of this, since you would still be calling out to the Python interpreter to get the results. You would just be calling Python's virtual machine from C++ rather than calling it from Python (like normal).

PLAN 2: Alternatively, numexpr itself is mostly written in C++ already. You should be able to call core numexpr functions directly. However, you would have to feed it data that you read from the tables yourself. These could even be table indexes.

On a personal note, if you get code working that does this, I would be interested in seeing your implementation. (I have another project where I have tables that I want to query from C++.)

Let us know what route you ultimately end up taking or if you have any further questions!

Be Well
Anthony

1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr
|
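The data flow behind PLAN 2 can be sketched from Python: read the columns yourself, hand them to numexpr, and use the resulting mask to fetch rows. This is only an illustration of the idea under stated assumptions, not a C++ implementation; a C++ version would read the columns with the HDF5 C/C++ API and would need to call into numexpr's internals, which is not shown here. The file name, the table name t, and the columns a and b are hypothetical, and the method names assume a recent PyTables release.

```python
import os
import tempfile

import numexpr as ne
import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "plan2.h5")

# Hypothetical two-column table: a = 0..9, b = 5 everywhere.
data = np.zeros(10, dtype=[("a", "f8"), ("b", "f8")])
data["a"] = np.arange(10)
data["b"] = 5.0

with tables.open_file(path, "w") as h5f:
    h5f.create_table("/", "t", obj=data)

with tables.open_file(path, "r") as h5f:
    table = h5f.root.t

    # Step 1: read only the columns the condition needs.
    a = table.read(field="a")
    b = table.read(field="b")

    # Step 2: let numexpr evaluate the condition over whole arrays.
    mask = ne.evaluate("(a > b) & (a < 9)", local_dict={"a": a, "b": b})

    # Step 3: turn the boolean mask into row numbers and fetch those rows.
    hit_rows = np.flatnonzero(mask).tolist()
    hits = table.read_coordinates(hit_rows)
```

Reading whole columns into memory is the simplest variant; for tables that do not fit in memory the same three steps would have to be applied chunk by chunk.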
From: Jim K. <jim...@sp...> - 2012-11-09 16:12:56
|
Thanks for the reply. I will put some investigation of C++ access on my list of items to look at over the slow holiday season.

For the short term we will store a C++-ready index as a different table object in the same h5 file. It will work; it is just a bit of a waste of disk space.

One follow-up question: why would the performance of

    for row in node.where('stringField == "SomeString"'):

not be noticeably faster than

    for row in node:
        if row['stringField'] == "SomeString":

specifically when there is no index? I understand and see the speed improvement only when I have an index. I expected to see some benefit from numexpr even with no index; I expected node.where() to be much faster. What I see is identical performance. Is the numexpr benefit only seen for complex math like (floatField ** intField > otherFloatField)? I did not see that to be the case on my first attempt. It seems that I only benefit from an index.
|
From: Francesc A. <fa...@gm...> - 2012-11-09 17:27:14
|
Well, the expected performance of in-kernel (numexpr-powered) queries relative to regular (Python) queries largely depends on where the bottleneck is. If your table has a lot of columns, then the bottleneck is going to be more on the I/O side, so you cannot expect a large difference in performance. However, if your table has a small number of columns, then it is more likely that the bottleneck is the CPU, and your chances of seeing a difference are higher.

Of course, having complex queries (i.e. queries that take conditions over several columns, or just combinations of conditions on the same column) makes the query more CPU-intensive, and in-kernel normally wins by a comfortable margin.

Finally, what indexing does is reduce the number of rows where the conditions have to be evaluated, so depending on the cardinality of the query and the associated index, you can get more or less speedup.

Francesc

--
Francesc Alted
|
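One way to check where the bottleneck sits for a given table is to time the two query styles side by side. The following is a rough sketch, assuming a recent PyTables release; the wide layout (one string column plus 50 float columns), the column names, and the row count are made up for illustration. No timings are claimed here, since they depend on the machine, the disk cache, and the table shape.

```python
import os
import tempfile
import time

import numpy as np
import tables


def count_in_kernel(table, target):
    """Count matches with an in-kernel (numexpr-backed) query."""
    cond = "name == target"
    return sum(1 for _ in table.where(cond, condvars={"target": target}))


def count_python(table, target):
    """Count matches by testing every row in a plain Python loop."""
    return sum(1 for row in table if row["name"] == target)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical wide table: one string column plus 50 float columns.
n_rows = 20000
dtype = [("name", "S16")] + [("f%d" % i, "f8") for i in range(50)]
data = np.zeros(n_rows, dtype=dtype)
data["name"] = np.where(np.arange(n_rows) % 10 == 0, b"SomeString", b"Other")

path = os.path.join(tempfile.mkdtemp(), "wide.h5")
with tables.open_file(path, "w") as h5f:
    h5f.create_table("/", "wide", obj=data)

with tables.open_file(path, "r") as h5f:
    table = h5f.root.wide
    n_kernel, t_kernel = timed(count_in_kernel, table, b"SomeString")
    n_python, t_python = timed(count_python, table, b"SomeString")

print("in-kernel: %d rows in %.4f s" % (n_kernel, t_kernel))
print("python:    %d rows in %.4f s" % (n_python, t_python))
```

Rerunning the comparison with fewer float columns, or with a condition that combines several columns, is a quick way to see whether the I/O side or the CPU side dominates for a particular layout.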
From: Jim K. <jim...@sp...> - 2012-11-09 21:26:46
|
Thanks for taking the time. Most of our tables are very wide (lots of columns), and simple conditions are common, so that is why in-kernel makes almost no impact for me.
|