From: Lukas S. <luk...@gm...> - 2012-11-18 08:10:42
On Nov 17, 2012, at 12:46 PM, <pyt...@li...> wrote:
>
> Today's Topics:
>
>    1. Re: pyTable index from c++ (Jim Knoll)
>    2. Store a reference to a dataset (Juan Manuel Vázquez Tovar)
>    3. Histogramming 1000x too slow (Jon Wilson)
>    4. Re: Histogramming 1000x too slow (Anthony Scopatz)
>    5. Re: Histogramming 1000x too slow (Jon Wilson)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 9 Nov 2012 15:26:38 -0600
> From: Jim Knoll <jim...@sp...>
> Subject: Re: [Pytables-users] pyTable index from c++
> To: 'Discussion list for PyTables' <pyt...@li...>
> Message-ID: <142...@SP...>
> Content-Type: text/plain; charset="us-ascii"
>
> Thanks for taking the time.
>
> Most of our tables are very wide (lots of columns), and simple conditions are common, so that is why in-kernel queries make almost no difference for me.
>
> -----Original Message-----
> From: Francesc Alted [mailto:fa...@gm...]
> Sent: Friday, November 09, 2012 11:27 AM
> To: pyt...@li...
> Subject: Re: [Pytables-users] pyTable index from c++
>
> Well, the expected performance of in-kernel (numexpr-powered) queries relative to regular (Python) queries largely depends on where the bottleneck is. If your table has a lot of columns, the bottleneck is going to be more on the I/O side, so you cannot expect a large difference in performance. However, if your table has a small number of columns, it is more likely that the bottleneck is the CPU, and your chances of seeing a difference are higher.
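Francesc's point, together with his remark in the same reply that an index reduces the number of rows on which the conditions have to be evaluated, can be illustrated with a toy model in plain Python. This is a minimal sketch, not PyTables code: the `rows` list, the `stringField` column and the helpers `full_scan`, `build_index` and `indexed_lookup` are hypothetical, and the sorted key list merely stands in for a real column index. It only counts how many rows each strategy has to touch; it does not model numexpr or disk I/O.

```python
import bisect

# Hypothetical toy "table": 10,000 rows with a string column named stringField.
rows = [{"stringField": f"val{i % 100}", "payload": i} for i in range(10_000)]


def full_scan(rows, value):
    """Test the condition on every row, as any query without an index must."""
    hits = []
    evaluated = 0
    for i, row in enumerate(rows):
        evaluated += 1
        if row["stringField"] == value:
            hits.append(i)
    return hits, evaluated


def build_index(rows, column):
    """Build sorted keys plus their row numbers: a stand-in for a column index."""
    pairs = sorted((row[column], i) for i, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    positions = [i for _, i in pairs]
    return keys, positions


def indexed_lookup(index, value):
    """Binary-search the sorted keys so only the matching entries are read."""
    keys, positions = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    # Equal keys were sorted by row number, so this slice is already in row order.
    return positions[lo:hi], hi - lo


if __name__ == "__main__":
    scan_hits, scanned = full_scan(rows, "val7")
    idx_hits, visited = indexed_lookup(build_index(rows, "stringField"), "val7")
    assert scan_hits == idx_hits
    print(f"full scan tested {scanned} rows; the index read {visited} matching entries")
```

Both lookups return the same 100 row numbers, but the scan tests all 10,000 rows while the index lookup reads only the 100 matching entries (plus roughly 14 comparisons per binary search). This mirrors Jim's observation: without an index, `where()` and a Python loop both have to read every row, so on a wide, I/O-bound table faster per-row evaluation buys little. For a query that matches most rows, the index would save little as well, which is the cardinality caveat Francesc mentions.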
>
> Of course, having complex queries (i.e. queries that take conditions over several columns, or just combinations of conditions on the same column) makes the query more CPU-intensive, and in-kernel normally wins by a comfortable margin.
>
> Finally, what indexing does is reduce the number of rows where the conditions have to be evaluated, so depending on the cardinality of the query and the associated index, you can get more or less speedup.
>
> Francesc
>
> On 11/9/12 5:12 PM, Jim Knoll wrote:
> >
> > Thanks for the reply. I will put some investigation of C++ access on my list of items to look at over the slow holiday season.
> >
> > For the short term, we will store a C++-ready index as a separate table object in the same h5 file. It will work... just a bit of a waste of disk space.
> >
> > One follow-up question:
> >
> > Why would the performance of
> >
> >     for row in node.where('stringField == "SomeString"'):
> >
> > *not* be noticeably faster than
> >
> >     for row in node:
> >         if row.stringField == "SomeString":
> >
> > specifically when there is no index? I understand and see the speed improvement only when I have an index. I expected to see some benefit from numexpr even with no index; I expected node.where() to be much faster. What I see is identical performance. Is the numexpr benefit only seen for complex math like (floatField ** intField > otherFloatField)? I did not see that to be the case on my first attempt... It seems that I only benefit from an index.
> >
> > *From:* Anthony Scopatz [mailto:sc...@gm...]
> > *Sent:* Friday, November 09, 2012 12:24 AM
> > *To:* Discussion list for PyTables
> > *Subject:* Re: [Pytables-users] pyTable index from c++
> >
> > On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll <jim...@sp...> wrote:
> >
> > I love the index function and promote the internal use of PyTables at my company.
> > The availability of an indexed method to speed up searches is the main reason why.
> >
> > We are a mixed shop, using C++ to create the H5 files (just for the raw speed... we need to keep up with streaming data). End users start with Python PyTables to consume the data (often after we have created indexes from Python with pytables.col.col1.createIndex()).
> >
> > Sometimes the users come up with something we want to do thousands of times, and performance is critical. But then we fall back to C++. We can use our own index method, but we would like to make double use of the PyTables index.
> >
> > I know the Python table.where( is implemented in C.
> >
> > Hi Jim,
> >
> > This is only kind of true. Querying (i.e. all of the where*() methods) is actually mostly written in Python, in the tables.py and expressions.py files. However, it makes use of numexpr [1].
> >
> > Is there a way to access that from C or C++? I don't mind if I need to do some work to get the result; I think in my case the work may be worth it.
> >
> > *PLAN 1:* One possibility is that parts of PyTables are written in Cython. We could maybe try (without making any edits to these files) to convert them to Cython. This has the advantage that for Cython files, if you write the appropriate C++ header file and link against the shared library correctly, it is possible to access certain functions from C/C++. BUT, I am not sure how much of a speed boost you would get out of this, since you would still be calling out to the Python interpreter to get these results. You are just calling Python's virtual machine from C++ rather than calling it from Python (like normal). This has the advantage that you would basically get access to these functions acting on tables from C++.
> >
> > *PLAN 2:* Alternatively, numexpr itself is mostly written in C++ already. You should be able to call core numexpr functions directly.
> > However, you would have to feed it data that you read from the tables yourself. These could even be table indexes. On a personal note, if you get code working that does this, I would be interested in seeing your implementation. (I have another project where I have tables that I want to query from C++.)
> >
> > Let us know which route you ultimately end up taking or if you have any further questions!
> >
> > Be Well
> >
> > Anthony
> >
> > 1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr
> >
> > ------------------------------------------------------------------------
> >
> > *Jim Knoll*
> > *Data Developer*
> >
> > Spot Trading L.L.C
> > 440 South LaSalle St., Suite 2800
> > Chicago, IL 60605
> > Office: 312.362.4550
> > Direct: 312-362-4798
> > Fax: 312.362.4551
> > jim...@sp...
> > www.spottradingllc.com
> >
> > ------------------------------------------------------------------------
> >
> > The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Spot Trading, LLC
> >
> > _______________________________________________
> > Pytables-users mailing list
> > Pyt...@li...
> > https://lists.sourceforge.net/lists/listinfo/pytables-users
>
> --
> Francesc Alted
>
> ------------------------------
>
> Message: 2
> Date: Sun, 11 Nov 2012 01:39:33 +0100
> From: Juan Manuel Vázquez Tovar <jmv...@gm...>
> Subject: [Pytables-users] Store a reference to a dataset
> To: Discussion list for PyTables <pyt...@li...>
> Message-ID: <CAD...@ma...>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I have to deal with a very large dataset in PyTables. The file, already compressed with blosc5, is a